Some calls to strcpy(3) triggers a false warning from static
analysers that are less intelligent than humans, and reducing the
number of these false hits helps us notice real issues. A few
calls to strcpy(3) in "git rerere" that are already safe has been
rewritten to avoid false wanings.
* jk/rerere-xsnprintf:
rerere: replace strcpy with xsnprintf
This shouldn't overflow, as we are copying a sha1 hex into a
41-byte buffer. But it does not hurt to use a bound-checking
function, which protects us and makes auditing for overflows
easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up and minor fixes.
* jc/rerere: (21 commits)
rerere: un-nest merge() further
rerere: use "struct rerere_id" instead of "char *" for conflict ID
rerere: call conflict-ids IDs
rerere: further clarify do_rerere_one_path()
rerere: further de-dent do_plain_rerere()
rerere: refactor "replay" part of do_plain_rerere()
rerere: explain the remainder
rerere: explain "rerere forget" codepath
rerere: explain the primary codepath
rerere: explain MERGE_RR management helpers
rerere: fix benign off-by-one non-bug and clarify code
rerere: explain the rerere I/O abstraction
rerere: do not leak mmfile[] for a path with multiple stage #1 entries
rerere: stop looping unnecessarily
rerere: drop want_sp parameter from is_cmarker()
rerere: report autoupdated paths only after actually updating them
rerere: write out each record of MERGE_RR in one go
rerere: lift PATH_MAX limitation
rerere: plug conflict ID leaks
rerere: handle conflicts with multiple stage #1 entries
...
There's a bug in builtin/am.c in which we take a lock on
MERGE_RR recursively. But rather than fix am.c, this patch
fixes the confusing interface from rerere.c that caused the
bug. Read on for the gory details.
The setup_rerere() function both reads the existing MERGE_RR
file, and takes MERGE_RR.lock. In the rerere() and
rerere_forget() functions, we end up in write_rr(), which
will then commit the lock file.
But for functions like rerere_clear() that do not write to
MERGE_RR, we expect the caller to have handled
setup_rerere(). That caller would then need to release the
lockfile, but it can't; the lock struct is local to
rerere.c.
For builtin/rerere.c, this is OK. We run a single rerere
operation and then exit immediately, which has the side
effect of rolling back the lockfile.
But in builtin/am.c, this is actively wrong. If we run "git
am -3 --skip", we call setup-rerere twice without releasing
the lock:
1. The "--skip" causes us to call am_rerere_clear(), which
calls setup_rerere(), but never drops the lock.
2. We then proceed to the next patch.
3. The "--3way" may cause us to call rerere() to handle
conflicts in that patch, but we are already holding the
lock. The lockfile code dies with:
BUG: prepare_tempfile_object called for active object
We could fix this by having rerere_clear() call
rollback_lock_file(). But it feels a bit odd for it to roll
back a lockfile that it did not itself take. So let's
simplify the interface further, and handle setup_rerere in
the function itself, taking away the question from the
caller over whether they need to do so.
We can give rerere_gc() the same treatment, as well (even
though it doesn't have any callers besides builtin/rerere.c
at this point). Note that these functions don't take flags
from their callers to pass along to setup_rerere; that's OK,
because the flags would not be meaningful for what they are
doing.
Both of those functions need to hold the lock because even
though they do not write to MERGE_RR, they are still writing
and should be protected from a simultaneous "rerere" run.
But rerere_remaining(), "rerere diff", and "rerere status"
are all read-only operations. They want to setup_rerere(),
but do not care about taking the lock in the first place.
Since our update of MERGE_RR is the usual atomic rename done
by commit_lock_file, they can just do a lockless read. For
that, we teach setup_rerere a READONLY flag to avoid the
lock.
As a bonus, this pushes builtin/rerere.c's setup_rerere call
closer to the functions that use it. Which means that "git
rerere totally-bogus-command" will no longer silently
exit(0) in a repository without rerere enabled.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By consistently using "upon failure, set 'ret' and jump to out"
pattern, flatten the function further.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This gives a thin abstraction between the conflict ID that is a hash
value obtained by inspecting the conflicts and the name of the
directory under $GIT_DIR/rr-cache/, in which the previous resolution
is recorded to be replayed. The plan is to make sure that the
presence of the directory does not imply the presense of a previous
resolution and vice-versa, and later allow us to have more than one
pair of <preimage, postimage> for a given conflict ID.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most places we call conflict IDs "name" and some others we call them
"hex"; update all of them to "id".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the body of a loop that attempts to replay recorded
resolution for each conflicted path into a helper function, not
because I want to call it from multiple places later, but because
the logic has become too deeply nested and hard to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere gc" and "rerere
clear".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere forget".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the codepath reached from rerere(), the primary
interface to the subsystem.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the "$GIT_DIR/MERGE_RR" file and in-core merge_rr
that are used to keep track of the status of "rerere" session in
progress.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rerere_io_putconflict() wants to use a limited fixed-sized buf[] on
stack repeatedly to formulate a longer string, but its implementation
is doubly confusing:
* When it knows that the whole thing fits in buf[], it wants to
fill early part of buf[] with conflict marker characters,
followed by a LF and a NUL. It miscounts the size of the buffer
by 1 and does not use the last byte of buf[].
* When it needs to show only the early part of a long conflict
marker string (because the whole thing does not fit in buf[]), it
adjusts the number of bytes shown in the current round in a
strange-looking way. It makes sure that this round does not emit
all bytes and leaves at least one byte to the next round, so that
"it all fits" case will pick up the rest and show the terminating
LF. While this is correct, one needs to stop and think for a
while to realize why it is correct without an explanation.
Fix the benign off-by-one, and add comments to explain the
strange-looking size adjustment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments.
This one covers our thin I/O abstraction to read from either
a file or a memory while optionally writing out to a file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Plug the leak by reading only the first one of the same stage
entries.
Strictly speaking, this fix does change the semantics, in that we
used to use the last stage #1 entry as the common ancestor when
doing the plain-vanilla three-way merge, but with the leak fix, we
will use the first stage #1 entry. But it is not a grave backward
compatibility breakage. Either way, we are arbitrarily picking one
of multiple stage #1 entries and using it, ignoring others, and
there is no meaning in the ordering of these stage #1 entries.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
handle_cache() loops 3 times starting from an index entry that is
unmerged, while ignoring an entry for a path that is different from
what we are looking for.
As the index is sorted, once we see a different path, we know we saw
all stages for the path we are interested in. Just loop while we
see the same path and then break, instead of continuing for 3 times.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the nature of the conflict marker line determines if there should
be a SP and label after it, the caller shouldn't have to pass the
parameter redundantly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing the hash for a conflict, a HT, and the path
with three separate write_in_full() calls, format them into a
single record into a strbuf and write it out in one go.
As a more recent "rerere remaining" codepath abuses the .util field
of the merge_rr data to store a sentinel token, make sure that
codepath does not call into this function (of course, "remaining" is
a read-only operation and currently does not call it).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MERGE_RR file records a collection of NUL-terminated entries,
each of which consists of
- a hash that identifies the conflict
- a HT
- the pathname
We used to read this piece-by-piece, and worse yet, read the
pathname part a byte at a time into a fixed buffer of size PATH_MAX.
Instead, read a whole entry using strbuf_getwholeline() and parse
out the fields. This way, we issue fewer read(2) calls and more
importantly we do not have to limit the pathname to PATH_MAX.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge_rr string list stores the conflict ID (a hexadecimal
string that is used to index into $GIT_DIR/rr-cache) in the .util
field of its elements, and when do_plain_rerere() resolves a
conflict, the field is cleared. Also, when rerere_forget()
recomputes the conflict ID to updates the preimage file, the
conflict ID for the path is updated.
We forgot to free the existing conflict ID when we did these two
operations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When ac49f5ca (rerere "remaining", 2011-02-16) split out a new
helper function check_one_conflict() out of find_conflict()
function, so that the latter will use the returned value from the
new helper to update the loop control variable that is an index into
active_cache[], the new variable incremented the index by one too
many when it found a path with only stage #1 entry at the very end
of active_cache[].
This "strange" return value does not have any effect on the loop
control of two callers of this function, as they all notice that
active_nr+2 is larger than active_nr just like active_nr+1 is, but
nevertheless it puzzles the readers when they are trying to figure
out what the function is trying to do.
In fact, there is no need to do an early return. The code that
follows after skipping the stage #1 entry is fully prepared to
handle a case where the entry is at the very end of active_cache[].
Help future readers from unnecessary confusion by dropping an early
return. We skip the stage #1 entry, and if there are stage #2 and
stage #3 entries for the same path, we diagnose the path as
THREE_STAGED (otherwise we say PUNTED), and then we skip all entries
for the same path.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rerere forget" in a repository without rerere enabled gave a
cryptic error message; it should be a silent no-op instead.
* jk/rerere-forget-check-enabled:
rerere: exit silently on "forget" when rerere is disabled
If you run "git rerere forget foo" in a repository that does
not have rerere enabled, git hits an internal error:
$ git init -q
$ git rerere forget foo
fatal: BUG: attempt to commit unlocked object
The problem is that setup_rerere() will not actually take
the lock if the rerere system is disabled. We should notice
this and return early. We can return with a success code
here, because we know there is nothing to forget.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rerere" (invoked internally from many mergy operations) did
not correctly signal errors when told to update the working tree
files and failed to do so for whatever reason.
* jn/rerere-fail-on-auto-update-failure:
rerere: error out on autoupdate failure
"git rerere" (invoked internally from many mergy operations) did
not correctly signal errors when told to update the working tree
files and failed to do so for whatever reason.
* jn/rerere-fail-on-auto-update-failure:
rerere: error out on autoupdate failure
We have been silently tolerating errors by returning early with an
error that the caller ignores since rerere.autoupdate was introduced
in v1.6.0-rc0~120^2 (2008-06-22). So on error (for example if the
index is already locked), rerere can return success silently without
updating the index or with only some items in the index updated.
Better to treat such failures as a fatal error so the operator can
figure out what is wrong and fix it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
fsck: simplify fsck_commit_buffer() by using commit_list_count()
commit: use commit_list_append() instead of duplicating its code
merge: simplify merge_trivial() by using commit_list_append()
use strbuf_addch for adding single characters
use strbuf_addbuf for adding strbufs
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth()
that was suppose to replace match_pathspec()
eventually. match_pathspec() has finally been gone since 6 months
ago. Use the shorter name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop that fills in the buffers that are later passed to the merge
driver exits early when not all stages of a path are present in the index.
But since the buffer pointers are not initialized in advance, the
subsequent accesses are undefined.
Initialize buffer pointers in advance to avoid undefined behavior later.
That is not sufficient, though, to get correct operation of handle_cache().
The function replays a conflicted merge to extract the part inside the
conflict markers. As written, the loop exits early when a stage is missing.
Consequently, the buffers for later stages that would be present in the
index are not filled in and the merge is replayed with incomplete data.
Fix it by investigating all stages of the given path.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using 'git rerere forget .' after a merge that involved binary files
runs into an infinite loop if the binary file contains a zero byte.
Replace a strchrnul by memchr because the former does not make progress
as soon as the NUL is encountered.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of files and directories we create had tighter than
necessary permission bits when the user wanted to have group
writability (e.g. by setting "umask 002").
* ar/clone-honor-umask-at-top:
add: create ADD_EDIT.patch with mode 0666
rerere: make rr-cache fanout directory honor umask
Restore umasks influence on the permissions of work tree created by clone
This is the last remaining call to mkdir(2) that restricts the permission
bits by passing 0755. Just use the same mkdir_in_gitdir() used to create
the leaf directories.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
git-submodule.sh: separate parens by a space to avoid confusing some shells
Documentation/technical/api-diff.txt: correct name of diff_unmerge()
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
* jm/maint-misc-fix:
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'