The shape of the conflict in a path determines the conflict ID. The
preimage and postimage pair that was recorded for the conflict ID
previously may or may not replay well for the conflict we just saw.
Currently, we punt when the previous resolution does not cleanly
replay, but ideally we should then be able to record the currently
conflicted path by assigning a new 'variant', and then record the
resolution the user is going to make.
Introduce a mechanism to have more than one variant for a given
conflict ID; we do not actually assign any variant other than 0th
variant yet at this step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We record the preimage only when there is no directory to record the
conflict we encountered, i.e. when $GIT_DIR/rr-cache/$ID does not
exist. As the plan is to allow multiple <preimage,postimage> pairs
as variants for the same conflict ID eventually, this logic needs to
go.
As the first step in that direction, stop the "did we create the
directory? Then we record the preimage" logic. Instead, we record
if a preimage does not exist when we saw a conflict in a path. Also
make sure that we remove a stale postimage, which most likely is
totally unrelated to the resolution of this new conflict, when we
create a new preimage under $ID when $GIT_DIR/rr-cache/$ID already
exists.
In later patches, we will further update this logic to be "do we
have <preimage,postimage> pair that cleanly resolve the current
conflicts? If not, record a new preimage as a new variant", but
that does not happen at this stage yet.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If by some accident there is only $GIT_DIR/rr-cache/$ID directory
existed, we wouldn't have recorded a preimage for a conflict that
is newly encountered, which would mean after a manual resolution,
we wouldn't have recorded it by storing the postimage, because the
logic used to be "if there is no rr-cache/$ID directory, then we are
the first so record the preimage". Instead, record preimage if we
do not have one.
In addition, if there is only $GIT_DIR/rr-cache/$ID/postimage
without corresponding preimage, we would have tried to call into
merge() and punted.
These would have been a situation frustratingly hard to recover
from.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will help fixing bootstrap corner-case issues, e.g. having an
empty $GIT_DIR/rr-cache/$ID directory would fail to record a
preimage, in later changes in this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plan is to keep assigning the backward compatible conflict ID
based on the hash of the (normalized) text of conflicts, keep using
that conflict ID as the directory name under $GIT_DIR/rr-cache/, but
allow each conflicted path to use a separate "variant" to record
resolutions, i.e. having more than one <preimage,postimage> pairs
under $GIT_DIR/rr-cache/$ID/ directory. As the first step in that
direction, separate the shared "conflict ID" out of the rerere_id
structure.
The plan is to keep information per $ID in rerere_dir, that can be
shared among rerere_id that is per conflicted path.
When we are done with rerere(), which can be directly called from
other programs like "git apply", "git commit" and "git merge", the
shared rerere_dir structures can be freed entirely, so they are not
reference-counted and they are not freed when we release rerere_id's
that reference them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shouldn't overflow, as we are copying a sha1 hex into a
41-byte buffer. But it does not hurt to use a bound-checking
function, which protects us and makes auditing for overflows
easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By consistently using "upon failure, set 'ret' and jump to out"
pattern, flatten the function further.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This gives a thin abstraction between the conflict ID that is a hash
value obtained by inspecting the conflicts and the name of the
directory under $GIT_DIR/rr-cache/, in which the previous resolution
is recorded to be replayed. The plan is to make sure that the
presence of the directory does not imply the presense of a previous
resolution and vice-versa, and later allow us to have more than one
pair of <preimage, postimage> for a given conflict ID.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most places we call conflict IDs "name" and some others we call them
"hex"; update all of them to "id".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Extract the body of a loop that attempts to replay recorded
resolution for each conflicted path into a helper function, not
because I want to call it from multiple places later, but because
the logic has become too deeply nested and hard to read.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere gc" and "rerere
clear".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This covers the codepath that implements "rerere forget".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the codepath reached from rerere(), the primary
interface to the subsystem.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments, while
sprinkling "NEEDSWORK" comment to highlight iffy bits and
questionable assumptions.
This one covers the "$GIT_DIR/MERGE_RR" file and in-core merge_rr
that are used to keep track of the status of "rerere" session in
progress.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rerere_io_putconflict() wants to use a limited fixed-sized buf[] on
stack repeatedly to formulate a longer string, but its implementation
is doubly confusing:
* When it knows that the whole thing fits in buf[], it wants to
fill early part of buf[] with conflict marker characters,
followed by a LF and a NUL. It miscounts the size of the buffer
by 1 and does not use the last byte of buf[].
* When it needs to show only the early part of a long conflict
marker string (because the whole thing does not fit in buf[]), it
adjusts the number of bytes shown in the current round in a
strange-looking way. It makes sure that this round does not emit
all bytes and leaves at least one byte to the next round, so that
"it all fits" case will pick up the rest and show the terminating
LF. While this is correct, one needs to stop and think for a
while to realize why it is correct without an explanation.
Fix the benign off-by-one, and add comments to explain the
strange-looking size adjustment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explain the internals of rerere as in-code comments.
This one covers our thin I/O abstraction to read from either
a file or a memory while optionally writing out to a file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Plug the leak by reading only the first one of the same stage
entries.
Strictly speaking, this fix does change the semantics, in that we
used to use the last stage #1 entry as the common ancestor when
doing the plain-vanilla three-way merge, but with the leak fix, we
will use the first stage #1 entry. But it is not a grave backward
compatibility breakage. Either way, we are arbitrarily picking one
of multiple stage #1 entries and using it, ignoring others, and
there is no meaning in the ordering of these stage #1 entries.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
handle_cache() loops 3 times starting from an index entry that is
unmerged, while ignoring an entry for a path that is different from
what we are looking for.
As the index is sorted, once we see a different path, we know we saw
all stages for the path we are interested in. Just loop while we
see the same path and then break, instead of continuing for 3 times.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the nature of the conflict marker line determines if there should
be a SP and label after it, the caller shouldn't have to pass the
parameter redundantly.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of writing the hash for a conflict, a HT, and the path
with three separate write_in_full() calls, format them into a
single record into a strbuf and write it out in one go.
As a more recent "rerere remaining" codepath abuses the .util field
of the merge_rr data to store a sentinel token, make sure that
codepath does not call into this function (of course, "remaining" is
a read-only operation and currently does not call it).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MERGE_RR file records a collection of NUL-terminated entries,
each of which consists of
- a hash that identifies the conflict
- a HT
- the pathname
We used to read this piece-by-piece, and worse yet, read the
pathname part a byte at a time into a fixed buffer of size PATH_MAX.
Instead, read a whole entry using strbuf_getwholeline() and parse
out the fields. This way, we issue fewer read(2) calls and more
importantly we do not have to limit the pathname to PATH_MAX.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The merge_rr string list stores the conflict ID (a hexadecimal
string that is used to index into $GIT_DIR/rr-cache) in the .util
field of its elements, and when do_plain_rerere() resolves a
conflict, the field is cleared. Also, when rerere_forget()
recomputes the conflict ID to updates the preimage file, the
conflict ID for the path is updated.
We forgot to free the existing conflict ID when we did these two
operations.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A conflicted index can have multiple stage #1 entries when dealing
with a criss-cross merge and using the "resolve" merge strategy.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When ac49f5ca (rerere "remaining", 2011-02-16) split out a new
helper function check_one_conflict() out of find_conflict()
function, so that the latter will use the returned value from the
new helper to update the loop control variable that is an index into
active_cache[], the new variable incremented the index by one too
many when it found a path with only stage #1 entry at the very end
of active_cache[].
This "strange" return value does not have any effect on the loop
control of two callers of this function, as they all notice that
active_nr+2 is larger than active_nr just like active_nr+1 is, but
nevertheless it puzzles the readers when they are trying to figure
out what the function is trying to do.
In fact, there is no need to do an early return. The code that
follows after skipping the stage #1 entry is fully prepared to
handle a case where the entry is at the very end of active_cache[].
Help future readers from unnecessary confusion by dropping an early
return. We skip the stage #1 entry, and if there are stage #2 and
stage #3 entries for the same path, we diagnose the path as
THREE_STAGED (otherwise we say PUNTED), and then we skip all entries
for the same path.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rerere" (invoked internally from many mergy operations) did
not correctly signal errors when told to update the working tree
files and failed to do so for whatever reason.
* jn/rerere-fail-on-auto-update-failure:
rerere: error out on autoupdate failure
We have been silently tolerating errors by returning early with an
error that the caller ignores since rerere.autoupdate was introduced
in v1.6.0-rc0~120^2 (2008-06-22). So on error (for example if the
index is already locked), rerere can return success silently without
updating the index or with only some items in the index updated.
Better to treat such failures as a fatal error so the operator can
figure out what is wrong and fix it.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
fsck: simplify fsck_commit_buffer() by using commit_list_count()
commit: use commit_list_append() instead of duplicating its code
merge: simplify merge_trivial() by using commit_list_append()
use strbuf_addch for adding single characters
use strbuf_addbuf for adding strbufs
This patch activates the DO_MATCH_DIRECTORY code in m_p_i(), which
makes "git diff HEAD submodule/" and "git diff HEAD submodule" produce
the same output. Previously only the version without trailing slash
returns the difference (if any).
That's the effect of new ce_path_match(). dir_path_match() is not
executed by the new tests. And it should not introduce regressions.
Previously if path "dir/" is passed in with pathspec "dir/", they
obviously match. With new dir_path_match(), the path becomes
_directory_ "dir" vs pathspec "dir/", which is not executed by the old
code path in m_p_i(). The new code path is executed and produces the
same result.
The other case is pathspec "dir" and path "dir/" is now turned to
"dir" (with DO_MATCH_DIRECTORY). Still the same result before or after
the patch.
So why change? Because of the next patch about clean.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A long time ago, for some reason I was not happy with
match_pathspec(). I created a better version, match_pathspec_depth()
that was suppose to replace match_pathspec()
eventually. match_pathspec() has finally been gone since 6 months
ago. Use the shorter name for match_pathspec_depth().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is
- diff-lib.c: setting CE_UPTODATE
- name-hash.c: setting CE_HASHED
- preload-index.c, read-cache.c, unpack-trees.c and
builtin/update-index: obvious
- entry.c: write_entry() may refresh the checked out entry via
fill_stat_cache_info(). This causes "non-const struct cache_entry
*" in builtin/apply.c, builtin/checkout-index.c and
builtin/checkout.c
- builtin/ls-files.c: --with-tree changes stagemask and may set
CE_UPDATE
Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.
So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.
The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:
diff --git a/cache.h b/cache.h
index 430d021..1692891 100644
--- a/cache.h
+++ b/cache.h
@@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
struct index_state {
- struct cache_entry **cache;
+ const struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
will help quickly identify them without bogus warnings.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The loop that fills in the buffers that are later passed to the merge
driver exits early when not all stages of a path are present in the index.
But since the buffer pointers are not initialized in advance, the
subsequent accesses are undefined.
Initialize buffer pointers in advance to avoid undefined behavior later.
That is not sufficient, though, to get correct operation of handle_cache().
The function replays a conflicted merge to extract the part inside the
conflict markers. As written, the loop exits early when a stage is missing.
Consequently, the buffers for later stages that would be present in the
index are not filled in and the merge is replayed with incomplete data.
Fix it by investigating all stages of the given path.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using 'git rerere forget .' after a merge that involved binary files
runs into an infinite loop if the binary file contains a zero byte.
Replace a strchrnul by memchr because the former does not make progress
as soon as the NUL is encountered.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A handful of files and directories we create had tighter than
necessary permission bits when the user wanted to have group
writability (e.g. by setting "umask 002").
* ar/clone-honor-umask-at-top:
add: create ADD_EDIT.patch with mode 0666
rerere: make rr-cache fanout directory honor umask
Restore umasks influence on the permissions of work tree created by clone
This is the last remaining call to mkdir(2) that restricts the permission
bits by passing 0755. Just use the same mkdir_in_gitdir() used to create
the leaf directories.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
git-submodule.sh: separate parens by a space to avoid confusing some shells
Documentation/technical/api-diff.txt: correct name of diff_unmerge()
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
* jm/maint-misc-fix:
read_gitfile_gently: use ssize_t to hold read result
remove tests of always-false condition
rerere.c: diagnose a corrupt MERGE_RR when hitting EOF between TAB and '\0'
If we reach EOF after the SHA1-then-TAB, yet before the NUL that
terminates each file name, we would fill the file name buffer with \255
bytes resulting from the repeatedly-failing fgetc (returns EOF/-1) and
ultimately complain about "filename too long", because no NUL was
encountered.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>