spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.
* jk/cocci:
show_dirstat: simplify same-content check
read-cache: use oideq() in ce_compare functions
convert hashmap comparison functions to oideq()
convert "hashcmp() != 0" to "!hasheq()"
convert "oidcmp() != 0" to "!oideq()"
convert "hashcmp() == 0" to hasheq()
convert "oidcmp() == 0" to oideq()
introduce hasheq() and oideq()
coccinelle: use <...> for function exclusion
The unpack_trees() API used in checking out a branch and merging
walks one or more trees along with the index. When the cache-tree
in the index tells us that we are walking a tree whose flattened
contents is known (i.e. matches a span in the index), as linearly
scanning a span in the index is much more efficient than having to
open tree objects recursively and listing their entries, the walk
can be optimized, which is done in this topic.
* nd/unpack-trees-with-cache-tree:
Document update for nd/unpack-trees-with-cache-tree
cache-tree: verify valid cache-tree in the test suite
unpack-trees: add missing cache invalidation
unpack-trees: reuse (still valid) cache-tree from src_index
unpack-trees: reduce malloc in cache-tree walk
unpack-trees: optimize walking same trees with cache-tree
unpack-trees: add performance tracing
trace.h: support nested performance tracing
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:
if (oidcmp(E1, E2))
As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.
There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.
* nd/no-the-index: (24 commits)
blame.c: remove implicit dependency on the_index
apply.c: remove implicit dependency on the_index
apply.c: make init_apply_state() take a struct repository
apply.c: pass struct apply_state to more functions
resolve-undo.c: use the right index instead of the_index
archive-*.c: use the right repository
archive.c: avoid access to the_index
grep: use the right index instead of the_index
attr: remove index from git_attr_set_direction()
entry.c: use the right index instead of the_index
submodule.c: use the right index instead of the_index
pathspec.c: use the right index instead of the_index
unpack-trees: avoid the_index in verify_absent()
unpack-trees: convert clear_ce_flags* to avoid the_index
unpack-trees: don't shadow global var the_index
unpack-trees: add a note about path invalidation
unpack-trees: remove 'extern' on function declaration
ls-files: correct index argument to get_convert_attr_ascii()
preload-index.c: use the right index instead of the_index
dir.c: remove an implicit dependency on the_index in pathspec code
...
Performance measurements are listed right now as a flat list, which is
fine when we measure big blocks. But when we start adding more and
more measurements, some of them could be just part of a bigger
measurement and a flat list gives a wrong impression that they are
executed at the same level instead of nested.
Add trace_performance_enter() and trace_performance_leave() to allow
indent these nested measurements. For now it does not help much
because the only nested thing is (lazy) name hash initialization
(e.g. called in diff-index from "git status"). This will help more
because I'm going to add some more tracing that's actually nested.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many more strings are prepared for l10n.
* nd/i18n: (23 commits)
transport-helper.c: mark more strings for translation
transport.c: mark more strings for translation
sha1-file.c: mark more strings for translation
sequencer.c: mark more strings for translation
replace-object.c: mark more strings for translation
refspec.c: mark more strings for translation
refs.c: mark more strings for translation
pkt-line.c: mark more strings for translation
object.c: mark more strings for translation
exec-cmd.c: mark more strings for translation
environment.c: mark more strings for translation
dir.c: mark more strings for translation
convert.c: mark more strings for translation
connect.c: mark more strings for translation
config.c: mark more strings for translation
commit-graph.c: mark more strings for translation
builtin/replace.c: mark more strings for translation
builtin/pack-objects.c: mark more strings for translation
builtin/grep.c: mark strings for translation
builtin/config.c: mark more strings for translation
...
Make the match_patchspec API and friends take an index_state instead
of assuming the_index in dir.c. All external call sites are converted
blindly to keep the patch simple and retain current behavior.
Individual call sites may receive further updates to use the right
index instead of the_index.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.
There is one ugly temporary workaround added in attr.c that needs some
more explanation.
Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.
The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are
- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
(on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
if not redundant
- errno_errno() is used instead of explicit strerror()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-grafts:
commit: allow lookup_commit_graft to handle arbitrary repositories
commit: allow prepare_commit_graft to handle arbitrary repositories
shallow: migrate shallow information into the object parser
path.c: migrate global git_path_* to take a repository argument
cache: convert get_graft_file to handle arbitrary repositories
commit: convert read_graft_file to handle arbitrary repositories
commit: convert register_commit_graft to handle arbitrary repositories
commit: convert commit_graft_pos() to handle arbitrary repositories
shallow: add repository argument to is_repository_shallow
shallow: add repository argument to check_shallow_file_for_update
shallow: add repository argument to register_shallow
shallow: add repository argument to set_alternate_shallow_file
commit: add repository argument to lookup_commit_graft
commit: add repository argument to prepare_commit_graft
commit: add repository argument to read_graft_file
commit: add repository argument to register_commit_graft
commit: add repository argument to commit_graft_pos
object: move grafts to object parser
object-store: move object access functions to object-store.h
Make it easier to find references to core.excludesfile and the default
$XDG_CONFIG_HOME/git/ignore path.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code to read compressed bitmap was not careful to avoid reading
past the end of the file, which has been corrected.
* jk/ewah-bounds-check:
ewah: adjust callers of ewah_read_mmap()
ewah_read_mmap: bounds-check mmap reads
The return value of ewah_read_mmap() is now an ssize_t,
since we could (in theory) process up to 32GB of data. This
would never happen in practice, but a corrupt or malicious
.bitmap or index file could convince us to do so.
Let's make sure that we don't stuff the value into an int,
which would cause us to incorrectly move our pointer
forward. We'd always move too little, since negative values
are used for reporting errors. So the worst case is just
that we end up reporting a corrupt file, not an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (42 commits)
merge-one-file: compute empty blob object ID
add--interactive: compute the empty tree value
Update shell scripts to compute empty tree object ID
sha1_file: only expose empty object constants through git_hash_algo
dir: use the_hash_algo for empty blob object ID
sequencer: use the_hash_algo for empty tree object ID
cache-tree: use is_empty_tree_oid
sha1_file: convert cached object code to struct object_id
builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
submodule: convert several uses of EMPTY_TREE_SHA1_HEX
sequencer: convert one use of EMPTY_TREE_SHA1_HEX
merge: convert empty tree constant to the_hash_algo
builtin/merge: switch tree functions to use object_id
builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
sha1-file: add functions for hex empty tree and blob OIDs
builtin/receive-pack: avoid hard-coded constants for push certs
diff: specify abbreviation size in terms of the_hash_algo
upload-pack: replace use of several hard-coded constants
...
* maint-2.16:
Git 2.16.4
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Moving a submodule that itself has submodule in it with "git mv"
forgot to make necessary adjustment to the nested sub-submodules;
now the codepath learned to recurse into the submodules.
* sb/submodule-move-nested:
submodule: fixup nested submodules after moving the submodule
submodule-config: remove submodule_from_cache
submodule-config: add repository argument to submodule_from_{name, path}
submodule-config: allow submodule_free to handle arbitrary repositories
grep: remove "repo" arg from non-supporting funcs
submodule.h: drop declaration of connect_work_tree_and_git_dir
To ensure that we are hash algorithm agnostic, use the_hash_algo to look
up the object ID for the empty blob instead of using the empty_tree_oid
variable.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the exclude_sha1 member of struct untracked_cache_dir and rename
it to exclude_oid. Eliminate several hard-coded integral constants, and
update a function name that referred to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
connect_work_tree_and_git_dir is used to connect a submodule worktree with
its git directory and vice versa after events that require a reconnection
such as moving around the working tree. As submodules can have nested
submodules themselves, we'd also want to fix the nested submodules when
asked to. Add an option to recurse into the nested submodules and connect
them as well.
As submodules are identified by their name (which determines their git
directory in relation to their superproject's git directory) internally
and by their path in the working tree of the superproject, we need to
make sure that the mapping of name <-> path is kept intact. We can do
that in the git-mv command by writing out the gitmodules file first
and then forcing a reload of the submodule config machinery.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Writing out the index file when the only thing that changed in it
is the untracked cache information is often wasteful, and this has
been optimized out.
* bp/untracked-cache-noflush:
untracked cache: use git_env_bool() not getenv() for customization
dir.c: don't flag the index as dirty for changes to the untracked cache
GIT_DISABLE_UNTRACKED_CACHE and GIT_TEST_UNTRACKED_CACHE are only
sensed for their presense by using getenv(); use git_env_bool()
instead so that GIT_DISABLE_UNTRACKED_CACHE=false would work as
naïvely expected.
Also rename GIT_TEST_UNTRACKED_CACHE to GIT_FORCE_UNTRACKED_CACHE
to express what it does more honestly. Forcing its use may be one
useful thing to do while testing the feature, but testing does not
have to be the only use of the knob.
While at it, avoid repeated calls to git_env_bool() by capturing the
return value from the first call in a static variable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some bugs around "untracked cache" feature have been fixed.
* nd/fix-untracked-cache-invalidation:
dir.c: ignore paths containing .git when invalidating untracked cache
dir.c: stop ignoring opendir() error in open_cached_dir()
dir.c: fix missing dir invalidation in untracked code
dir.c: avoid stat() in valid_cached_dir()
status: add a failing test showing a core.untrackedCache bug
read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.
This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".
Without this patch, we invalidate the root directory unncessarily,
which:
- makes read_directory() fall back to slow path for root directory
(slower)
- makes the index dirty (because UNTR extension is updated). Depending
on the index size, writing it down could also be slow.
A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain, we may be screwed up at many more levels, not just this one.
Noticed-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The untracked cache saves its current state in the UNTR index extension.
Currently, _any_ change to that state causes the index to be flagged as dirty
and written out to disk. Unfortunately, the cost to write out the index can
exceed the savings gained by using the untracked cache. Since it is a cache
that can be updated from the current state of the working directory, there is
no functional requirement that the index be written out for every change to the
untracked cache.
Update the untracked cache logic so that it no longer forces the index to be
written to disk except in the case where the extension is being turned on or
off. When some other git command requires the index to be written to disk, the
untracked cache will take advantage of that to save it's updated state as well.
This results in a performance win when looked at over common sequences of git
commands (ie such as a status followed by add, commit, etc).
After this patch, all the logic to track statistics for the untracked cache
could be removed as it is only used by debug tracing used to debug the untracked
cache.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A follow-up to the recently fixed bugs in the untracked
invalidation. If opendir() fails it should show a warning, perhaps
this should die, but if this ever happens the error is probably
recoverable for the user, and dying would just make things worse.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.
Rename this function to hash_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration of struct sha1_stat. Adjust all usages of this
struct and replace hash{clr,cmp,cpy} with oid{clr,cmp,cpy} wherever
possible. Rename it to struct oid_stat.
Rename static function load_sha1_stat to load_oid_stat.
Remove macro EMPTY_BLOB_SHA1_BIN, as it's no longer used.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's start with how create a new directory cache after the last one
becomes invalid (e.g. because its dir mtime has changed...). In
open_cached_dir():
1. We start out with valid_cached_dir() returning false, which should
call invalidate_directory() to put a directory state back to
initial state, no untracked entries (untracked_nr zero), no sub
directory traversal (dirs[].recurse zero).
2. Since the cache cannot be used, we go the slow path opendir() and
go through items one by one via readdir(). All the directories on
disk will be added back to the cache (if not already exist in
dirs[]) and its flag "recurse" gets changed to one to note that
it's part of the cached dir travesal next time.
3. By the time we reach close_cached_dir() we should have a good
subdir list in dirs[]. Those with "recurse" flag set are the ones
present in the on-disk directory. The directory is now marked
"valid".
Next time read_directory() is called, since the directory is marked
valid, it will skip readdir(), go fast path and traverse through
dirs[] array instead.
Steps one and two need some tight cooperation. If a subdir is removed,
readdir() will not find it and of course we cannot examine/invalidate
it. To make sure removed directories on disk are gone from the cache,
step one must make sure recurse flag of all subdirs are zero.
But that's not true. If "valid" flag is already false, there is a
chance we go straight to the end of valid_cached_dir() without calling
invalidate_directory(). Or we fail to meet the "if (untracked-valid)"
condition and skip over the invalidate_directory().
After step 3, we mark the cache valid. Any stale subdir with incorrect
recurse flag becomes a real subdir next time we traverse the directory
using dirs[] array.
We could avoid this by making sure invalidate_directory() is always
called (therefore dirs[].recurse cleared) at the beginning of
open_cached_dir(). Which is what this patch does.
As to how we get into this situation, the key in the test is this
command
git checkout master
where "one/file" is replaced with "one" in the index. This index
update triggers untracked_cache_invalidate_path(), which clears valid
flag of the root directory while keeping "recurse" flag on the subdir
"one" on. On the next git-status, we go through steps 1-3 above and
save an incorrect cache on disk. The second git-status blindly follows
the bad cache data and shows the problem.
This is arguably because of a bad design where "recurse" flag plays
double roles: whether a directory should be saved on disk, and whether
it is part of a directory traversal.
We need to keep recurse flag set at "checkout master" because of the
first role: we need to keep subdir caches (dir "two" for example has
not been touched at all, no reason to throw its cache away).
As long as we make sure to ignore/reset "recurse" flag at the
beginning of a directory traversal, we're good. But maybe eventually
we should separate these two roles.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
stat() may follow a symlink and return stat data of the link's target
instead of the link itself. We are concerned about the link itself.
It's kind of hard to demonstrate the bug. I think when path->buf is a
symlink, we most likely find that its target's stat data does not
match our cached one, which means we ignore the cache and fall back to
slow path.
This is performance issue, not correctness (though we could still
catch it by verifying test-dump-untracked-cache. The less unlikely
case is, link target stat data matches the cached version and we
incorrectly go fast path, ignoring real data on disk. A test for this
may involve manipulating stat data, which may be not portable.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and
safer, as it automatically infers the size of elements.
Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.
* jh/object-filtering:
rev-list: support --no-filter argument
list-objects-filter-options: support --no-filter
list-objects-filter-options: fix 'keword' typo in comment
pack-objects: add list-objects filtering
rev-list: add list-objects filtering support
list-objects: filter objects in traverse_commit_list
oidset: add iterator methods to oidset
oidmap: add oidmap iterator methods
dir: allow exclusions from blob in addition to file
Refactor add_excludes() to separate the reading of the
exclude file into a buffer and the parsing of the buffer
into exclude_list items.
Add add_excludes_from_blob_to_list() to allow an exclude
file be specified with an OID without assuming a local
worktree or index exists.
Refactor read_skip_worktree_file_from_index() and add
do_read_blob() to eliminate duplication of preliminary
processing of blob contents.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
The set of paths output from "git status --ignored" was tied
closely with its "--untracked=<mode>" option, but now it can be
controlled more flexibly. Most notably, a directory that is
ignored because it is listed to be ignored in the ignore/exclude
mechanism can be handled differently from a directory that ends up
to be ignored only because all files in it are ignored.
* jm/status-ignored-files-list:
status: test ignored modes
status: document options to show matching ignored files
status: report matching ignored and normal untracked
status: add option to show ignored files differently
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (25 commits)
refs/files-backend: convert static functions to object_id
refs: convert read_raw_ref backends to struct object_id
refs: convert peel_object to struct object_id
refs: convert resolve_ref_unsafe to struct object_id
worktree: convert struct worktree to object_id
refs: convert resolve_gitlink_ref to struct object_id
Convert remaining callers of resolve_gitlink_ref to object_id
sha1_file: convert index_path and index_fd to struct object_id
refs: convert reflog_expire parameter to struct object_id
refs: convert read_ref_at to struct object_id
refs: convert peel_ref to struct object_id
builtin/pack-objects: convert to struct object_id
pack-bitmap: convert traverse_bitmap_commit_list to object_id
refs: convert dwim_log to struct object_id
builtin/reflog: convert remaining unsigned char uses to object_id
refs: convert dwim_ref and expand_ref to struct object_id
refs: convert read_ref and read_ref_full to object_id
refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
Convert check_connected to use struct object_id
refs: update ref transactions to use struct object_id
...
"git status --ignored -u" did not stop at a working tree of a
separate project that is embedded in an ignored directory and
listed files in that other project, instead of just showing the
directory itself as ignored.
* js/submodule-in-excluded:
status: do not get confused by submodules in excluded directories
Teach status command to handle `--ignored=matching` with
`--untracked-files=normal`
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the status command more flexibility in how ignored files are
reported. Currently, the reporting of ignored files and untracked
files are linked. You cannot control how ignored files are reported
independently of how untracked files are reported (i.e. `all` vs
`normal`). This makes it impossible to show untracked files with the
`all` option, but show ignored files with the `normal` option.
This work 1) adds the ability to control the reporting of ignored
files independently of untracked files and 2) introduces the concept
of status reporting ignored paths that explicitly match an ignored
pattern. There are 2 benefits to these changes: 1) if a consumer needs
all untracked files but not all ignored files, there is a performance
benefit to not scanning all contents of an ignored directory and 2)
returning ignored files that explicitly match a path allow a consumer
to make more informed decisions about when a status result might be
stale.
This commit implements --ignored=matching with --untracked-files=all.
The following commit will implement --ignored=matching with
--untracked=files=normal.
As an example of where this flexibility could be useful is that our
application (Visual Studio) runs the status command and presents the
output. It shows all untracked files individually (e.g. using the
'--untracked-files==all' option), and would like to know about which
paths are ignored. It uses information about ignored paths to make
decisions about when the status result might have changed.
Additionally, many projects place build output into directories inside
a repository's working directory (e.g. in "bin/" and "obj/"
directories). Normal usage is to explicitly ignore these 2 directory
names in the .gitignore file (rather than or in addition to the *.obj
pattern).If an application could know that these directories are
explicitly ignored, it could infer that all contents are ignored as
well and make better informed decisions about files in these
directories. It could infer that any changes under these paths would
not affect the output of status. Additionally, there can be a
significant performance benefit by avoiding scanning through ignored
directories.
When status is set to report matching ignored files, it has the
following behavior. Ignored files and directories that explicitly
match an exclude pattern are reported. If an ignored directory matches
an exclude pattern, then the path of the directory is returned. If a
directory does not match an exclude pattern, but all of its contents
are ignored, then the contained files are reported instead of the
directory.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We meticulously pass the `exclude` flag to the `treat_directory()`
function so that we can indicate that files in it are excluded rather
than untracked when recursing.
But we did not yet treat submodules the same way.
Because of that, `git status --ignored --untracked` with a submodule
`submodule` in a gitignored `tracked/` would show the submodule in the
"Untracked files" section, e.g.
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
tracked/submodule/
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
tracked/submodule/initial.t
Instead, we would want it to show the submodule in the "Ignored files"
section:
On branch master
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
tracked/submodule/
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the performance of the directory listing logic when it wants to list
non-empty ignored directories. In order to show non-empty ignored directories,
the existing logic will recursively iterate through all contents of an ignored
directory. This change introduces the optimization to stop iterating through
the contents once it finds the first file. This can have a significant
improvement in 'git status --ignored' performance in repositories with a large
number of files in ignored directories.
For an example of the performance difference on an example repository with
196,000 files in 400 ignored directories:
| Command | Time (s) |
| -------------------------- | --------- |
| git status | 1.2 |
| git status --ignored (old) | 3.9 |
| git status --ignored (new) | 1.4 |
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The untracked cache is stored on disk by concatenating its memory
structures without any padding. Consequently some of the structs are
not aligned at a particular boundary when the whole extension is read
back in one go. That's only OK on platforms without strict alignment
requirements, or for byte-aligned data like strings or hash values.
Decode struct ondisk_untracked_cache carefully from the extension
blob by using explicit pointer arithmetic with offsets, avoiding
alignment issues. Use char pointers for passing stat_data objects to
stat_data_from_disk(), and use memcpy(3) in that function to get the
contents into a properly aligned struct, then perform the byte-order
adjustment in place there.
Found with Clang's UBSan.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Code clean-up.
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
Remove the unused wildopts placeholder struct from being passed to all
wildmatch() invocations, or rather remove all the boilerplate NULL
parameters.
This parameter was added back in commit 9b3497cab9 ("wildmatch: rename
constants and update prototype", 2013-01-01) as a placeholder for
future use. Over 4 years later nothing has made use of it, let's just
remove it. It can be added in the future if we find some reason to
start using such a parameter.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We often try to open a file for reading whose existence is
optional, and silently ignore errors from open/fopen; report such
errors if they are not due to missing files.
* nd/fopen-errors:
mingw_fopen: report ENOENT for invalid file names
mingw: verify that paths are not mistaken for remote nicknames
log: fix memory leak in open_next_file()
rerere.c: move error_errno() closer to the source system call
print errno when reporting a system call error
wrapper.c: make warn_on_inaccessible() static
wrapper.c: add and use fopen_or_warn()
wrapper.c: add and use warn_on_fopen_errors()
config.mak.uname: set FREAD_READS_DIRECTORIES for Darwin, too
config.mak.uname: set FREAD_READS_DIRECTORIES for Linux and FreeBSD
clone: use xfopen() instead of fopen()
use xfopen() in more places
git_fopen: fix a sparse 'not declared' warning
Our code often opens a path to an optional file, to work on its
contents when we can successfully open it. We can ignore a failure
to open if such an optional file does not exist, but we do want to
report a failure in opening for other reasons (e.g. we got an I/O
error, or the file is there, but we lack the permission to open).
The exact errors we need to ignore are ENOENT (obviously) and
ENOTDIR (less obvious). Instead of repeating comparison of errno
with these two constants, introduce a helper function to do so.
* jc/noent-notdir:
treewide: use is_missing_file_error() where ENOENT and ENOTDIR are checked
compat-util: is_missing_file_error()
"git clean -d" used to clean directories that has ignored files,
even though the command should not lose ignored ones without "-x".
"git status --ignored" did not list ignored and untracked files
without "-uall". These have been corrected.
* sl/clean-d-ignored-fix:
clean: teach clean -d to preserve ignored paths
dir: expose cmp_name() and check_contains()
dir: hide untracked contents of untracked dirs
dir: recurse into untracked dirs for ignored files
t7061: status --ignored should search untracked dirs
t7300: clean -d should skip dirs with ignored files
Similar functions exist in apply.c and builtin/show-branch.c for
counting the number of slashes in a string. Also in the later
patches, we introduce a third caller for the same. Hence, we unify
it now by cleaning the existing functions and declaring a common
function count_slashes in dir.h and implementing it in dir.c to
remove this code duplication.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Prathamesh Chavan <pc44800@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clean -d" used to clean directories that has ignored files,
even though the command should not lose ignored ones without "-x".
"git status --ignored" did not list ignored and untracked files
without "-uall". These have been corrected.
* sl/clean-d-ignored-fix:
clean: teach clean -d to preserve ignored paths
dir: expose cmp_name() and check_contains()
dir: hide untracked contents of untracked dirs
dir: recurse into untracked dirs for ignored files
t7061: status --ignored should search untracked dirs
t7300: clean -d should skip dirs with ignored files
Using the is_missing_file_error() helper introduced in the previous
step, update all hits from
$ git grep -e ENOENT --and -e ENOTDIR
There are codepaths that only check ENOENT, and it is possible that
some of them should be checking both. Updating them is kept out of
this step deliberately, as we do not want to change behaviour in this
step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In many places, Git warns about an inaccessible file after a fopen()
failed. To discern these cases from other cases where we want to warn
about inaccessible files, introduce a new helper specifically to test
whether fopen() failed because the current user lacks the permission to
open file in question.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to use cmp_name() and check_contains() (which both compare
`struct dir_entry`s, the former in terms of the sort order, the latter
in terms of whether one lexically contains another) outside of dir.c,
so we have to (1) change their linkage and (2) rename them as
appropriate for the global namespace. The second is achieved by
renaming cmp_name() to cmp_dir_entry() and check_contains() to
check_dir_entry_contains().
Signed-off-by: Samuel Lijin <sxlijin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we taught read_directory_recursive() to recurse into untracked
directories in search of ignored files given DIR_SHOW_IGNORED_TOO, that
had the side effect of teaching it to collect the untracked contents of
untracked directories. It doesn't always make sense to return these,
though (we do need them for `clean -d`), so we introduce a flag
(DIR_KEEP_UNTRACKED_CONTENTS) to control whether or not read_directory()
strips dir->entries of the untracked contents of untracked dirs.
We also introduce check_contains() to check if one dir_entry corresponds
to a path which contains the path corresponding to another dir_entry.
This also fixes known breakages in t7061, since status --ignored now
searches untracked directories for ignored files.
Signed-off-by: Samuel Lijin <sxlijin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We consider directories containing only untracked and ignored files to
be themselves untracked, which in the usual case means we don't have to
search these directories. This is problematic when we want to collect
ignored files with DIR_SHOW_IGNORED_TOO, though, so we teach
read_directory_recursive() to recurse into untracked directories to find
the ignored files they contain when DIR_SHOW_IGNORED_TOO is set. This
has the side effect of also collecting all untracked files in untracked
directories as well.
Signed-off-by: Samuel Lijin <sxlijin@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to make it clearer where the_index is being referenced, stop
using the index compatibility macros in dir.c. This is to make it
easier to identify the functions which need to be convert to taking in a
'struct index_state' as a parameter.
The end goal would be to eliminate the need to reference global index
state in dir.c.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout" is taught the "--recurse-submodules" option.
* sb/checkout-recurse-submodules:
builtin/read-tree: add --recurse-submodules switch
builtin/checkout: add --recurse-submodules switch
entry.c: create submodules when interesting
unpack-trees: check if we can perform the operation for submodules
unpack-trees: pass old oid to verify_clean_submodule
update submodules: add submodule_move_head
submodule.c: get_super_prefix_or_empty
update submodules: move up prepare_submodule_repo_env
submodules: introduce check to see whether to touch a submodule
update submodules: add a config option to determine if submodules are updated
update submodules: add submodule config parsing
make is_submodule_populated gently
lib-submodule-update.sh: define tests for recursing into submodules
lib-submodule-update.sh: replace sha1 by hash
lib-submodule-update: teach test_submodule_content the -C <dir> flag
lib-submodule-update.sh: do not use ./. as submodule remote
lib-submodule-update.sh: reorder create_lib_submodule_repo
submodule--helper.c: remove duplicate code
connect_work_tree_and_git_dir: safely create leading directories
The pathspec mechanism learned to further limit the paths that
match the pattern to those that have specified attributes attached
via the gitattributes mechanism.
* bw/attr-pathspec:
pathspec: allow escaped query values
pathspec: allow querying for attributes
In a later patch we'll use connect_work_tree_and_git_dir when the
directory for the gitlink file doesn't exist yet. This patch makes
connect_work_tree_and_git_dir safe to use for both cases of
either the git dir or the working dir missing.
To do so, we need to call safe_create_leading_directories[_const]
on both directories. However this has to happen before we construct
the absolute paths as real_pathdup assumes the directories to
be there already.
So for both the config file in the git dir as well as the .git link
file we need to
a) construct the name
b) call SCLD
c) get the absolute path
d) once a-c is done for both we can consume the absolute path
to compute the relative path to each other and store those
relative paths.
The implementation provided here puts a) and b) for both cases first,
and then performs c and d after.
One of the two users of 'connect_work_tree_and_git_dir' already checked
for the directory being there, so we can loose that check as
connect_work_tree_and_git_dir handles this functionality now.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.
Based on a patch by Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 4ac9006f83 (real_path: have callers use real_pathdup and
strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path())
pattern to use real_pathdup() directly.
The problem with this change is that real_path() calls
strbuf_realpath() with die_on_error = 1 while real_pathdup() calls
it with die_on_error = 0. Meaning that in cases where real_path()
causes Git to die() with an error message, real_pathdup() is silent
and returns NULL instead.
The callers, however, are ill-prepared for that change, as they expect
the return value to be non-NULL (and otherwise the function died
with an appropriate error message).
Fix this by extending real_pathdup()'s signature to accept the
die_on_error flag and simply pass it through to strbuf_realpath(),
and then adjust all callers after a careful audit whether they would
handle NULLs well.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass the match member of the first pathspec item directly to
read_directory() instead of using common_prefix() to duplicate it first,
thus avoiding memory duplication, strlen(3) and free(3).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The implementation of "real_path()" was to go there with chdir(2)
and call getcwd(3), but this obviously wouldn't be usable in a
threaded environment. Rewrite it to manually resolve relative
paths including symbolic links in path components.
* bw/realpath-wo-chdir:
real_path: set errno when max number of symlinks is exceeded
real_path: prevent redefinition of MAXSYMLINKS
Code clean-up in the pathspec API.
* bw/pathspec-cleanup:
pathspec: rename prefix_pathspec to init_pathspec_item
pathspec: small readability changes
pathspec: create strip submodule slash helpers
pathspec: create parse_element_magic helper
pathspec: create parse_long_magic function
pathspec: create parse_short_magic function
pathspec: factor global magic into its own function
pathspec: simpler logic to prefix original pathspec elements
pathspec: always show mnemonic and name in unsupported_magic
pathspec: remove unused variable from unsupported_magic
pathspec: copy and free owned memory
pathspec: remove the deprecated get_pathspec function
ls-tree: convert show_recursive to use the pathspec struct interface
dir: convert fill_directory to use the pathspec struct interface
dir: remove struct path_simplify
mv: remove use of deprecated 'get_pathspec()'
Convert 'fill_directory()' to use the pathspec struct interface from
using the '_raw' entry in the pathspec struct.
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach simplify_away() and exclude_matches_pathspec() to handle struct
pathspec directly, eliminating the need for the struct path_simplify.
Also renamed the len parameter to pathlen in exclude_matches_pathspec()
to match the parameter names used in simplify_away().
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a submodule has its git dir inside the working dir, the submodule
support for checkout that we plan to add in a later patch will fail.
Add functionality to migrate the git directory to be absorbed
into the superprojects git directory.
The newly added code in this patch is structured such that other areas of
Git can also make use of it. The code in the submodule--helper is a mere
wrapper and option parser for the function
`absorb_git_dir_into_superproject`, that takes care of embedding the
submodules git directory into the superprojects git dir. That function
makes use of the more abstract function for this use case
`relocate_gitdir`, which can be used by e.g. the worktree code eventually
to move around a git directory.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That function was primarily used by submodule code, but the function
itself is not inherently about submodules. In the next patch we'll
introduce relocate_git_dir, which can be used by worktrees as well,
so find a neutral middle ground in dir.h.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The low-level attribute and gitignore code will try to look
in $GIT_DIR/info for any repo-level configuration files,
even if we have not actually determined that we are in a
repository (e.g., running "git grep --no-index"). In such a
case they end up looking for ".git/info/attributes", etc.
This is generally harmless, as such a file is unlikely to
exist outside of a repository, but it's still conceptually
the wrong thing to do.
Let's detect this situation explicitly and skip reading the
file (i.e., the same behavior we'd get if we were in a
repository and the file did not exist).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>