* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
When creating a directory on Windows whose path ends in a space or a
period (or chains thereof), the Win32 API "helpfully" trims those. For
example, `mkdir("abc ");` will return success, but actually create a
directory called `abc` instead.
This stems back to the DOS days, when all file names had exactly 8
characters plus exactly 3 characters for the file extension, and the
only way to have shorter names was by padding with spaces.
Sadly, this "helpful" behavior is a bit inconsistent: after a successful
`mkdir("abc ");`, a `mkdir("abc /def")` will actually _fail_ (because
the directory `abc ` does not actually exist).
Even if it would work, we now have a serious problem because a Git
repository could contain directories `abc` and `abc `, and on Windows,
they would be "merged" unintentionally.
As these paths are illegal on Windows, anyway, let's disallow any
accesses to such paths on that Operating System.
For practical reasons, this behavior is still guarded by the
config setting `core.protectNTFS`: it is possible (and at least two
regression tests make use of it) to create commits without involving the
worktree. In such a scenario, it is of course possible -- even on
Windows -- to create such file names.
Among other consequences, this patch disallows submodules' paths to end
in spaces on Windows (which would formerly have confused Git enough to
try to write into incorrect paths, anyway).
While this patch does not fix a vulnerability on its own, it prevents an
attack vector that was exploited in demonstrations of a number of
recently-fixed security bugs.
The regression test added to `t/t7417-submodule-path-url.sh` reflects
that attack vector.
Note that we have to adjust the test case "prevent git~1 squatting on
Windows" in `t/t7415-submodule-names.sh` because of a very subtle issue.
It tries to clone two submodules whose names differ only in a trailing
period character, and as a consequence their git directories differ in
the same way. Previously, when Git tried to clone the second submodule,
it thought that the git directory already existed (because on Windows,
when you create a directory with the name `b.` it actually creates `b`),
but with this patch, the first submodule's clone will fail because of
the illegal name of the git directory. Therefore, when cloning the
second submodule, Git will take a different code path: a fresh clone
(without an existing git directory). Both code paths fail to clone the
second submodule, both because the the corresponding worktree directory
exists and is not empty, but the error messages are worded differently.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.
As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.
However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.
This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.
Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.
Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.
Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.
The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.
This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.
Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
* maint-2.16:
Git 2.16.4
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.15:
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.14:
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.13:
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Signed-off-by: Jeff King <peff@peff.net>
We're more restrictive than we need to be in matching ".GIT"
on case-sensitive filesystems; let's make a note that this
is intentional.
Signed-off-by: Jeff King <peff@peff.net>
We check ".git" and ".." in the same switch statement, and
fall through the cases to share the end-of-component check.
While this saves us a line or two, it makes modifying the
function much harder. Let's just write it out.
Signed-off-by: Jeff King <peff@peff.net>
Code clean-up.
* nd/shared-index-fix:
read-cache: don't write index twice if we can't write shared index
read-cache.c: move tempfile creation/cleanup out of write_shared_index
read-cache.c: change type of "temp" in write_shared_index()
The split-index mode had a few corner case bugs fixed.
* tg/split-index-fixes:
travis: run tests with GIT_TEST_SPLIT_INDEX
split-index: don't write cache tree with null oid entries
read-cache: fix reading the shared index for other repos
The codepath to replace an existing entry in the index had a bug in
updating the name hash structure, which has been fixed.
* bp/refresh-cache-ent-rehash-fix:
Fix bugs preventing adding updated cache entries to the name hash
Internal API clean-up to allow write_locked_index() optionally skip
writing the in-core index when it is not modified.
* ma/skip-writing-unchanged-index:
write_locked_index(): add flag to avoid writing unchanged index
The function ce_write_entry() uses a 'self-initialised' variable
construct, for the symbol 'saved_namelen', to suppress a gcc
'-Wmaybe-uninitialized' warning, given that the warning is a false
positive.
For the purposes of this discussion, the ce_write_entry() function has
three code blocks of interest, that look like so:
/* block #1 */
if (ce->ce_flags & CE_STRIP_NAME) {
saved_namelen = ce_namelen(ce);
ce->ce_namelen = 0;
}
/* block #2 */
/*
* several code blocks that contain, among others, calls
* to copy_cache_entry_to_ondisk(ondisk, ce);
*/
/* block #3 */
if (ce->ce_flags & CE_STRIP_NAME) {
ce->ce_namelen = saved_namelen;
ce->ce_flags &= ~CE_STRIP_NAME;
}
The warning implies that gcc thinks it is possible that the first
block is not entered, the calls to copy_cache_entry_to_ondisk()
could toggle the CE_STRIP_NAME flag on, thereby entering block #3
with saved_namelen unset. However, the copy_cache_entry_to_ondisk()
function does not write to ce->ce_flags (it only reads). gcc could
easily determine this, since that function is local to this file,
but it obviously doesn't.
In order to suppress this warning, we make it clear to the reader
(human and compiler), that block #3 will only be entered when the
first block has been entered, by introducing a new 'stripped_name'
boolean variable. We also take the opportunity to change the type
of 'saved_namelen' to 'unsigned int' to match ce->ce_namelen.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update replace_index_entry() to clear the CE_HASHED flag from the new cache
entry so that it can add it to the name hash in set_index_entry()
Fix refresh_cache_ent() to use the copy_cache_entry() macro instead of memcpy()
so that it doesn't incorrectly copy the hash state from the old entry.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several callers like
if (active_cache_changed && write_locked_index(...))
handle_error();
rollback_lock_file(...);
where the final rollback is needed because "!active_cache_changed"
shortcuts the if-expression. There are also a few variants of this,
including some if-else constructs that make it more clear when the
explicit rollback is really needed.
Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and
simplify the callers. Leave the most complicated of the callers (in
builtin/update-index.c) unchanged. Rewriting it to use this new flag
would end up duplicating logic.
We could have made the new flag behave the other way round
("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach. We can still migrate existing
callers to use our new flag. Later we might even be able to flip the
default, possibly without entirely ignoring the risk to in-flight or
out-of-tree topics.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More abstraction of hash function from the codepath.
* bc/hash-algo:
hash: update obsolete reference to SHA1_HEADER
bulk-checkin: abstract SHA-1 usage
csum-file: abstract uses of SHA-1
csum-file: rename sha1file to hashfile
read-cache: abstract away uses of SHA-1
pack-write: switch various SHA-1 values to abstract forms
pack-check: convert various uses of SHA-1 to abstract forms
fast-import: switch various uses of SHA-1 to the_hash_algo
sha1_file: switch uses of SHA-1 to the_hash_algo
builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
builtin/index-pack: improve hash function abstraction
hash: create union for hash context allocation
hash: move SHA-1 macros to hash.h
Code clean-up.
* nd/shared-index-fix:
read-cache: don't write index twice if we can't write shared index
read-cache.c: move tempfile creation/cleanup out of write_shared_index
read-cache.c: change type of "temp" in write_shared_index()
The split-index mode had a few corner case bugs fixed.
* tg/split-index-fixes:
travis: run tests with GIT_TEST_SPLIT_INDEX
split-index: don't write cache tree with null oid entries
read-cache: fix reading the shared index for other repos
Convert various uses of direct calls to SHA-1 and 20- and 40-based
constants to use the_hash_algo instead. Don't yet convert the on-disk
data structures, which will be handled in a future commit.
Adjust some comments so as not to refer explicitly to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below:
0.001791141 s: read cache ...
0.004011363 s: preload index
0.000516161 s: refresh index
0.003139257 s: git command: ... 'status' '--porcelain=2'
0.006788129 s: diff-files
0.002090267 s: diff-index
0.001885735 s: initialize name hash
0.032013138 s: read directory
0.051781209 s: git command: './git' 'status'
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.
This commit also converts static function write_sha1_file_prepare, as it
is closely related.
Rename these functions to write_object_file and
write_object_file_prepare respectively.
Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
read only", 2014-06-13), we tried to make sure we can still write an
index, even if the shared index can not be written.
We did so by just calling 'do_write_locked_index()' just before
'write_shared_index()'. 'do_write_locked_index()' always at least
closes the tempfile nowadays, and used to close or commit the lockfile
if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
introduced. COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
'write_locked_index()'.
After calling 'write_shared_index()', we call 'write_split_index()',
which calls 'do_write_locked_index()' again, which then tries to use the
closed lockfile again, but in fact fails to do so as it's already
closed. This eventually leads to a segfault.
Make sure to write the main index only once.
[nd: most of the commit message and investigation done by Thomas, I only
tweaked the solution a bit]
Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the helper macro MOVE_ARRAY to move arrays. This is shorter and
safer, as it automatically infers the size of elements.
Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21)
we made sure that broken cache entries do not get propagated to new
trees. Part of that was making sure not to re-use an existing cache
tree that includes a null oid.
It did so by dropping the cache tree in 'do_write_index()' if one of
the entries contains a null oid. In split index mode however, there
are two invocations to 'do_write_index()', one for the shared index
and one for the split index. The cache tree is only written once, to
the split index.
As we only loop through the elements that are effectively being
written by the current invocation, that may not include the entry with
a null oid in the split index (when it is already written to the
shared index), where we write the cache tree. Therefore in split
index mode we may still end up writing the cache tree, even though
there is an entry with a null oid in the index.
Fix this by checking for null oids in prepare_to_write_split_index,
where we loop the entries of the shared index as well as the entries for
the split index.
This fixes t7009 with GIT_TEST_SPLIT_INDEX. Also add a new test that's
more specifically showing the problem.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_index_from() takes a path argument for the location of the index
file. For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.
This works as long as an index in the_repository is read. Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.
For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.
be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.
Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.
read_cache_from() defaults to using the gitdir of the_repository. As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().
Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For one thing, we have more consistent cleanup procedure now and always
keep errno intact.
The real purpose is the ability to break out of write_locked_index()
early when mks_tempfile() fails in the next patch. It's more awkward to
do it if this mks_tempfile() is still inside write_shared_index().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This local variable 'temp' will be passed in from the caller in the next
patch. To reduce patch noise, let's change its type now while it's still
a local variable and get all the trival conversion out of the next patch.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.
* tb/add-renormalize:
add: introduce "--renormalize"
Various fixes to bp/fsmonitor topic.
* av/fsmonitor:
fsmonitor: simplify determining the git worktree under Windows
fsmonitor: store fsmonitor bitmap before splitting index
fsmonitor: read from getcwd(), not the PWD environment variable
fsmonitor: delay updating state until after split index is merged
fsmonitor: document GIT_TRACE_FSMONITOR
fsmonitor: don't bother pretty-printing JSON from watchman
fsmonitor: set the PWD to the top of the working tree
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Make it safer to normalize the line endings in a repository.
Files that had been commited with CRLF will be commited with LF.
The old way to normalize a repo was like this:
# Make sure that there are not untracked files
$ echo "* text=auto" >.gitattributes
$ git read-tree --empty
$ git add .
$ git commit -m "Introduce end-of-line normalization"
The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.
The new "add --renormalize" does not add untracked files:
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git commit -m "Introduce end-of-line normalization"
Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".
While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop (perhaps overly cautious) sanity check before using the index
read from the filesystem at runtime.
* bp/read-index-from-skip-verification:
read_index_from(): speed index loading by skipping verification of the entry order
ba1b9cac ("fsmonitor: delay updating state until after split index
is merged", 2017-10-27) resolved the problem of the fsmonitor data
being applied to the non-base index when reading; however, a similar
problem exists when writing the index. Specifically, writing of the
fsmonitor extension happens only after the work to split the index
has been applied -- as such, the information in the index is only
for the non-"base" index, and thus the extension information
contains only partial data.
When saving, compute the ewah bitmap before the index is split, and
store it in the fsmonitor_dirty field, mirroring the behavior that
occurred during reading. fsmonitor_dirty is kept from being leaked by
being freed when the extension data is written -- which always happens
precisely once, no matter the split index configuration.
Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is code in post_read_index_from() to catch out of order
entries when reading an index file. This order verification is ~13%
of the cost of every call to read_index_from().
Update check_ce_order() so that it skips this verification unless
the "verify_ce_order" global variable is set.
Teach fsck to force this verification.
The effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD HEAD~1
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.41(0.04+0.04) 0.50(0.00+0.10) +22.0%
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (25 commits)
refs/files-backend: convert static functions to object_id
refs: convert read_raw_ref backends to struct object_id
refs: convert peel_object to struct object_id
refs: convert resolve_ref_unsafe to struct object_id
worktree: convert struct worktree to object_id
refs: convert resolve_gitlink_ref to struct object_id
Convert remaining callers of resolve_gitlink_ref to object_id
sha1_file: convert index_path and index_fd to struct object_id
refs: convert reflog_expire parameter to struct object_id
refs: convert read_ref_at to struct object_id
refs: convert peel_ref to struct object_id
builtin/pack-objects: convert to struct object_id
pack-bitmap: convert traverse_bitmap_commit_list to object_id
refs: convert dwim_log to struct object_id
builtin/reflog: convert remaining unsigned char uses to object_id
refs: convert dwim_ref and expand_ref to struct object_id
refs: convert read_ref and read_ref_full to object_id
refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
Convert check_connected to use struct object_id
refs: update ref transactions to use struct object_id
...
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)
@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`update_index_if_able()` used to always commit the lock or roll it back.
Commit 03b866477 (read-cache: new API write_locked_index instead of
write_index/write_cache, 2014-06-13) stopped rolling it back in case a
write was not even attempted. This change in behavior is not motivated
in the commit message and appears to be accidental: the `else`-path was
removed, although that changed the behavior in case the `if` shortcuts.
Reintroduce the rollback and document this behavior. While at it, move
the documentation on this function from the function definition to the
function declaration in cache.h.
If `write_locked_index(..., COMMIT_LOCK)` fails, it will roll back the
lock for us (see the previous commit).
Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occured at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).
In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.
Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.
Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides is A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.
Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).
Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.
We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).
Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.
This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>