Commit Graph

497 Commits

Author SHA1 Message Date
Martin Ågren
610008146e write_locked_index(): add flag to avoid writing unchanged index
We have several callers like

	if (active_cache_changed && write_locked_index(...))
		handle_error();
	rollback_lock_file(...);

where the final rollback is needed because "!active_cache_changed"
shortcuts the if-expression. There are also a few variants of this,
including some if-else constructs that make it more clear when the
explicit rollback is really needed.

Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and
simplify the callers. Leave the most complicated of the callers (in
builtin/update-index.c) unchanged. Rewriting it to use this new flag
would end up duplicating logic.

We could have made the new flag behave the other way round
("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach. We can still migrate existing
callers to use our new flag. Later we might even be able to flip the
default, possibly without entirely ignoring the risk to in-flight or
out-of-tree topics.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-01 13:28:01 -08:00
Junio C Hamano
0fd90daba8 Merge branch 'bc/hash-algo'
More abstraction of hash function from the codepath.

* bc/hash-algo:
  hash: update obsolete reference to SHA1_HEADER
  bulk-checkin: abstract SHA-1 usage
  csum-file: abstract uses of SHA-1
  csum-file: rename sha1file to hashfile
  read-cache: abstract away uses of SHA-1
  pack-write: switch various SHA-1 values to abstract forms
  pack-check: convert various uses of SHA-1 to abstract forms
  fast-import: switch various uses of SHA-1 to the_hash_algo
  sha1_file: switch uses of SHA-1 to the_hash_algo
  builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
  builtin/index-pack: improve hash function abstraction
  hash: create union for hash context allocation
  hash: move SHA-1 macros to hash.h
2018-02-15 14:55:47 -08:00
Junio C Hamano
090dbea684 Merge branch 'nd/trace-index-ops'
* nd/trace-index-ops:
  trace: measure where the time is spent in the index-heavy operations
2018-02-15 14:55:44 -08:00
Junio C Hamano
8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Junio C Hamano
dd0c256b67 Merge branch 'nd/shared-index-fix'
Code clean-up.

* nd/shared-index-fix:
  read-cache: don't write index twice if we can't write shared index
  read-cache.c: move tempfile creation/cleanup out of write_shared_index
  read-cache.c: change type of "temp" in write_shared_index()
2018-02-13 13:39:14 -08:00
Junio C Hamano
cbf0240f82 Merge branch 'sg/cocci-move-array'
Code clean-up.

* sg/cocci-move-array:
  Use MOVE_ARRAY
2018-02-13 13:39:13 -08:00
Junio C Hamano
e75c862125 Merge branch 'tg/split-index-fixes'
The split-index mode had a few corner case bugs fixed.

* tg/split-index-fixes:
  travis: run tests with GIT_TEST_SPLIT_INDEX
  split-index: don't write cache tree with null oid entries
  read-cache: fix reading the shared index for other repos
2018-02-13 13:39:13 -08:00
brian m. carlson
aab6135906 read-cache: abstract away uses of SHA-1
Convert various uses of direct calls to SHA-1 and 20- and 40-based
constants to use the_hash_algo instead.  Don't yet convert the on-disk
data structures, which will be handled in a future commit.

Adjust some comments so as not to refer explicitly to SHA-1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Nguyễn Thái Ngọc Duy
ca54d9baa4 trace: measure where the time is spent in the index-heavy operations
All the known heavy code blocks are measured (except object database
access). This should help identify if an optimization is effective or
not. An unoptimized git-status would give something like below:

    0.001791141 s: read cache ...
    0.004011363 s: preload index
    0.000516161 s: refresh index
    0.003139257 s: git command: ... 'status' '--porcelain=2'
    0.006788129 s: diff-files
    0.002090267 s: diff-index
    0.001885735 s: initialize name hash
    0.032013138 s: read directory
    0.051781209 s: git command: './git' 'status'

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:20:16 -08:00
Patryk Obara
a09c985eae sha1_file: convert write_sha1_file to object_id
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.

This commit also converts static function write_sha1_file_prepare, as it
is closely related.

Rename these functions to write_object_file and
write_object_file_prepare respectively.

Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Nguyễn Thái Ngọc Duy
ef5b3a6c5e read-cache: don't write index twice if we can't write shared index
In a0a967568e ("update-index --split-index: do not split if $GIT_DIR is
read only", 2014-06-13), we tried to make sure we can still write an
index, even if the shared index can not be written.

We did so by just calling 'do_write_locked_index()' just before
'write_shared_index()'.  'do_write_locked_index()' always at least
closes the tempfile nowadays, and used to close or commit the lockfile
if COMMIT_LOCK or CLOSE_LOCK were given at the time this feature was
introduced.  COMMIT_LOCK or CLOSE_LOCK is passed in by most callers of
'write_locked_index()'.

After calling 'write_shared_index()', we call 'write_split_index()',
which calls 'do_write_locked_index()' again, which then tries to use the
closed lockfile again, but in fact fails to do so as it's already
closed. This eventually leads to a segfault.

Make sure to write the main index only once.

[nd: most of the commit message and investigation done by Thomas, I only
tweaked the solution a bit]

Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 10:09:18 -08:00
SZEDER Gábor
f919ffebed Use MOVE_ARRAY
Use the helper macro MOVE_ARRAY to move arrays.  This is shorter and
safer, as it automatically infers the size of elements.

Patch generated by Coccinelle and contrib/coccinelle/array.cocci in
Travis CI's static analysis build job.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-22 11:32:51 -08:00
Thomas Gummerer
4bddd98311 split-index: don't write cache tree with null oid entries
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21)
we made sure that broken cache entries do not get propagated to new
trees.  Part of that was making sure not to re-use an existing cache
tree that includes a null oid.

It did so by dropping the cache tree in 'do_write_index()' if one of
the entries contains a null oid.  In split index mode however, there
are two invocations to 'do_write_index()', one for the shared index
and one for the split index.  The cache tree is only written once, to
the split index.

As we only loop through the elements that are effectively being
written by the current invocation, that may not include the entry with
a null oid in the split index (when it is already written to the
shared index), where we write the cache tree.  Therefore in split
index mode we may still end up writing the cache tree, even though
there is an entry with a null oid in the index.

Fix this by checking for null oids in prepare_to_write_split_index,
where we loop the entries of the shared index as well as the entries for
the split index.

This fixes t7009 with GIT_TEST_SPLIT_INDEX.  Also add a new test that's
more specifically showing the problem.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:39 -08:00
Thomas Gummerer
a125a22334 read-cache: fix reading the shared index for other repos
read_index_from() takes a path argument for the location of the index
file.  For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.

This works as long as an index in the_repository is read.  Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.

For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.

be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.

Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.

read_cache_from() defaults to using the gitdir of the_repository.  As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().

Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-19 10:36:34 -08:00
Nguyễn Thái Ngọc Duy
59f9d2dd60 read-cache.c: move tempfile creation/cleanup out of write_shared_index
For one thing, we have more consistent cleanup procedure now and always
keep errno intact.

The real purpose is the ability to break out of write_locked_index()
early when mks_tempfile() fails in the next patch. It's more awkward to
do it if this mks_tempfile() is still inside write_shared_index().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16 13:12:07 -08:00
Nguyễn Thái Ngọc Duy
7db2d08cdc read-cache.c: change type of "temp" in write_shared_index()
This local variable 'temp' will be passed in from the caller in the next
patch. To reduce patch noise, let's change its type now while it's still
a local variable and get all the trival conversion out of the next patch.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16 13:12:02 -08:00
Junio C Hamano
af6e0fe3a5 Merge branch 'tb/add-renormalize'
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.

* tb/add-renormalize:
  add: introduce "--renormalize"
2017-11-27 11:06:37 +09:00
Junio C Hamano
c9fdbca92c Merge branch 'av/fsmonitor'
Various fixes to bp/fsmonitor topic.

* av/fsmonitor:
  fsmonitor: simplify determining the git worktree under Windows
  fsmonitor: store fsmonitor bitmap before splitting index
  fsmonitor: read from getcwd(), not the PWD environment variable
  fsmonitor: delay updating state until after split index is merged
  fsmonitor: document GIT_TRACE_FSMONITOR
  fsmonitor: don't bother pretty-printing JSON from watchman
  fsmonitor: set the PWD to the top of the working tree
2017-11-21 14:07:51 +09:00
Junio C Hamano
e05336bdda Merge branch 'bp/fsmonitor'
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.

* bp/fsmonitor:
  fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
  fsmonitor: read entirety of watchman output
  fsmonitor: MINGW support for watchman integration
  fsmonitor: add a performance test
  fsmonitor: add a sample integration script for Watchman
  fsmonitor: add test cases for fsmonitor extension
  split-index: disable the fsmonitor extension when running the split index test
  fsmonitor: add a test tool to dump the index extension
  update-index: add fsmonitor support to update-index
  ls-files: Add support in ls-files to display the fsmonitor valid bit
  fsmonitor: add documentation for the fsmonitor extension.
  fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
  update-index: add a new --force-write-index option
  preload-index: add override to enable testing preload-index
  bswap: add 64 bit endianness helper get_be64
2017-11-21 14:07:50 +09:00
Torsten Bögershausen
9472935d81 add: introduce "--renormalize"
Make it safer to normalize the line endings in a repository.
Files that had been commited with CRLF will be commited with LF.

The old way to normalize a repo was like this:

 # Make sure that there are not untracked files
 $ echo "* text=auto" >.gitattributes
 $ git read-tree --empty
 $ git add .
 $ git commit -m "Introduce end-of-line normalization"

The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.

The new "add --renormalize" does not add untracked files:

 $ echo "* text=auto" >.gitattributes
 $ git add --renormalize .
 $ git commit -m "Introduce end-of-line normalization"

Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".

While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.

Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-17 10:31:05 +09:00
Junio C Hamano
e539a83455 Merge branch 'bp/read-index-from-skip-verification'
Drop (perhaps overly cautious) sanity check before using the index
read from the filesystem at runtime.

* bp/read-index-from-skip-verification:
  read_index_from(): speed index loading by skipping verification of the entry order
2017-11-15 12:14:37 +09:00
Alex Vandiver
3bd28eb299 fsmonitor: store fsmonitor bitmap before splitting index
ba1b9cac ("fsmonitor: delay updating state until after split index
is merged", 2017-10-27) resolved the problem of the fsmonitor data
being applied to the non-base index when reading; however, a similar
problem exists when writing the index.  Specifically, writing of the
fsmonitor extension happens only after the work to split the index
has been applied -- as such, the information in the index is only
for the non-"base" index, and thus the extension information
contains only partial data.

When saving, compute the ewah bitmap before the index is split, and
store it in the fsmonitor_dirty field, mirroring the behavior that
occurred during reading.  fsmonitor_dirty is kept from being leaked by
being freed when the extension data is written -- which always happens
precisely once, no matter the split index configuration.

Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-10 14:05:01 +09:00
Ben Peart
00ec50e56d read_index_from(): speed index loading by skipping verification of the entry order
There is code in post_read_index_from() to catch out of order
entries when reading an index file.  This order verification is ~13%
of the cost of every call to read_index_from().

Update check_ce_order() so that it skips this verification unless
the "verify_ce_order" global variable is set.

Teach fsck to force this verification.

The effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD              HEAD~1
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.41(0.04+0.04)   0.50(0.00+0.10) +22.0%

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-08 10:39:41 +09:00
Junio C Hamano
e7e456f500 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (25 commits)
  refs/files-backend: convert static functions to object_id
  refs: convert read_raw_ref backends to struct object_id
  refs: convert peel_object to struct object_id
  refs: convert resolve_ref_unsafe to struct object_id
  worktree: convert struct worktree to object_id
  refs: convert resolve_gitlink_ref to struct object_id
  Convert remaining callers of resolve_gitlink_ref to object_id
  sha1_file: convert index_path and index_fd to struct object_id
  refs: convert reflog_expire parameter to struct object_id
  refs: convert read_ref_at to struct object_id
  refs: convert peel_ref to struct object_id
  builtin/pack-objects: convert to struct object_id
  pack-bitmap: convert traverse_bitmap_commit_list to object_id
  refs: convert dwim_log to struct object_id
  builtin/reflog: convert remaining unsigned char uses to object_id
  refs: convert dwim_ref and expand_ref to struct object_id
  refs: convert read_ref and read_ref_full to object_id
  refs: convert resolve_refdup and refs_resolve_refdup to struct object_id
  Convert check_connected to use struct object_id
  refs: update ref transactions to use struct object_id
  ...
2017-11-06 14:24:27 +09:00
brian m. carlson
a98e6101f0 refs: convert resolve_gitlink_ref to struct object_id
Convert the declaration and definition of resolve_gitlink_ref to use
struct object_id and apply the following semantic patch:

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3.hash)
+ resolve_gitlink_ref(E1, E2, &E3)

@@
expression E1, E2, E3;
@@
- resolve_gitlink_ref(E1, E2, E3->hash)
+ resolve_gitlink_ref(E1, E2, E3)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
brian m. carlson
1053fe829c Convert remaining callers of resolve_gitlink_ref to object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
Martin Ågren
b74c90fb41 read_cache: roll back lock in update_index_if_able()
`update_index_if_able()` used to always commit the lock or roll it back.
Commit 03b866477 (read-cache: new API write_locked_index instead of
write_index/write_cache, 2014-06-13) stopped rolling it back in case a
write was not even attempted. This change in behavior is not motivated
in the commit message and appears to be accidental: the `else`-path was
removed, although that changed the behavior in case the `if` shortcuts.

Reintroduce the rollback and document this behavior. While at it, move
the documentation on this function from the function definition to the
function declaration in cache.h.

If `write_locked_index(..., COMMIT_LOCK)` fails, it will roll back the
lock for us (see the previous commit).

Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Martin Ågren
df60cf5789 read-cache: leave lock in right state in write_locked_index()
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occured at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).

In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.

Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.

Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides is A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.

Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).

Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Martin Ågren
812d6b0075 read-cache: drop explicit CLOSE_LOCK-flag
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.

We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).

Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.

This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-07 10:20:56 +09:00
Ben Peart
883e248b8a fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:23:01 +09:00
Junio C Hamano
59373a4e03 Merge branch 'jk/fallthrough'
Many codepaths have been updated to squelch -Wimplicit-fallthrough
warnings from Gcc 7 (which is a good code hygiene).

* jk/fallthrough:
  consistently use "fallthrough" comments in switches
  curl_trace(): eliminate switch fallthrough
  test-line-buffer: simplify command parsing
2017-09-28 14:47:53 +09:00
Junio C Hamano
c50424a6f0 Merge branch 'jk/write-in-full-fix'
Many codepaths did not diagnose write failures correctly when disks
go full, due to their misuse of write_in_full() helper function,
which have been corrected.

* jk/write-in-full-fix:
  read_pack_header: handle signed/unsigned comparison in read result
  config: flip return value of store_write_*()
  notes-merge: use ssize_t for write_in_full() return value
  pkt-line: check write_in_full() errors against "< 0"
  convert less-trivial versions of "write_in_full() != len"
  avoid "write_in_full(fd, buf, len) != len" pattern
  get-tar-commit-id: check write_in_full() return against 0
  config: avoid "write_in_full(fd, buf, len) < len" pattern
2017-09-25 15:24:06 +09:00
Junio C Hamano
d085f9773a Merge branch 'kw/write-index-reduce-alloc'
A hotfix to a topic already in 'master'.

* kw/write-index-reduce-alloc:
  read-cache: fix index corruption with index v4
  Add t/helper/test-write-cache to .gitignore
2017-09-25 15:24:06 +09:00
Jeff King
1cf01a34ea consistently use "fallthrough" comments in switches
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a
switch case falls through to the next case. The general idea
is that the compiler can't tell if this was intentional or
not, so you should annotate any intentional fall-throughs as
such, leaving it to complain about any unannotated ones.

There's a GNU __attribute__ which can be used for
annotation, but of course we'd have to #ifdef it away on
non-gcc compilers. Gcc will also recognize
specially-formatted comments, which matches our current
practice. Let's extend that practice to all of the
unannotated sites (which I did look over and verify that
they were behaving as intended).

Ideally in each case we'd actually give some reasons in the
comment about why we're falling through, or what we're
falling through to. And gcc does support that with
-Wimplicit-fallthrough=2, which relaxes the comment pattern
matching to anything that contains "fallthrough" (or a
variety of spelling variants). However, this isn't the
default for -Wimplicit-fallthrough, nor for -Wextra. In the
name of simplicity, it's probably better for us to support
the default level, which requires "fallthrough" to be the
only thing in the comment (modulo some window dressing like
"else" and some punctuation; see the gcc manual for the
complete set of patterns).

This patch suppresses all warnings due to
-Wimplicit-fallthrough. We might eventually want to add that
to the DEVELOPER Makefile knob, but we should probably wait
until gcc 7 is more widely adopted (since earlier versions
will complain about the unknown warning type).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:49:57 +09:00
Jeff King
06f46f237a avoid "write_in_full(fd, buf, len) != len" pattern
The return value of write_in_full() is either "-1", or the
requested number of bytes[1]. If we make a partial write
before seeing an error, we still return -1, not a partial
value. This goes back to f6aa66cb95 (write_in_full: really
write in full or return error on disk full., 2007-01-11).

So checking anything except "was the return value negative"
is pointless. And there are a couple of reasons not to do
so:

  1. It can do a funny signed/unsigned comparison. If your
     "len" is signed (e.g., a size_t) then the compiler will
     promote the "-1" to its unsigned variant.

     This works out for "!= len" (unless you really were
     trying to write the maximum size_t bytes), but is a
     bug if you check "< len" (an example of which was fixed
     recently in config.c).

     We should avoid promoting the mental model that you
     need to check the length at all, so that new sites are
     not tempted to copy us.

  2. Checking for a negative value is shorter to type,
     especially when the length is an expression.

  3. Linus says so. In d34cf19b89 (Clean up write_in_full()
     users, 2007-01-11), right after the write_in_full()
     semantics were changed, he wrote:

       I really wish every "write_in_full()" user would just
       check against "<0" now, but this fixes the nasty and
       stupid ones.

     Appeals to authority aside, this makes it clear that
     writing it this way does not have an intentional
     benefit. It's a historical curiosity that we never
     bothered to clean up (and which was undoubtedly
     cargo-culted into new sites).

So let's convert these obviously-correct cases (this
includes write_str_in_full(), which is just a wrapper for
write_in_full()).

[1] A careful reader may notice there is one way that
    write_in_full() can return a different value. If we ask
    write() to write N bytes and get a return value that is
    _larger_ than N, we could return a larger total. But
    besides the fact that this would imply a totally broken
    version of write(), it would already invoke undefined
    behavior. Our internal remaining counter is an unsigned
    size_t, which means that subtracting too many byte will
    wrap it around to a very large number. So we'll instantly
    begin reading off the end of the buffer, trying to write
    gigabytes (or petabytes) of data.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-14 15:17:59 +09:00
Thomas Gummerer
0b90b881e0 read-cache: fix index corruption with index v4
ce012deb98 ("read-cache: avoid allocating every ondisk entry when
writing", 2017-08-21) changed the way cache entries are written to the
index file.  While previously it wrote the name to an struct that was
allocated using xcalloc(), it now uses ce_write() directly.  Previously
ce_namelen - common bytes were written to the cache entry, which would
automatically make it nul terminated, as it was allocated using calloc.

Now we are writing ce_namelen - common + 1 bytes directly from the
ce->name to the index.  If CE_STRIP_NAME however gets set in the split
index case ce->ce_namelen is set to 0 without changing the actual
ce->name buffer.  When index-v4, this results in the first character of
ce->name being written out instead of just a terminating nul charcter.

As index-v4 requires the terminating nul character as terminator of
the name when reading it back, this results in a corrupted index.

Fix that by only writing ce_namelen - common bytes directly from
ce->name to the index, and adding the nul terminator in an extra call to
ce_write.

This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in
config.mak and running the test suite (t1700 specifically broke).

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-08 09:47:45 +09:00
Jeff King
076aa2cbda tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&t, ...);
  ...
  if (!err)
          rename_tempfile(&t, ...);
  else
          delete_tempfile(&t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
	struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:54 +09:00
Jeff King
83a3069a38 lockfile: do not rollback lock on failed close
Since the lockfile code is based on the tempfile code, it
has some of the same problems, including that close_lock_file()
erases the tempfile's filename buf, making it hard for the
caller to write a good error message.

In practice this comes up less for lockfiles than for
straight tempfiles, since we usually just report the
refname. But there is at least one buggy case in
write_ref_to_lockfile(). Besides, given the coupling between
the lockfile and tempfile modules, it's less confusing if
their close() functions have the same semantics.

Just as the previous commit did for close_tempfile(), let's
teach close_lock_file() and its wrapper close_ref() not to
rollback on error. And just as before, we'll give them new
"gently" names to catch any new callers that are added.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Jeff King
49bd0fc222 tempfile: do not delete tempfile on failed close
When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:

  if (close_tempfile(tempfile))
	return error_errno("error closing %s", tempfile->filename.buf);

wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).

Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:

  if (write(...) || close_tempfile(...)) {
	delete_tempfile(...);
	return -1;
  }

already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.

Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:53 +09:00
Junio C Hamano
030faf2fa5 Merge branch 'kw/write-index-reduce-alloc'
We used to spend more than necessary cycles allocating and freeing
piece of memory while writing each index entry out.  This has been
optimized.

* kw/write-index-reduce-alloc:
  read-cache: avoid allocating every ondisk entry when writing
  read-cache: fix memory leak in do_write_index
  perf: add test for writing the index
2017-08-26 22:55:08 -07:00
Kevin Willford
ce012deb98 read-cache: avoid allocating every ondisk entry when writing
When writing the index for each entry an ondisk struct will be
allocated and freed in ce_write_entry.  We can do better by
using a ondisk struct on the stack for each entry.

This is accomplished by using a stack ondisk_cache_entry_extended
outside looping through the entries in do_write_index.  Only the
fixed fields of this struct are used when writing and depending on
whether it is extended or not the flags2 field will be written.
The name field is not used and instead the cache_entry name field
is used directly when writing out the name.  Because ce_write is
using a buffer and memcpy to fill the buffer before flushing to disk,
we don't have to worry about doing multiple ce_write calls.

Running the p0007-write-cache.sh tests would save anywhere
between 3-7% when the index had over a million entries with no
performance degradation on small repos.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21 16:02:59 -07:00
Kevin Willford
b50386c7c0 read-cache: fix memory leak in do_write_index
The previous_name_buf was never getting released when there
was an error in ce_write_entry or allow was false and execution
was returned to the caller.

Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-21 15:57:02 -07:00
Patryk Obara
e3506559d4 sha1_file: convert index_fd to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:52:08 -07:00
Patryk Obara
98e019b067 sha1_file: convert index_path to struct object_id
Convert all remaining callers as well.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:51:38 -07:00
Patryk Obara
bebfecb94c read-cache: convert to struct object_id
Replace hashcmp with oidcmp.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-20 21:51:08 -07:00
René Scharfe
f331ab9d4c use MOVE_ARRAY
Simplify the code for moving members inside of an array and make it more
robust by using the helper macro MOVE_ARRAY.  It calculates the size
based on the specified number of elements for us and supports NULL
pointers when that number is zero.  Raw memmove(3) calls with NULL can
cause the compiler to (over-eagerly) optimize out later NULL checks.

This patch was generated with contrib/coccinelle/array.cocci and spatch
(Coccinelle).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-17 14:54:56 -07:00
Junio C Hamano
8e90578ffb Merge branch 'cc/shared-index-permfix'
The split index code did not honor core.sharedrepository setting
correctly.

* cc/shared-index-permfix:
  t1700: make sure split-index respects core.sharedrepository
  t1301: move modebits() to test-lib-functions.sh
  read-cache: use shared perms when writing shared index
2017-07-05 13:32:57 -07:00
Christian Couder
df801f3f9f read-cache: use shared perms when writing shared index
Since f6ecc62dbf (write_shared_index(): use tempfile module, 2015-08-10)
write_shared_index() has been using mks_tempfile() to create the
temporary file that will become the shared index.

But even before that, it looks like the functions used to create this
file didn't call adjust_shared_perm(), which means that the shared
index file has always been created with 600 permissions regardless
of the shared permission settings.

Because of that, on repositories created with `git init --shared=all`
and using the split index feature, one gets an error like:

fatal: .git/sharedindex.a52f910b489bc462f187ab572ba0086f7b5157de: index file open failed: Permission denied

when another user performs any operation that reads the shared index.

Call adjust_shared_perm() on the temporary file created by
mks_tempfile() ourselves to adjust the permission bits.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-25 10:42:52 -07:00
Junio C Hamano
50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00