`git merge-recursive` does a three-way merge between user-specified trees
base, head, and remote. Since the user is allowed to specify head, we can
not necesarily assume that head == HEAD.
Modify index_has_changes() to take an extra argument specifying the tree
to compare against. If NULL, it will compare to HEAD. We then use this
from merge-recursive to make sure we compare to the user-specified head.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Modify index_has_changes() to take a struct istate* instead of just
operating on the_index. This is only a partial conversion, though,
because we call do_diff_cache() which implicitly assumes work is to be
done on the_index. Ongoing work is being done elsewhere to do the
remainder of the conversion, and thus is not duplicated here. Instead,
a simple check is put in place until that work is complete.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an option (controlled by an environment variable) perform extra
validations on mem_pool allocated cache entries. When set:
1) Invalidate cache_entry memory when discarding cache_entry.
2) When discarding index_state struct, verify that all cache_entries
were allocated from expected mem_pool.
3) When discarding mem_pools, invalidate mem_pool memory.
This should provide extra checks that mem_pools and their allocated
cache_entries are being used as expected.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This change
is in preparation for using memory pools to reduce the number of
malloc() calls made to allocate cahce entries when loading an index.
Add an API to allocate and discard cache entries, abstracting the
details of managing the memory backing the cache entries. This commit
does actually change how memory is managed - this will be done in a
later commit in the series.
This change makes the distinction between cache entries that are
associated with an index and cache entries that are not associated with
an index. A main use of cache entries is with an index, and we can
optimize the memory management around this. We still have other cases
where a cache entry is not persisted with an index, and so we need to
handle the "transient" use case as well.
To keep the congnitive overhead of managing the cache entries, there
will only be a single discard function. This means there must be enough
information kept with the cache entry so that we know how to discard
them.
A summary of the main functions in the API is:
make_cache_entry: create cache entry for use in an index. Uses specified
parameters to populate cache_entry fields.
make_empty_cache_entry: Create an empty cache entry for use in an index.
Returns cache entry with empty fields.
make_transient_cache_entry: create cache entry that is not used in an
index. Uses specified parameters to populate
cache_entry fields.
make_empty_transient_cache_entry: create cache entry that is not used in
an index. Returns cache entry with
empty fields.
discard_cache_entry: A single function that knows how to discard a cache
entry regardless of how it was allocated.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactor refresh_cache_entry() to work on a specific index, instead of
implicitly using the_index. This is in preparation for making the
make_cache_entry function apply to a specific index.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.
* sb/object-store-alloc:
alloc: allow arbitrary repositories for alloc functions
object: allow create_object to handle arbitrary repositories
object: allow grow_object_hash to handle arbitrary repositories
alloc: add repository argument to alloc_commit_index
alloc: add repository argument to alloc_report
alloc: add repository argument to alloc_object_node
alloc: add repository argument to alloc_tag_node
alloc: add repository argument to alloc_commit_node
alloc: add repository argument to alloc_tree_node
alloc: add repository argument to alloc_blob_node
object: add repository argument to grow_object_hash
object: add repository argument to create_object
repository: introduce parsed objects field
The list of commands with their various attributes were spread
across a few places in the build procedure, but it now is getting a
bit more consolidated to allow more automation.
* nd/command-list:
completion: allow to customize the completable command list
completion: add and use --list-cmds=alias
completion: add and use --list-cmds=nohelpers
Move declaration for alias.c to alias.h
completion: reduce completable command list
completion: let git provide the completable command list
command-list.txt: documentation and guide line
help: use command-list.txt for the source of guides
help: add "-a --verbose" to list all commands with synopsis
git: support --list-cmds=list-<category>
completion: implement and use --list-cmds=main,others
git --list-cmds: collect command list in a string_list
git.c: convert --list-* to --list-cmds=*
Remove common-cmds.h
help: use command-list.h for common command list
generate-cmds.sh: export all commands to command-list.h
generate-cmds.sh: factor out synopsis extract code
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (42 commits)
merge-one-file: compute empty blob object ID
add--interactive: compute the empty tree value
Update shell scripts to compute empty tree object ID
sha1_file: only expose empty object constants through git_hash_algo
dir: use the_hash_algo for empty blob object ID
sequencer: use the_hash_algo for empty tree object ID
cache-tree: use is_empty_tree_oid
sha1_file: convert cached object code to struct object_id
builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
submodule: convert several uses of EMPTY_TREE_SHA1_HEX
sequencer: convert one use of EMPTY_TREE_SHA1_HEX
merge: convert empty tree constant to the_hash_algo
builtin/merge: switch tree functions to use object_id
builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
sha1-file: add functions for hex empty tree and blob OIDs
builtin/receive-pack: avoid hard-coded constants for push certs
diff: specify abbreviation size in terms of the_hash_algo
upload-pack: replace use of several hard-coded constants
...
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.
* nd/pack-objects-pack-struct:
ci: exercise the whole test suite with uncommon code in pack-objects
pack-objects: reorder members to shrink struct object_entry
pack-objects: shrink delta_size field in struct object_entry
pack-objects: shrink size field in struct object_entry
pack-objects: clarify the use of object_entry::size
pack-objects: don't check size when the object is bad
pack-objects: shrink z_delta_size field in struct object_entry
pack-objects: refer to delta objects by index instead of pointer
pack-objects: move in_pack out of struct object_entry
pack-objects: move in_pack_pos out of struct object_entry
pack-objects: use bitfield for object_entry::depth
pack-objects: use bitfield for object_entry::dfs_state
pack-objects: turn type and in_pack_type to bitfields
pack-objects: a bit of document about struct object_entry
read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).
* sb/oid-object-info:
cache.h: allow oid_object_info to handle arbitrary repositories
packfile: add repository argument to cache_or_unpack_entry
packfile: add repository argument to unpack_entry
packfile: add repository argument to read_object
packfile: add repository argument to packed_object_info
packfile: add repository argument to packed_to_object_type
packfile: add repository argument to retry_bad_packed_offset
cache.h: add repository argument to oid_object_info
cache.h: add repository argument to oid_object_info_extended
* maint-2.16:
Git 2.16.4
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.15:
Git 2.15.2
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.14:
Git 2.14.4
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
* maint-2.13:
Git 2.13.7
verify_path: disallow symlinks in .gitmodules
update-index: stat updated files earlier
verify_dotfile: mention case-insensitivity in comment
verify_path: drop clever fallthrough
skip_prefix: add case-insensitive variant
is_{hfs,ntfs}_dotgitmodules: add tests
is_ntfs_dotgit: match other .git files
is_hfs_dotgit: match other .git files
is_ntfs_dotgit: use a size_t for traversing string
submodule-config: verify submodule names as paths
There are a few reasons it's not a good idea to make
.gitmodules a symlink, including:
1. It won't be portable to systems without symlinks.
2. It may behave inconsistently, since Git may look at
this file in the index or a tree without bothering to
resolve any symbolic links. We don't do this _yet_, but
the config infrastructure is there and it's planned for
the future.
With some clever code, we could make (2) work. And some
people may not care about (1) if they only work on one
platform. But there are a few security reasons to simply
disallow it:
a. A symlinked .gitmodules file may circumvent any fsck
checks of the content.
b. Git may read and write from the on-disk file without
sanity checking the symlink target. So for example, if
you link ".gitmodules" to "../oops" and run "git
submodule add", we'll write to the file "oops" outside
the repository.
Again, both of those are problems that _could_ be solved
with sufficient code, but given the complications in (1) and
(2), we're better off just outlawing it explicitly.
Note the slightly tricky call to verify_path() in
update-index's update_one(). There we may not have a mode if
we're not updating from the filesystem (e.g., we might just
be removing the file). Passing "0" as the mode there works
fine; since it's not a symlink, we'll just skip the extra
checks.
Signed-off-by: Jeff King <peff@peff.net>
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.
However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.
For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.
To accommodate for that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
These can be recomputed by running the following Perl script:
-- snip --
use warnings;
use strict;
sub compute_short_name_hash ($) {
my $checksum = 0;
foreach (split('', $_[0])) {
$checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
}
$checksum = ($checksum * 314159269) & 0xffffffff;
$checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
$checksum -= (($checksum * 1152921497) >> 60) * 1000000007;
return scalar reverse sprintf("%x", $checksum & 0xffff);
}
print compute_short_name_hash($ARGV[0]);
-- snap --
E.g., running that with the argument ".gitignore" will
result in "250a" (which then becomes "gi250a" in the code).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
This conversion was done without the #define trick used in the earlier
series refactoring to have better repository access, because this function
is easy to review, as all lines are converted and it has only one caller.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This should make these functions easier to find and cache.h less
overwhelming to read.
In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file
As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise. It would be better to #include wherever
identifiers from the header are used. That can happen later
when we have better tooling for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have to convert all of the alloc functions at once, because alloc_report
uses a funky macro for reporting. It is better for the sake of mechanical
conversion to convert multiple functions at once rather than changing the
structure of the reporting function.
We record all memory allocation in alloc.c, and free them in
clear_alloc_state, which is called for all repositories except
the_repository.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current document mentions OBJ_* constants without their actual
values. A git developer would know these are from cache.h but that's
not very friendly to a person who wants to read this file to implement
a pack file parser.
Similarly, the deltified representation is not documented at all (the
"document" is basically patch-delta.c). Translate that C code to
English with a bit more about what ofs-delta and ref-delta mean.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a small mechanical change; it doesn't change the
implementation to handle repositories other than the_repository yet.
Use a macro to catch callers passing a repository other than
the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The effort to pass the repository in-core structure throughout the
API continues. This round deals with the code that implements the
refs/replace/ mechanism.
* sb/object-store-replace:
replace-object: allow lookup_replace_object to handle arbitrary repositories
replace-object: allow do_lookup_replace_object to handle arbitrary repositories
replace-object: allow prepare_replace_object to handle arbitrary repositories
refs: allow for_each_replace_ref to handle arbitrary repositories
refs: store the main ref store inside the repository struct
replace-object: add repository argument to lookup_replace_object
replace-object: add repository argument to do_lookup_replace_object
replace-object: add repository argument to prepare_replace_object
refs: add repository argument to for_each_replace_ref
refs: add repository argument to get_main_ref_store
replace-object: check_replace_refs is safe in multi repo environment
replace-object: eliminate replace objects prepared flag
object-store: move lookup_replace_object to replace-object.h
replace-object: move replace_map to object store
replace_object: use oidmap
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.
* ds/commit-graph:
commit-graph: implement "--append" option
commit-graph: build graph from starting commits
commit-graph: read only from specific pack-indexes
commit: integrate commit graph with commit parsing
commit-graph: close under reachability
commit-graph: add core.commitGraph setting
commit-graph: implement git commit-graph read
commit-graph: implement git-commit-graph write
commit-graph: implement write_commit_graph()
commit-graph: create git-commit-graph builtin
graph: add commit graph design document
commit-graph: add format document
csum-file: refactor finalize_hashfile() method
csum-file: rename hashclose() to finalize_hashfile()
A build-time option has been added to allow Git to be told to refer
to its associated files relative to the main binary, in the same
way that has been possible on Windows for quite some time, for
Linux, BSDs and Darwin.
* dj/runtime-prefix:
Makefile: quote $INSTLIBDIR when passing it to sed
Makefile: remove unused @@PERLLIBDIR@@ substitution variable
mingw/msvc: use the new-style RUNTIME_PREFIX helper
exec_cmd: provide a new-style RUNTIME_PREFIX helper for Windows
exec_cmd: RUNTIME_PREFIX on some POSIX systems
Makefile: add Perl runtime prefix support
Makefile: generate Perl header from template file
There really isn't any case in which we want to expose the constants for
empty trees and blobs outside of using the hash algorithm abstraction.
Make these constants static and stop exposing the defines in cache.h.
Remove the constants which are no longer in use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Oftentimes, we'll want to refer to an empty tree or empty blob by its
hex name without having to call oid_to_hex or explicitly refer to
the_hash_algo. Add helper functions that format these values into
static buffers and return them for easy use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Adjust struct index_state to use struct object_id instead of unsigned
char [20].
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the various functions for freshening objects and
has_loose_object_nonlocal to use struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sha1 member in struct pack_entry is unused except for one instance
in which we store a value in it. Since nobody ever reads this value,
don't bother to compute it and remove the member from struct pack_entry.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree member of struct object_context is unused except in one place
where we write to it. Since there are no users of this member, remove
it.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In various places throughout the codebase, we need to read data into a
struct object_id from a pack or other unsigned char buffer. Add an
inline function that does this based on the current hash algorithm in
use, and use it in several places.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This involves also adapting oid_object_info_extended and a some
internal functions that are used to implement these. It all has to
happen in one patch, because of a single recursive chain of calls visits
all these functions.
oid_object_info_extended is also used in partial clones, which allow
fetching missing objects. As this series will not add the repository
struct to the transport code and fetch_object(), add a TODO note and
omit fetching if a user tries to use a partial clone in a repository
other than the_repository.
Among the functions modified to handle arbitrary repositories,
unpack_entry() is one of them. Note that it still references the globals
"delta_base_cache" and "delta_base_cached", but those are safe to be
referenced (the former is indexed partly by "struct packed_git *", which
is repo-specific, and the latter is only used to limit the size of the
former as an optimization).
Helped-by: Brandon Williams <bmwill@google.com>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow oid_object_info_extended callers
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some codepaths, including the refs API, get and keep relative
paths, that go out of sync when the process does chdir(2). The
chdir-notify API is introduced to let these codepaths adjust these
cached paths to the new current directory.
* jk/relative-directory-fix:
refs: use chdir_notify to update cached relative paths
set_work_tree: use chdir_notify
add chdir-notify API
trace.c: export trace_setup_key
set_git_dir: die when setenv() fails
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
lookup_replace_object is a low-level function that most users of the
object store do not need to use directly.
Move it to replace-object.h to avoid a dependency loop in an upcoming
change to its inline definition that will make use of repository.h.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Enable Git to resolve its own binary location using a variety of
OS-specific and generic methods, including:
- procfs via "/proc/self/exe" (Linux)
- _NSGetExecutablePath (Darwin)
- KERN_PROC_PATHNAME sysctl on BSDs.
- argv0, if absolute (all, including Windows).
This is used to enable RUNTIME_PREFIX support for non-Windows systems,
notably Linux and Darwin. When configured with RUNTIME_PREFIX, Git will
do a best-effort resolution of its executable path and automatically use
this as its "exec_path" for relative helper and data lookups, unless
explicitly overridden.
Small incidental formatting cleanup of "exec_cmd.c".
Signed-off-by: Dan Jacques <dnj@google.com>
Thanks-to: Robbie Iannucci <iannucci@google.com>
Thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.
Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.
* sb/object-store: (27 commits)
sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
sha1_file: allow map_sha1_file to handle arbitrary repositories
sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
sha1_file: allow open_sha1_file to handle arbitrary repositories
sha1_file: allow stat_sha1_file to handle arbitrary repositories
sha1_file: allow sha1_file_name to handle arbitrary repositories
sha1_file: add repository argument to sha1_loose_object_info
sha1_file: add repository argument to map_sha1_file
sha1_file: add repository argument to map_sha1_file_1
sha1_file: add repository argument to open_sha1_file
sha1_file: add repository argument to stat_sha1_file
sha1_file: add repository argument to sha1_file_name
sha1_file: allow prepare_alt_odb to handle arbitrary repositories
sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
sha1_file: add repository argument to prepare_alt_odb
sha1_file: add repository argument to link_alt_odb_entries
sha1_file: add repository argument to read_info_alternates
sha1_file: add repository argument to link_alt_odb_entry
sha1_file: add raw_object_store argument to alt_odb_usable
pack: move approximate object count to object store
...
The commit graph feature is controlled by the new core.commitGraph config
setting. This defaults to 0, so the feature is opt-in.
The intention of core.commitGraph is that a user can always stop checking
for or parsing commit graph files if core.commitGraph=0.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up for the "repository" abstraction.
* nd/remove-ignore-env-field:
repository.h: add comment and clarify repo_set_gitdir
repository: delete ignore_env member
sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
repository.c: delete dead functions
repository.c: move env-related setup code back to environment.c
repository: initialize the_repository in main()
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
A "git fetch" from a repository with insane number of refs into a
repository that is already up-to-date still wasted too many cycles
making many lstat(2) calls to see if these objects at the tips
exist as loose objects locally. These lstat(2) calls are optimized
away by enumerating all loose objects beforehand.
It is unknown if the new strategy negatively affects existing use
cases, fetching into a repository with many loose objects from a
repository with small number of refs.
* ti/fetch-everything-local-optim:
fetch-pack.c: use oidset to check existence of loose object
The set_git_dir() function returns an error if setenv()
fails, but there are zero callers who pay attention to this
return value. If this ever were to happen, it could cause
confusing results, as sub-processes would see a potentially
stale GIT_DIR (e.g., if it is relative and we chdir()-ed to
the root of the working tree).
We _could_ try to fix each caller, but there's really
nothing useful to do after this failure except die. Let's
just lump setenv() failure into the same category as malloc
failure: things that should never happen and cause us to
abort catastrophically.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow map_sha1_file callers to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
While at it, move the declaration to object-store.h, where it should
be easier to find.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow sha1_file_name callers to be more
specific about which repository to handle. This is a small mechanical
change; it doesn't change the implementation to handle repositories
other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
While at it, move the declaration to object-store.h, where it should
be easier to find.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.
[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Migrate the struct alternate_object_database and all its related
functions to the object store as these functions are easier found in
that header. The migration is just a verbatim copy, no need to
include the object store header at any C file, because cache.h includes
repository.h which in turn includes the object-store.h
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split-index mode had a few corner case bugs fixed.
* tg/split-index-fixes:
travis: run tests with GIT_TEST_SPLIT_INDEX
split-index: don't write cache tree with null oid entries
read-cache: fix reading the shared index for other repos
Internal API clean-up to allow write_locked_index() optionally skip
writing the in-core index when it is not modified.
* ma/skip-writing-unchanged-index:
write_locked_index(): add flag to avoid writing unchanged index
When fetching from a repository with large number of refs, because to
check existence of each refs in local repository to packed and loose
objects, 'git fetch' ends up doing a lot of lstat(2) to non-existing
loose form, which makes it slow.
Instead of making as many lstat(2) calls as the refs the remote side
advertised to see if these objects exist in the loose form, first
enumerate all the existing loose objects in hashmap beforehand and use
it to check existence of them if the number of refs is larger than the
number of loose objects.
With this patch, the number of lstat(2) calls in `git fetch` is reduced
from 411412 to 13794 for chromium repository, it has more than 480000
remote refs.
I took time stat of `git fetch` when fetch-pack happens for chromium
repository 3 times on linux with SSD.
* with this patch
8.105s
8.309s
7.640s
avg: 8.018s
* master
12.287s
11.175s
12.227s
avg: 11.896s
On my MacBook Air which has slower lstat(2).
* with this patch
14.501s
* master
1m16.027s
`git fetch` on slow disk will be improved largely.
Signed-off-by: Takuto Ikuta <tikuta@chromium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert both the argument and the return value to be pointers to struct
object_id. Update the callers and their internals to deal with the new
type. Remove several temporaries which are no longer needed.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names. Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:
@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)
@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)
@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function to take a pointer to struct object_id and rename
it to assert_oid_type.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function to take a pointer to struct object_id and rename
it check_object_signature. Introduce temporaries to convert the return
values of lookup_replace_object and lookup_replace_object_extended into
struct object_id.
The temporaries are needed because in order to convert
lookup_replace_object, open_istream needs to be converted, and
open_istream needs check_sha1_signature to be converted, causing a loop
of dependencies. The temporaries will be removed in a future patch.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert find_unique_abbrev and find_unique_abbrev_r to each take a
pointer to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It does not make sense that generic repository code contains handling
of environment variables, which are specific for the main repository
only. Refactor repo_set_gitdir() function to take $GIT_DIR and
optionally _all_ other customizable paths. These optional paths can be
NULL and will be calculated according to the default directory layout.
Note that some dead functions are left behind to reduce diff
noise. They will be deleted in the next patch.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have several callers like
if (active_cache_changed && write_locked_index(...))
handle_error();
rollback_lock_file(...);
where the final rollback is needed because "!active_cache_changed"
shortcuts the if-expression. There are also a few variants of this,
including some if-else constructs that make it more clear when the
explicit rollback is really needed.
Teach `write_locked_index()` to take a new flag SKIP_IF_UNCHANGED and
simplify the callers. Leave the most complicated of the callers (in
builtin/update-index.c) unchanged. Rewriting it to use this new flag
would end up duplicating logic.
We could have made the new flag behave the other way round
("FORCE_WRITE"), but that could break existing users behind their backs.
Let's take the more conservative approach. We can still migrate existing
callers to use our new flag. Later we might even be able to flip the
default, possibly without entirely ignoring the risk to in-flight or
out-of-tree topics.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More abstraction of hash function from the codepath.
* bc/hash-algo:
hash: update obsolete reference to SHA1_HEADER
bulk-checkin: abstract SHA-1 usage
csum-file: abstract uses of SHA-1
csum-file: rename sha1file to hashfile
read-cache: abstract away uses of SHA-1
pack-write: switch various SHA-1 values to abstract forms
pack-check: convert various uses of SHA-1 to abstract forms
fast-import: switch various uses of SHA-1 to the_hash_algo
sha1_file: switch uses of SHA-1 to the_hash_algo
builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
builtin/index-pack: improve hash function abstraction
hash: create union for hash context allocation
hash: move SHA-1 macros to hash.h
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The split-index mode had a few corner case bugs fixed.
* tg/split-index-fixes:
travis: run tests with GIT_TEST_SPLIT_INDEX
split-index: don't write cache tree with null oid entries
read-cache: fix reading the shared index for other repos
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.
* gs/retire-mru:
mru: Replace mru.[ch] with list.h implementation
The machinery to clone & fetch, which in turn involves packing and
unpacking objects, have been told how to omit certain objects using
the filtering mechanism introduced by the jh/object-filtering
topic, and also mark the resulting pack as a promisor pack to
tolerate missing objects, taking advantage of the mechanism
introduced by the jh/fsck-promisors topic.
* jh/partial-clone:
t5616: test bulk prefetch after partial fetch
fetch: inherit filter-spec from partial clone
t5616: end-to-end tests for partial clone
fetch-pack: restore save_commit_buffer after use
unpack-trees: batch fetching of missing blobs
clone: partial clone
partial-clone: define partial clone settings in config
fetch: support filters
fetch: refactor calculation of remote list
fetch-pack: test support excluding large blobs
fetch-pack: add --no-filter
fetch-pack, index-pack, transport: partial clone
upload-pack: add object filtering for partial clone
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.
* jh/fsck-promisors:
gc: do not repack promisor packfiles
rev-list: support termination at promisor objects
sha1_file: support lazily fetching missing objects
introduce fetch-object: fetch one promisor object
index-pack: refactor writing of .keep files
fsck: support promisor objects as CLI argument
fsck: support referenced promisor objects
fsck: support refs pointing to promisor objects
fsck: introduce partialclone extension
extension.partialclone: introduce partial clone extension
Most of the other code dealing with SHA-1 and other hashes is located in
hash.h, which is in turn loaded by cache.h. Move the SHA-1 macros to
hash.h as well, so we can use them in additional hash-related items in
the future.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was already converted to use struct object_id earlier.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of force_object_loose to
struct object_id and adjust usage of this function.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of write_sha1_file to
struct object_id and adjust usage of this function.
This commit also converts static function write_sha1_file_prepare, as it
is closely related.
Rename these functions to write_object_file and
write_object_file_prepare respectively.
Replace sha1_to_hex, hashcpy and hashclr with their oid equivalents
wherever possible.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As long as GIT_SHA1_RAWSZ is equal to GIT_MAX_RAWSZ there's no problem,
but when new hashing algorithm will be in place this memset will clear
only 20-byte prefix of hash buffer.
Alternatively, hashclr implementation could be adjusted, but this
function is almost removed from codebase already. Separate
implementation of oidclr prevents potential buffer overrun in case
someone incorrectly used hashclr on object_id in future.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.
Rename this function to hash_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration of struct sha1_stat. Adjust all usages of this
struct and replace hash{clr,cmp,cpy} with oid{clr,cmp,cpy} wherever
possible. Rename it to struct oid_stat.
Rename static function load_sha1_stat to load_oid_stat.
Remove macro EMPTY_BLOB_SHA1_BIN, as it's no longer used.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the declaration and definition of pretend_sha1_file to use
struct object_id and adjust all usages of this function. Rename it to
pretend_object_file.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a96d3cc3f6 ("cache-tree: reject entries with null sha1", 2017-04-21)
we made sure that broken cache entries do not get propagated to new
trees. Part of that was making sure not to re-use an existing cache
tree that includes a null oid.
It did so by dropping the cache tree in 'do_write_index()' if one of
the entries contains a null oid. In split index mode however, there
are two invocations to 'do_write_index()', one for the shared index
and one for the split index. The cache tree is only written once, to
the split index.
As we only loop through the elements that are effectively being
written by the current invocation, that may not include the entry with
a null oid in the split index (when it is already written to the
shared index), where we write the cache tree. Therefore in split
index mode we may still end up writing the cache tree, even though
there is an entry with a null oid in the index.
Fix this by checking for null oids in prepare_to_write_split_index,
where we loop the entries of the shared index as well as the entries for
the split index.
This fixes t7009 with GIT_TEST_SPLIT_INDEX. Also add a new test that's
more specifically showing the problem.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_index_from() takes a path argument for the location of the index
file. For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.
This works as long as an index in the_repository is read. Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.
For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.
be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.
Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.
read_cache_from() defaults to using the gitdir of the_repository. As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().
Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using a static buffer in sha1_file_name() is error prone
and the performance improvements it gives are not needed
in many of the callers.
So let's get rid of this static buffer and, if necessary
or helpful, let's use one in the caller.
Suggested-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ew/empty-merge-with-dirty-index-maint:
merge-recursive: avoid incorporating uncommitted changes in a merge
move index_has_changes() from builtin/am.c to merge.c for reuse
t6044: recursive can silently incorporate dirty changes in a merge
index_has_changes() is a function we want to reuse outside of just am,
making it also available for merge-recursive and merge-ort.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git shows a message to tell the user that it is waiting for the
user to finish editing when spawning an editor, in case the editor
opens to a hidden window or somewhere obscure and the user gets
lost.
* ls/editor-waiting-message:
launch_editor(): indicate that Git waits for user input
refactor "dumb" terminal determination
Ancient part of codebase still shows dots after an abbreviated
object name just to show that it is not a full object name, but
these ellipses are confusing to people who newly discovered Git
who are used to seeing abbreviated object names and find them
confusing with the range syntax.
* ar/unconfuse-three-dots:
t2020: test variations that matter
t4013: test new output from diff --abbrev --raw
diff: diff_aligned_abbrev: remove ellipsis after abbreviated SHA-1 value
t4013: prepare for upcoming "diff --raw --abbrev" output format change
checkout: describe_detached_head: remove ellipsis after committish
print_sha1_ellipsis: introduce helper
Documentation: user-manual: limit usage of ellipsis
Documentation: revisions: fix typo: "three dot" ---> "three-dot" (in line with "two-dot").
An infrastructure to define what hash function is used in Git is
introduced, and an effort to plumb that throughout various
codepaths has been started.
* bc/hash-algo:
repository: fix a sparse 'using integer as NULL pointer' warning
Switch empty tree and blob lookups to use hash abstraction
Integrate hash algorithm support with repo setup
Add structure representing hash algorithm
setup: expose enumerated repo info
Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.
The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.
However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.
In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
(1) functions in sha1_file.c that take in a hash, without the user
regarding how the object is stored (loose or packed)
(2) functions in packfile.c (because I need to check callers that know
about the loose/packed distinction and operate on both differently,
and ensure that they can handle the concept of objects that are
neither loose nor packed)
(1) is handled by the modification to sha1_object_info_extended().
For (2), I looked at for_each_packed_object and others. For
for_each_packed_object, the callers either already work or are fixed in
this patch:
- reachable - only to find recent objects
- builtin/fsck - already knows about missing objects
- builtin/cat-file - warning message added in this commit
Callers of the other functions do not need to be changed:
- parse_pack_index
- http - indirectly from http_get_info_packs
- find_pack_entry_one
- this searches a single pack that is provided as an argument; the
caller already knows (through other means) that the sought object
is in a specific pack
- find_sha1_pack
- fast-import - appears to be an optimization to not store a file if
it is already in a pack
- http-walker - to search through a struct alt_base
- http-push - to search through remote packs
- has_sha1_pack
- builtin/fsck - already knows about promisor objects
- builtin/count-objects - informational purposes only (check if loose
object is also packed)
- builtin/prune-packed - check if object to be pruned is packed (if
not, don't prune it)
- revision - used to exclude packed objects if requested by user
- diff - just for optimization
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new mechanism to upgrade the wire protocol in place is proposed
and demonstrated that it works with the older versions of Git
without harming them.
* bw/protocol-v1:
Documentation: document Extra Parameters
ssh: introduce a 'simple' ssh variant
i5700: add interop test for protocol transition
http: tell server that the client understands v1
connect: tell server that the client understands v1
connect: teach client to recognize v1 server response
upload-pack, receive-pack: introduce protocol version 1
daemon: recognize hidden request arguments
protocol: introduce protocol extension mechanisms
pkt-line: add packet_write function
connect: in ref advertisement, shallows are last
Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage. In such an arrangement, the full set of objects is usually
available in remote storage, ready to be lazily downloaded.
Teach fsck about the new state of affairs. In this commit, teach fsck
that missing promisor objects referenced from the reflog are not an
error case; in future commits, fsck will be taught about other cases.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce new repository extension option:
`extensions.partialclone`
See the update to Documentation/technical/repository-version.txt
in this patch for more information.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code to detect "dumb" terminals into a single location. This
avoids duplicating the terminal detection code yet again in a subsequent
commit.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a helper print_sha1_ellipsis() that pays attention to the
GIT_PRINT_SHA1_ELLIPSIS environment variable, and prepare the tests to
unconditionally set it for the test pieces that will be broken once the code
stops showing the extra dots by default.
The removal of these dots is merely a plan at this step and has not happened
yet but soon will.
Document GIT_PRINT_SHA1_ELLIPSIS.
Signed-off-by: Ann T Ropea <bedhanger@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git add --renormalize ." is a new and safer way to record the fact
that you are correcting the end-of-line convention and other
"convert_to_git()" glitches in the in-repository data.
* tb/add-renormalize:
add: introduce "--renormalize"
Various fixes to bp/fsmonitor topic.
* av/fsmonitor:
fsmonitor: simplify determining the git worktree under Windows
fsmonitor: store fsmonitor bitmap before splitting index
fsmonitor: read from getcwd(), not the PWD environment variable
fsmonitor: delay updating state until after split index is merged
fsmonitor: document GIT_TRACE_FSMONITOR
fsmonitor: don't bother pretty-printing JSON from watchman
fsmonitor: set the PWD to the top of the working tree
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Make it safer to normalize the line endings in a repository.
Files that had been commited with CRLF will be commited with LF.
The old way to normalize a repo was like this:
# Make sure that there are not untracked files
$ echo "* text=auto" >.gitattributes
$ git read-tree --empty
$ git add .
$ git commit -m "Introduce end-of-line normalization"
The user must make sure that there are no untracked files,
otherwise they would have been added and tracked from now on.
The new "add --renormalize" does not add untracked files:
$ echo "* text=auto" >.gitattributes
$ git add --renormalize .
$ git commit -m "Introduce end-of-line normalization"
Note that "git add --renormalize <pathspec>" is the short form for
"git add -u --renormalize <pathspec>".
While at it, document that the same renormalization may be needed,
whenever a clean filter is added or changed.
Helped-By: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Drop (perhaps overly cautious) sanity check before using the index
read from the filesystem at runtime.
* bp/read-index-from-skip-verification:
read_index_from(): speed index loading by skipping verification of the entry order
Switch the uses of empty_tree_oid and empty_blob_oid to use the
current_hash abstraction that represents the current hash algorithm in
use.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In future versions of Git, we plan to support an additional hash
algorithm. Integrate the enumeration of hash algorithms with repository
setup, and store a pointer to the enumerated data in struct repository.
Of course, we currently only support SHA-1, so hard-code this value in
read_repository_format. In the future, we'll enumerate this value from
the configuration.
Add a constant, the_hash_algo, which points to the hash_algo structure
pointer in the repository global. Note that this is the hash which is
used to serialize data to disk, not the hash which is used to display
items to the user. The transition plan anticipates that these may be
different. We can add an additional element in the future (say,
ui_hash_algo) to provide for this case.
Include repository.h in cache.h since we now need to have access to
these struct and variable definitions.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is code in post_read_index_from() to catch out of order
entries when reading an index file. This order verification is ~13%
of the cost of every call to read_index_from().
Update check_ce_order() so that it skips this verification unless
the "verify_ce_order" global variable is set.
Teach fsck to force this verification.
The effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD HEAD~1
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.41(0.04+0.04) 0.50(0.00+0.10) +22.0%
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An earlier update made it possible to use an on-stack in-core
lockfile structure (as opposed to having to deliberately leak an
on-heap one). Many codepaths have been updated to take advantage
of this new facility.
* ma/lockfile-fixes:
read_cache: roll back lock in `update_index_if_able()`
read-cache: leave lock in right state in `write_locked_index()`
read-cache: drop explicit `CLOSE_LOCK`-flag
cache.h: document `write_locked_index()`
apply: remove `newfd` from `struct apply_state`
apply: move lockfile into `apply_state`
cache-tree: simplify locking logic
checkout-index: simplify locking logic
tempfile: fix documentation on `delete_tempfile()`
lockfile: fix documentation on `close_lock_file_gently()`
treewide: prefer lockfiles on the stack
sha1_file: do not leak `lock_file`
If the fsmonitor extension is used in conjunction with the split index
extension, the set of entries in the index when it is first loaded is
only a subset of the real index. This leads to only the non-"base"
index being marked as CE_FSMONITOR_VALID.
Delay the expansion of the ewah bitmap until after tweak_split_index
has been called to merge in the base index as well.
The new fsmonitor_dirty is kept from being leaked by dint of being
cleaned up in post_read_index_from, which is guaranteed to be called
after do_read_index in read_index_from.
Signed-off-by: Alex Vandiver <alexmv@dropbox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make the function for converting pairs of hexadecimal digits to binary
available to other call sites.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tell a server that protocol v1 can be used by sending the http header
'Git-Protocol' with 'version=1' indicating this.
Also teach the apache http server to pass through the 'Git-Protocol'
header as an environment variable 'GIT_PROTOCOL'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create protocol.{c,h} and provide functions which future servers and
clients can use to determine which protocol to use or is being used.
Also introduce the 'GIT_PROTOCOL' environment variable which will be
used to communicate a colon separated list of keys with optional values
to a server. Unknown keys and values must be tolerated. This mechanism
is used to communicate which version of the wire protocol a client would
like to use with a server.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`update_index_if_able()` used to always commit the lock or roll it back.
Commit 03b866477 (read-cache: new API write_locked_index instead of
write_index/write_cache, 2014-06-13) stopped rolling it back in case a
write was not even attempted. This change in behavior is not motivated
in the commit message and appears to be accidental: the `else`-path was
removed, although that changed the behavior in case the `if` shortcuts.
Reintroduce the rollback and document this behavior. While at it, move
the documentation on this function from the function definition to the
function declaration in cache.h.
If `write_locked_index(..., COMMIT_LOCK)` fails, it will roll back the
lock for us (see the previous commit).
Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occured at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).
In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.
Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.
Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides is A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.
Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).
Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.
We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).
Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.
This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The next patches will tweak the behavior of this function. Document it
in order to establish a basis for those patches.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some commands (most notably "git status") makes an opportunistic
update when performing a read-only operation to help optimize later
operations in the same repository. The new "--no-optional-locks"
option can be passed to Git to disable them.
* jk/no-optional-locks:
git: add --no-optional-locks option
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.
But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.
There are a couple possible solutions:
1. Have some kind of "pseudo-lock" that allows other
commands to tell status that they want the lock.
This is likely to be complicated and error-prone to
implement (and maybe even impossible with just
dotlocks to work from, as it requires some
inter-process communication).
2. Avoid background runs of commands like "git status"
that want to do opportunistic updates, preferring
instead plumbing like diff-files, etc.
This is awkward for a couple of reasons. One is that
"status --porcelain" reports a lot more about the
repository state than is available from individual
plumbing commands. And two is that we actually _do_
want to see the refreshed index. We just don't want to
take a lock or write out the result. Whereas commands
like diff-files expect us to refresh the index
separately and write it to disk so that they can depend
on the result. But that write is exactly what we're
trying to avoid.
3. Ask "status" not to lock or write the index.
This is easy to implement. The big downside is that any
work done in refreshing the index for such a call is
lost when the process exits. So a background process
may end up re-hashing a changed file multiple times
until the user runs a command that does an index
refresh themselves.
This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.
The reason there is two-fold:
1. An environment variable is carried through to
sub-processes. And whether an invocation is a
background process or not should apply to the whole
process tree. So you could do "git --no-optional-locks
foo", and if "foo" is a script or alias that calls
"status", you'll still get the effect.
2. There may be other programs that want the same
treatment.
I've punted here on finding more callers to convert,
since "status" is the obvious one to call as a repeated
background job. But "git diff"'s opportunistic refresh
of the index may be a good candidate.
The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:
Note that the regression test added in this commit does
not *really* verify that no index.lock file was written;
that test is not possible in a portable way. Instead, we
verify that .git/index is rewritten *only* when `git
status` is run without `--no-optional-locks`.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MRU cache that keeps track of recently used packs is represented
using two global variables:
struct mru packed_git_mru_storage;
struct mru *packed_git_mru = &packed_git_mru_storage;
Callers never assign to the packed_git_mru pointer, though, so we can
simplify by eliminating it and using &packed_git_mru_storage (renamed
to &packed_git_mru) directly. This variable is only used by the
packfile subsystem, making this a relatively uninvasive change (and
any new unadapted callers would trigger a compile error).
Noticed while moving these globals to the object_store struct.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" has been taught to optionally paint new lines that are
the same as deleted lines elsewhere differently from genuinely new
lines.
* sb/diff-color-move: (25 commits)
diff: document the new --color-moved setting
diff.c: add dimming to moved line detection
diff.c: color moved lines differently, plain mode
diff.c: color moved lines differently
diff.c: buffer all output if asked to
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_SUMMARY
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_STAT_SEP
diff.c: convert word diffing to use emit_diff_symbol
diff.c: convert show_stats to use emit_diff_symbol
diff.c: convert emit_binary_diff_body to use emit_diff_symbol
submodule.c: migrate diff output to use emit_diff_symbol
diff.c: emit_diff_symbol learns DIFF_SYMBOL_REWRITE_DIFF
diff.c: emit_diff_symbol learns about DIFF_SYMBOL_BINARY_FILES
diff.c: emit_diff_symbol learns DIFF_SYMBOL_HEADER
diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR_{PLUS, MINUS}
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_INCOMPLETE
diff.c: emit_diff_symbol learns DIFF_SYMBOL_WORDS[_PORCELAIN]
diff.c: migrate emit_line_checked to use emit_diff_symbol
diff.c: emit_diff_symbol learns DIFF_SYMBOL_NO_LF_EOF
diff.c: emit_diff_symbol learns DIFF_SYMBOL_CONTEXT_FRAGINFO
...
Both sha1_file.c and packfile.c now need read_object(), so a copy of
read_object() was created in packfile.c.
This patch makes both mark_bad_packed_object() and has_packed_and_bad()
global. Unlike most of the other patches in this series, these 2
functions need to remain global.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.
Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.
In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jt/subprocess-handshake:
sub-process: refactor handshake to common function
Documentation: migrate sub-process docs to header
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundary (and do its thing
without having to fork a separate process).
* bw/grep-recurse-submodules:
grep: recurse in-process using 'struct repository'
submodule: merge repo_read_gitmodules and gitmodules_config
submodule: check for unmerged .gitmodules outside of config parsing
submodule: check for unstaged .gitmodules outside of config parsing
submodule: remove fetch.recursesubmodules from submodule-config parsing
submodule: remove submodule.fetchjobs from submodule-config parsing
config: add config_from_gitmodules
cache.h: add GITMODULES_FILE macro
repository: have the_repository use the_index
repo_read_index: don't discard the index
read_info_alternates is not used from outside, so let's make it static.
We have to declare the function before link_alt_odb_entry instead of
moving the code around, link_alt_odb_entry calls read_info_alternates,
which in turn calls link_alt_odb_entry.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The filter-process interface learned to allow a process with long
latency give a "delayed" response.
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
Conversion from uchar[20] to struct object_id continues.
* bc/object-id:
sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
sha1_name: convert GET_SHA1* flags to GET_OID*
sha1_name: convert get_sha1* to get_oid*
Convert remaining callers of get_sha1 to get_oid.
builtin/unpack-file: convert to struct object_id
bisect: convert bisect_checkout to struct object_id
builtin/update_ref: convert to struct object_id
sequencer: convert to struct object_id
remote: convert struct push_cas to struct object_id
submodule: convert submodule config lookup to use object_id
builtin/merge-tree: convert remaining caller of get_sha1 to object_id
builtin/fsck: convert remaining caller of get_sha1 to object_id
In 1a812f3a70 (hashcmp(): inline memcmp() by hand to
optimize, 2011-04-28), it was reported that an open-coded
loop outperformed memcmp() for comparing sha1s.
Discussion[1] a few years later in 2013 showed that this
depends on your libc's version of memcmp(). In particular,
glibc 2.13 optimized their memcmp around 2011. Here are
current timings with glibc 2.24 (best-of-five, on
linux.git):
[before this patch, open-coded]
$ time git rev-list --objects --all
real 0m35.357s
user 0m35.016s
sys 0m0.340s
[after this patch, memcmp]
real 0m32.930s
user 0m32.630s
sys 0m0.300s
Now that we've had 6 years for that version of glibc to
make its way onto people's machines, it's worth revisiting
our benchmarks and switching to memcmp().
It may be that there are other non-glibc systems where
memcmp() isn't as well optimized. But since our single data
point in favor of open-coding was on a now-ancient glibc, we
should probably assume the system memcmp is good unless
proven otherwise. We may end up with a SLOW_MEMCMP Makefile
knob, but we can hold off on that until we actually find
such a system in practice.
[1] https://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a macro to be used when specifying the '.gitmodules' file and
convert any existing hard coded '.gitmodules' file strings to use the
new macro.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
Convert the flags for get_oid_with_context and friends to use "OID"
instead of "SHA1" in their names.
This transform was made by running the following one-liner on the
affected files:
perl -pi -e 's/GET_SHA1/GET_OID/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize "what are the object names already taken in an alternate
object database?" query that is used to derive the length of prefix
an object name is uniquely abbreviated to.
* rs/sha1-name-readdir-optim:
sha1_file: guard against invalid loose subdirectory numbers
sha1_file: let for_each_file_in_obj_subdir() handle subdir names
p4205: add perf test script for pretty log formats
sha1_name: cache readdir(3) results in find_short_object_filename()
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
Take a note in cahe.c as well, when this ws rules are extended we have
to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)
The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve sha1_object_info_extended() by supporting additional
flags. This allows has_sha1_file_with_flags() to be modified to use
sha1_object_info_extended() in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Code clean-up.
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
Loose object subdirectories have hexadecimal names based on the first
byte of the hash of contained objects, thus their numerical
representation can range from 0 (0x00) to 255 (0xff). Change the type
of the corresponding variable in for_each_file_in_obj_subdir() and
associated callback functions to unsigned int and add a range check.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all path related declarations from cache.h to a new path.h header
file. This makes cache.h smaller and makes it easier to add new path
related functions.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Migrate 'git_dir', 'git_common_dir', 'git_object_dir', 'git_index_file',
'git_graft_file', and 'namespace' to be stored in 'the_repository'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under some circumstances (bogus GIT_DIR value or the discovered gitdir
is '.git') 'setup_git_directory()' won't initialize key repository
state. This leads to inconsistent state after running the setup code.
To account for this inconsistent state, lazy initialization is done once
a caller asks for the repository's gitdir or some other piece of
repository state. This is confusing and can be error prone.
Instead let's tighten the expected outcome of 'setup_git_directory()'
and ensure that it initializes repository state in all cases that would
have been handled by lazy initialization.
This also lets us drop the requirement to have 'have_git_dir()' check if
the environment variable GIT_DIR was set as that will be handled by the
end of the setup code.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Read each loose object subdirectory at most once when looking for unique
abbreviated hashes. This speeds up commands like "git log --pretty=%h"
considerably, which previously caused one readdir(3) call for each
candidate, even for subdirectories that were visited before.
The new cache is kept until the program ends and never invalidated. The
same is already true for pack indexes. The inherent racy nature of
finding unique short hashes makes it still fit for this purpose -- a
conflicting new object may be added at any time. Tasks with higher
consistency requirements should not use it, though.
The cached object names are stored in an oid_array, which is quite
compact. The bitmap for remembering which subdir was already read is
stored as a char array, with one char per directory -- that's not quite
as compact, but really simple and incurs only an overhead equivalent to
11 hashes after all.
Suggested-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_object() and sha1_object_info_extended() both implement mechanisms
such as object replacement, retrying the packed store after failing to
find the object in the packed store then the loose store, and being able
to mark a packed object as bad and then retrying the whole process.
Consolidating these mechanisms would be a great help to maintainability.
Therefore, consolidate them by extending sha1_object_info_extended() to
support the functionality needed, and then modifying read_object() to
use sha1_object_info_extended().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The LOOKUP_REPLACE_OBJECT flag controls whether the
lookup_replace_object() function is invoked by
sha1_object_info_extended(), read_sha1_file_extended(), and
lookup_replace_object_extended(), but it is not immediately clear which
functions accept that flag.
Therefore restrict this flag to only sha1_object_info_extended(),
renaming it appropriately to OBJECT_INFO_LOOKUP_REPLACE and adding some
documentation. Update read_sha1_file_extended() to have a boolean
parameter instead, and delete lookup_replace_object_extended().
parse_sha1_header() also passes this flag to
parse_sha1_header_extended() since commit 46f0344 ("sha1_file: support
reading from a loose object of unknown type", 2015-05-03), but that has
had no effect since that commit. Therefore this patch also removes this
flag from that invocation.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The LOOKUP_UNKNOWN_OBJECT flag was introduced in commit 46f0344
("sha1_file: support reading from a loose object of unknown type",
2015-05-03) in order to support a feature in cat-file subsequently
introduced in commit 39e4ae3 ("cat-file: teach cat-file a
'--allow-unknown-type' option", 2015-05-03). Despite its name and
location in cache.h, this flag is used neither in
read_sha1_file_extended() nor in any of the lookup functions, but used
only in sha1_object_info_extended().
Therefore rename this flag to OBJECT_INFO_ALLOW_UNKNOWN_TYPE, taking the
name of the cat-file flag that invokes this feature, and move it closer
to the declaration of sha1_object_info_extended(). Also add
documentation for this flag.
OBJECT_INFO_ALLOW_UNKNOWN_TYPE is defined to 2, not 1, to avoid
conflicting with LOOKUP_REPLACE_OBJECT. Avoidance of this conflict is
necessary because sha1_object_info_extended() supports both flags.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently 'discover_git_directory' only looks at the gitdir to determine
if a git directory was discovered. This causes a problem in the event
that the gitdir which was discovered was in fact a per-worktree git
directory and not the common git directory. This is because the
repository config, which is checked to verify the repository's format,
is stored in the commondir and not in the per-worktree gitdir. Correct
this behavior by checking the config stored in the commondir.
It will also be of use for callers to have access to the commondir, so
lets also return that upon successfully discovering a git directory.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all config related declarations from cache.h to a new config.h
header file. This makes cache.h smaller and allows for the opportunity
in a following patch to only include config.h when needed.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The result from "git diff" that compares two blobs, e.g. "git diff
$commit1:$path $commit2:$path", used to be shown with the full
object name as given on the command line, but it is more natural to
use the $path in the output and use it to look up .gitattributes.
* jk/diff-blob:
diff: use blob path for blob/file diffs
diff: use pending "path" if it is available
diff: use the word "path" instead of "name" for blobs
diff: pass whole pending entry in blobinfo
handle_revision_arg: record paths for pending objects
handle_revision_arg: record modes for "a..b" endpoints
t4063: add tests of direct blob diffs
get_sha1_with_context: dynamically allocate oc->path
get_sha1_with_context: always initialize oc->symlink_path
sha1_name: consistently refer to object_context as "oc"
handle_revision_arg: add handle_dotdot() helper
handle_revision_arg: hoist ".." check out of range parsing
handle_revision_arg: stop using "dotdot" as a generic pointer
handle_revision_arg: simplify commit reference lookups
handle_revision_arg: reset "dotdot" consistently
Convert the remaining parts of grep to use struct object_id.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git checkout", "git merge", etc. manipulates the in-core
index, various pieces of information in the index extensions are
discarded from the original state, as it is usually not the case
that they are kept up-to-date and in-sync with the operation on the
main index. The untracked cache extension is copied across these
operations now, which would speed up "git status" (as long as the
cache is properly invalidated).
* dt/unpack-save-untracked-cache-extension:
unpack-trees: preserve index extensions
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
When a sha1 lookup returns the tree path via "struct
object_context", it just copies it into a fixed-size buffer.
This means the result can be truncated, and it means our
"struct object_context" consumes a lot of stack space.
Instead, let's allocate a string on the heap. Because most
callers don't care about this information, we'll avoid doing
it by default (so they don't all have to start calling
free() on the result). There are basically two options for
the caller to signal to us that it's interested:
1. By setting a pointer to storage in the object_context.
2. By passing a flag in another parameter.
Doing (1) would match the way that sha1_object_info_extended()
works. But it would mean that every caller would have to
initialize the object_context, which they don't currently
have to do.
This patch does (2), and adds a new bit to the function's
flags field. All of the callers that look at the "path"
field are updated to pass the new flag.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An early version of the patch to add object_context used the
name object_resolve_context. This was later shortened to
just object_context, but the "orc" variable name stuck in a
few places. Let's use "oc", which is used elsewhere in the
code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make git checkout (and other unpack_tree operations) preserve the
untracked cache. This is valuable for two reasons:
1. Often, an unpack_tree operation will not touch large parts of the
working tree, and thus most of the untracked cache will continue to be
valid.
2. Even if the untracked cache were entirely invalidated by such an
operation, the user has signaled their intention to have such a cache,
and we don't want to throw it away.
[jes: backed out the watchman-specific parts]
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some platforms have ulong that is smaller than time_t, and our
historical use of ulong for timestamp would mean they cannot
represent some timestamp that the platform allows. Invent a
separate and dedicated timestamp_t (so that we can distingiuish
timestamps and a vanilla ulongs, which along is already a good
move), and then declare uintmax_t is the type to be used as the
timestamp_t.
* js/larger-timestamps:
archive-tar: fix a sparse 'constant too large' warning
use uintmax_t for timestamps
date.c: abort if the system time cannot handle one of our timestamps
timestamp_t: a new data type for timestamps
PRItime: introduce a new "printf format" for timestamps
parse_timestamp(): specify explicitly where we parse timestamps
t0006 & t5000: skip "far in the future" test when time_t is too limited
t0006 & t5000: prepare for 64-bit timestamps
ref-filter: avoid using `unsigned long` for catch-all data type
Converting checkout_fast_forward is required to convert
parse_tree_indirect.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).
So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.
By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.
As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git checkout" that handles a lot of paths has been optimized by
reducing the number of unnecessary checks of paths in the
has_dir_name() function.
* jh/add-index-entry-optim:
read-cache: speed up has_dir_name (part 2)
read-cache: speed up has_dir_name (part 1)
read-cache: speed up add_index_entry during checkout
p0006-read-tree-checkout: perf test to time read-tree
read-cache: add strcmp_offset function
The recently introduced conditional inclusion of configuration did
not work well when early-config mechanism was involved.
* nd/conditional-config-in-early-config:
config: correct file reading order in read_early_config()
config: handle conditional include when $GIT_DIR is not set up
config: prepare to pass more info in git_config_with_options()
The index file has a trailing SHA-1 checksum to detect file
corruption, and historically we checked it every time the index
file is used. Omit the validation during normal use, and instead
verify only in "git fsck".
* jh/verify-index-checksum-only-in-fsck:
read-cache: force_verify_index_checksum
$GIT_DIR may in some cases be normalized with all symlinks resolved
while "gitdir" path expansion in the pattern does not receive the
same treatment, leading to incorrect mismatch. This has been fixed.
* nd/conditional-config-include:
config: resolve symlinks in conditional include's patterns
path.c: and an option to call real_path() in expand_user_path()
Allow the http.postbuffer configuration variable to be set to a
size that can be expressed in size_t, which can be larger than
ulong on some platforms.
* dt/http-postbuffer-can-be-large:
http.postbuffer: allow full range of ssize_t values
Conversion from unsigned char [40] to struct object_id continues.
* bc/object-id:
Documentation: update and rename api-sha1-array.txt
Rename sha1_array to oid_array
Convert sha1_array_for_each_unique and for_each_abbrev to object_id
Convert sha1_array_lookup to take struct object_id
Convert remaining callers of sha1_array_lookup to object_id
Make sha1_array_append take a struct object_id *
sha1-array: convert internal storage for struct sha1_array to object_id
builtin/pull: convert to struct object_id
submodule: convert check_for_new_submodule_commits to object_id
sha1_name: convert disambiguate_hint_fn to take object_id
sha1_name: convert struct disambiguate_state to object_id
test-sha1-array: convert most code to struct object_id
parse-options-cb: convert sha1_array_append caller to struct object_id
fsck: convert init_skiplist to struct object_id
builtin/receive-pack: convert portions to struct object_id
builtin/pull: convert portions to struct object_id
builtin/diff: convert to struct object_id
Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Define new hash-size constants for allocating memory
If setup_git_directory() and friends have not been called,
get_git_dir() (because of includeIf.gitdir:XXX) would lead to
die("BUG: setup_git_env called without repository");
There are two cases when a config file could be read before $GIT_DIR
is located.
The first one is check_repository_format(), where we read just the one
file $GIT_DIR/config to check if we could understand this
repository. This case should be safe. We do not parse include
directives, which can only be triggered from git_config_with_options,
but setup code uses a lower-level function. The concerned variables
should never be hidden away behind includes anyway.
The second one is triggered in check_pager_config() when we're about
to run an external git command. We might be able to find $GIT_DIR in
this case, which is exactly what read_early_config() does (and also is
what check_pager_config() uses). Conditional includes and
get_git_dir() could be triggered by the first
git_config_with_options() call there, before discover_git_directory()
is used as a fallback $GIT_DIR detection.
Detect this special "early reading" case, pass down the $GIT_DIR,
either from previous setup or detected by discover_git_directory(),
and make conditional include use it.
Noticed-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far we can only pass one flag, respect_includes, to thie function. We
need to pass some more (non-flag even), so let's make it accept a struct
instead of an integer.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/snprintf-cleanups:
daemon: use an argv_array to exec children
gc: replace local buffer with git_path
transport-helper: replace checked snprintf with xsnprintf
convert unchecked snprintf into xsnprintf
combine-diff: replace malloc/snprintf with xstrfmt
replace unchecked snprintf calls with heap buffers
receive-pack: print --pack-header directly into argv array
name-rev: replace static buffer with strbuf
create_branch: use xstrfmt for reflog message
create_branch: move msg setup closer to point of use
avoid using mksnpath for refs
avoid using fixed PATH_MAX buffers for refs
fetch: use heap buffer to format reflog
tag: use strbuf to format tag header
diff: avoid fixed-size buffer for patch-ids
odb_mkstemp: use git_path_buf
odb_mkstemp: write filename into strbuf
do not check odb_mkstemp return value for errors
Add strcmp_offset() function to also return the offset of the
first change.
Add unit test and helper to verify.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.
Teach fsck to force this verification.
The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.
These effect can be seen using t/perf/p0002-read-cache.sh:
Test HEAD~1 HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5%
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the next patch we need the ability to expand '~' to
real_path($HOME). But we can't do that from outside because '~' is part
of a pattern, not a true path. Add an option to expand_user_path() to do
so.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Unfortunately, in order to push some large repos where a server does
not support chunked encoding, the http postbuffer must sometimes
exceed two gigabytes. On a 64-bit system, this is OK: we just malloc
a larger buffer.
This means that we need to use CURLOPT_POSTFIELDSIZE_LARGE to set the
buffer size.
Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sha1_array_for_each_unique take a callback using struct object_id.
Since one of these callbacks is an argument to for_each_abbrev, convert
those as well. Rename various functions, replacing "sha1" with "oid".
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few commands that recently learned the "--recurse-submodule"
option misbehaved when started from a subdirectory of the
superproject.
* bw/recurse-submodules-relative-fix:
ls-files: fix bug when recursing with relative pathspec
ls-files: fix typo in variable name
grep: fix bug when recursing with relative pathspec
setup: allow for prefix to be passed to git commands
grep: fix help text typo
The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.
In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.
The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).
Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.
Signed-off-by: Jeff King <peff@peff.net>
The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.
Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.
So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.
And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.
Signed-off-by: Jeff King <peff@peff.net>
The name-hash used for detecting paths that are different only in
cases (which matter on case insensitive filesystems) has been
optimized to take advantage of multi-threading when it makes sense.
* jh/memihash-opt:
name-hash: add test-lazy-init-name-hash to .gitignore
name-hash: add perf test for lazy_init_name_hash
name-hash: add test-lazy-init-name-hash
name-hash: perf improvement for lazy_init_name_hash
hashmap: document memihash_cont, hashmap_disallow_rehash api
hashmap: add disallow_rehash setting
hashmap: allow memihash computation to be continued
name-hash: specify initial size for istate.dir_hash table
Code clean-up.
* jk/pack-name-cleanups:
index-pack: make pointer-alias fallbacks safer
replace snprintf with odb_pack_name()
odb_pack_keep(): stop generating keepfile name
sha1_file.c: make pack-name helper globally accessible
move odb_* declarations out of git-compat-util.h
"git branch @" created refs/heads/@ as a branch, and in general the
code that handled @{-1} and @{upstream} was a bit too loose in
disambiguating.
* jk/interpret-branch-name:
checkout: restrict @-expansions when finding branch
strbuf_check_ref_format(): expand only local branches
branch: restrict @-expansions when deleting
t3204: test git-branch @-expansion corner cases
interpret_branch_name: allow callers to restrict expansions
strbuf_branchname: add docstring
strbuf_branchname: drop return value
interpret_branch_name: move docstring to header file
interpret_branch_name(): handle auto-namelen for @{-1}
The "parse_config_key()" API function has been cleaned up.
* jk/parse-config-key-cleanup:
parse_hide_refs_config: tell parse_config_key we don't want a subsection
parse_config_key: allow matching single-level config
parse_config_key: use skip_prefix instead of starts_with
refs: parse_hide_refs_config to use parse_config_key
Code clean-up with minor bugfixes.
* jk/prefix-filename:
bundle: use prefix_filename with bundle path
prefix_filename: simplify windows #ifdef
prefix_filename: return newly allocated string
prefix_filename: drop length parameter
prefix_filename: move docstring to header file
hash-object: fix buffer reuse with --path in a subdirectory
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 20 bytes, use the constant
GIT_MAX_RAWSZ, which is designed to be suitable for allocations, instead
of GIT_SHA1_RAWSZ. This will ease the transition down the line by
distinguishing between places where we need to allocate memory suitable
for the largest hash from those where we need to handle the current
hash.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we will want to transition to a new hash at some point in the
future, and that hash may be larger in size than 160 bits, introduce two
constants that can be used for allocating a sufficient amount of memory.
They can be increased to reflect the largest supported hash size.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default location "~/.git-credential-cache/socket" for the
socket used to communicate with the credential-cache daemon has
been moved to "~/.cache/git/credential/socket".
* dl/credential-cache-socket-in-xdg-cache:
credential-cache: add tests for XDG functionality
credential-cache: use XDG_CACHE_HOME for socket
path.c: add xdg_cache_home
Add t/helper/test-lazy-init-name-hash.c test code
to demonstrate performance times for lazy_init_name_hash()
using the original single-threaded and the new multi-threaded
code paths.
Includes a --dump option to dump the created hashmaps to
stdout. You can use this to run both code paths and
confirm that they generate the same hashmaps.
Includes a --analyze option to analyze performance of both
code paths over a range of index sizes to help you find a
lower bound for the LAZY_THREAD_COST in name-hash.c.
For example, passing "-a 4000" will set "istate.cache_nr"
to 4000 and then try the multi-threaded code -- probably
giving 2 threads with 2000 entries each. It will then
run both the single-threaded (1x4000) and the multi-threaded
(2x2000) and compare the times. It will then repeat the
test with 8000, 12000, and etc. so that you can see the
cross over.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/pack-name-cleanups:
index-pack: make pointer-alias fallbacks safer
replace snprintf with odb_pack_name()
odb_pack_keep(): stop generating keepfile name
sha1_file.c: make pack-name helper globally accessible
move odb_* declarations out of git-compat-util.h
The prefix_filename() function returns a pointer to static
storage, which makes it easy to use dangerously. We already
fixed one buggy caller in hash-object recently, and the
calls in apply.c are suspicious (I didn't dig in enough to
confirm that there is a bug, but we call the function once
in apply_all_patches() and then again indirectly from
parse_chunk()).
Let's make it harder to get wrong by allocating the return
value. For simplicity, we'll do this even when the prefix is
empty (and we could just return the original file pointer).
That will cause us to allocate sometimes when we wouldn't
otherwise need to, but this function isn't called in
performance critical code-paths (and it already _might_
allocate on any given call, so a caller that cares about
performance is questionable anyway).
The downside is that the callers need to remember to free()
the result to avoid leaking. Most of them already used
xstrdup() on the result, so we know they are OK. The
remainder have been converted to use free() as appropriate.
I considered retaining a prefix_filename_unsafe() for cases
where we know the static lifetime is OK (and handling the
cleanup is awkward). This is only a handful of cases,
though, and it's not worth the mental energy in worrying
about whether the "unsafe" variant is OK to use in any
situation.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function takes the prefix as a ptr/len pair, but in
every caller the length is exactly strlen(ptr). Let's
simplify the interface and just take the string. This saves
callers specifying it (and in some cases handling a NULL
prefix).
In a handful of cases we had the length already without
calling strlen, so this is technically slower. But it's not
likely to matter (after all, if the prefix is non-empty
we'll allocate and copy it into a buffer anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a public function, so we should make its
documentation available near the declaration.
While we're at it, we can give a few details about how it
works.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The start-up sequence of "git" needs to figure out some configured
settings before it finds and set itself up in the location of the
repository and was quite messy due to its "chicken-and-egg" nature.
The code has been restructured.
* js/early-config:
setup.c: mention unresolved problems
t1309: document cases where we would want early config not to die()
setup_git_directory_gently_1(): avoid die()ing
t1309: test read_early_config()
read_early_config(): really discover .git/
read_early_config(): avoid .git/config hack when unneeded
setup: make read_early_config() reusable
setup: introduce the discover_git_directory() function
setup_git_directory_1(): avoid changing global state
setup: prepare setup_discovered_git_dir() for the root directory
setup_git_directory(): use is_dir_sep() helper
t7006: replace dubious test
Our source code has used the SHA1_HEADER cpp macro after "#include"
in the C code to switch among the SHA-1 implementations. Instead,
list the exact header file names and switch among implementations
using "#ifdef BLK_SHA1/#include "block-sha1/sha1.h"/.../#endif";
this helps some IDE tools.
* bc/sha1-header-selection-with-cpp-macros:
hash.h: move SHA-1 implementation selection into a header file
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.
* cc/split-index-config: (22 commits)
Documentation/git-update-index: explain splitIndex.*
Documentation/config: add splitIndex.sharedIndexExpire
read-cache: use freshen_shared_index() in read_index_from()
read-cache: refactor read_index_from()
t1700: test shared index file expiration
read-cache: unlink old sharedindex files
config: add git_config_get_expiry() from gc.c
read-cache: touch shared index files when used
sha1_file: make check_and_freshen_file() non static
Documentation/config: add splitIndex.maxPercentChange
t1700: add tests for splitIndex.maxPercentChange
read-cache: regenerate shared index if necessary
config: add git_config_get_max_percent_split_change()
Documentation/git-update-index: talk about core.splitIndex config var
Documentation/config: add information for core.splitIndex
t1700: add tests for core.splitIndex
update-index: warn in case of split-index incoherency
read-cache: add and then use tweak_split_index()
split-index: add {add,remove}_split_index() functions
config: add git_config_get_split_index()
...
In a future patch child processes which act on submodules need a little
more context about the original command that was invoked. This patch
teaches git to use the prefix stored in `GIT_INTERNAL_TOPLEVEL_PREFIX`
instead of the prefix that was potentally found during the git directory
setup process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The odb_pack_keep() function generates the name of a .keep
file and opens it. This has two problems:
1. It requires a fixed-size buffer to create the filename
and doesn't notice when the result is truncated.
2. Of the two callers, one sometimes wants to open a
filename it already has, which makes things awkward (it
has to do so manually, and skips the leading-directory
creation).
Instead, let's have odb_pack_keep() just open the file.
Generating the name isn't hard, and a future patch will
switch callers over to odb_pack_name() anyway.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We provide sha1_pack_name() and sha1_pack_index_name(), but
the more generic form (which takes its own strbuf and an
arbitrary extension) is only used to implement the other
two. Let's make it available, but clean up a few things:
1. Name it odb_pack_name(), as the original
sha1_get_pack_name() is long but not all that
descriptive.
2. Switch the strbuf argument to the beginning, so that it
matches similar path-building functions like
git_path_buf().
3. Clean up the out-dated docstring and move it to the
public declaration.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These functions were originally conceived as wrapper
functions similar to xmkstemp(). They were later moved by
463db9b10 (wrapper: move odb_* to environment.c,
2010-11-06). The more appropriate place for a declaration is
in cache.h.
While we're at it, let's add some basic docstrings.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many developers use functionality in their editors that allows for quick
syntax checks, including warning about questionable constructs. This
functionality allows rapid development with fewer errors. However, such
functionality generally does not allow the specification of
project-specific defines or command-line options.
Since the SHA1_HEADER include is not defined in such a case,
developers see spurious errors when using these tools. Furthermore,
there are known implementations of "cc" whose '#include' is unhappy
with this construct.
Instead of using SHA1_HEADER, create a hash.h header and use #if
and #elif to select the desired header. Have the Makefile pass an
appropriate option to help the header select the right implementation to
use.
[jc: make BLK_SHA1 the fallback default as discussed on list,
e.g. <20170314201424.vccij5z2ortq4a4o@sigill.intra.peff.net>; also
remove SHA1_HEADER and SHA1_HEADER_SQ that are no longer used].
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git branch @" created refs/heads/@ as a branch, and in general the
code that handled @{-1} and @{upstream} was a bit too loose in
disambiguating.
* jk/interpret-branch-name:
checkout: restrict @-expansions when finding branch
strbuf_check_ref_format(): expand only local branches
branch: restrict @-expansions when deleting
t3204: test git-branch @-expansion corner cases
interpret_branch_name: allow callers to restrict expansions
strbuf_branchname: add docstring
strbuf_branchname: drop return value
interpret_branch_name: move docstring to header file
interpret_branch_name(): handle auto-namelen for @{-1}
The pager configuration needs to be read early, possibly before
discovering any .git/ directory.
Let's not hide this function in pager.c, but make it available to other
callers.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We modified the setup_git_directory_gently_1() function earlier to make
it possible to discover the GIT_DIR without changing global state.
However, it is still a bit cumbersome to use if you only need to figure
out the (possibly absolute) path of the .git/ directory. Let's just
provide a convenient wrapper function with an easier signature that
*just* discovers the .git/ directory.
We will use it in a subsequent patch to fix the early config.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have xdg_config_home to format paths relative to
XDG_CONFIG_HOME. Let's provide a similar function xdg_cache_home to do
the same for paths relative to XDG_CACHE_HOME.
Signed-off-by: Devin Lehmacher <lehmacdj@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git v2.12 was shipped with an embarrassing breakage where various
operations that verify paths given from the user stopped dying when
seeing an issue, and instead later triggering segfault.
* js/realpath-pathdup-fix:
real_pathdup(): fix callsites that wanted it to die on error
t1501: demonstrate NULL pointer access with invalid GIT_WORK_TREE
The "parse_config_key()" API function has been cleaned up.
* jk/parse-config-key-cleanup:
parse_hide_refs_config: tell parse_config_key we don't want a subsection
parse_config_key: allow matching single-level config
parse_config_key: use skip_prefix instead of starts_with
In 4ac9006f83 (real_path: have callers use real_pathdup and
strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path())
pattern to use real_pathdup() directly.
The problem with this change is that real_path() calls
strbuf_realpath() with die_on_error = 1 while real_pathdup() calls
it with die_on_error = 0. Meaning that in cases where real_path()
causes Git to die() with an error message, real_pathdup() is silent
and returns NULL instead.
The callers, however, are ill-prepared for that change, as they expect
the return value to be non-NULL (and otherwise the function died
with an appropriate error message).
Fix this by extending real_pathdup()'s signature to accept the
die_on_error flag and simply pass it through to strbuf_realpath(),
and then adjust all callers after a careful audit whether they would
handle NULLs well.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The interpret_branch_name() function converts names like
@{-1} and @{upstream} into branch names. The expanded ref
names are not fully qualified, and may be outside of the
refs/heads/ namespace (e.g., "@" expands to "HEAD", and
"@{upstream}" is likely to be in "refs/remotes/").
This is OK for callers like dwim_ref() which are primarily
interested in resolving the resulting name, no matter where
it is. But callers like "git branch" treat the result as a
branch name in refs/heads/. When we expand to a ref outside
that namespace, the results are very confusing (e.g., "git
branch @" tries to create refs/heads/HEAD, which is
nonsense).
Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").
However, out-parameters make the function interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).
The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it.
We can break the callers down into two groups:
1. Callers that are happy with any kind of ref in the
result. We pass "0" here, so they continue to work
without restrictions. This includes merge_name(),
the reflog handling in add_pending_object_with_path(),
and substitute_branch_name(). This last is what powers
dwim_ref().
2. Callers that have funny corner cases (mostly in
git-branch and git-checkout). These need to make use of
the new parameter, but I've left them as "0" in this
patch, and will address them individually in follow-on
patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally put docstrings with function declarations,
because it's the callers who need to know how the function
works. Let's do so for interpret_branch_name().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function will be used in a following commit to get the expiration
time of the shared index files from the config, and it is generic
enough to be put in "config.c".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function will be used in a commit soon, so let's make
it available globally.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new function will be used in a following commit to get the
value of the "splitIndex.maxPercentChange" config variable.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new function will be used in a following commit to know
if we want to use the split index feature or not.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The last caller of git_mkstemp() was removed in commit 6fec0a89
("verify_signed_buffer: use tempfile object", 16-06-2016). Since
the introduction of the 'tempfile' APIs, along with git_mkstemp_mode,
it is unlikely that new callers will materialize. Remove the dead
code.
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Deletion of a branch "foo/bar" could remove .git/refs/heads/foo
once there no longer is any other branch whose name begins with
"foo/", but we didn't do so so far. Now we do.
* mh/ref-remove-empty-directory: (23 commits)
files_transaction_commit(): clean up empty directories
try_remove_empty_parents(): teach to remove parents of reflogs, too
try_remove_empty_parents(): don't trash argument contents
try_remove_empty_parents(): rename parameter "name" -> "refname"
delete_ref_loose(): inline function
delete_ref_loose(): derive loose reference path from lock
log_ref_write_1(): inline function
log_ref_setup(): manage the name of the reflog file internally
log_ref_write_1(): don't depend on logfile argument
log_ref_setup(): pass the open file descriptor back to the caller
log_ref_setup(): improve robustness against races
log_ref_setup(): separate code for create vs non-create
log_ref_write(): inline function
rename_tmp_log(): improve error reporting
rename_tmp_log(): use raceproof_create_file()
lock_ref_sha1_basic(): use raceproof_create_file()
lock_ref_sha1_basic(): inline constant
raceproof_create_file(): new function
safe_create_leading_directories(): set errno on SCLD_EXISTS
safe_create_leading_directories_const(): preserve errno
...
The parse_config_key() function was introduced to make it
easier to match "section.subsection.key" variables. It also
handles the simpler "section.key", and the caller is
responsible for distinguishing the two from its
out-parameters.
Most callers who _only_ want "section.key" would just use a
strcmp(var, "section.key"), since there is no parsing
required. However, they may still use parse_config_key() if
their "section" variable isn't a constant (an example of
this is in parse_hide_refs_config).
Using the parse_config_key is a bit clunky, though:
const char *subsection;
int subsection_len;
const char *key;
if (!parse_config_key(var, section, &subsection, &subsection_len, &key) &&
!subsection) {
/* matched! */
}
Instead, let's treat a NULL subsection as an indication that
the caller does not expect one. That lets us write:
const char *key;
if (!parse_config_key(var, section, NULL, NULL, &key)) {
/* matched! */
}
Existing callers should be unaffected, as passing a NULL
subsection would currently segfault.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert each_loose_object_fn and each_packed_object_fn to take a pointer
to struct object_id. Update the various callbacks. Convert several
40-based constants to use GIT_SHA1_HEXSZ.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are places in the code where we would like to provide a struct
object_id *, yet read the hash directly from the pack. Provide an
nth_packed_object_oid function that is similar to the
nth_packed_object_sha1 function.
In order to avoid a potentially invalid cast, nth_packed_object_oid
provides a variable into which to store the value, which it returns on
success; on error, it returns NULL, as nth_packed_object_sha1 does.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a function, parse_oid_hex, which parses a hexadecimal object
ID and if successful, sets a pointer to just beyond the last character.
This allows for simpler, more robust parsing without needing to
hard-code integer values throughout the codebase.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "core.logAllRefUpdates" that used to be boolean has been
enhanced to take 'always' as well, to record ref updates to refs
other than the ones that are expected to be updated (i.e. branches,
remote-tracking branches and notes).
* cw/log-updates-for-all-refs-really:
doc: add note about ignoring '--no-create-reflog'
update-ref: add test cases for bare repository
refs: add option core.logAllRefUpdates = always
config: add markup to core.logAllRefUpdates doc
When a submodule "A", which has another submodule "B" nested within
it, is "absorbed" into the top-level superproject, the inner
submodule "B" used to be left in a strange state. The logic to
adjust the .git pointers in these submodules has been corrected.
* sb/submodule-recursive-absorb:
submodule absorbing: fix worktree/gitdir pointers recursively for non-moves
cache.h: expose the dying procedure for reading gitlinks
setup: add gentle version of resolve_git_dir
When core.logallrefupdates is true, we only create a new reflog for refs
that are under certain well-known hierarchies. The reason is that we
know that some hierarchies (like refs/tags) are not meant to change, and
that unknown hierarchies might not want reflogs at all (e.g., a
hypothetical refs/foo might be meant to change often and drop old
history immediately).
However, sometimes it is useful to override this decision and simply log
for all refs, because the safety and audit trail is more important than
the performance implications of keeping the log around.
This patch introduces a new "always" mode for the core.logallrefupdates
option which will log updates to everything under refs/, regardless
where in the hierarchy it is (we still will not log things like
ORIG_HEAD and FETCH_HEAD, which are known to be transient).
Based-on-patch-by: Jeff King <peff@peff.net>
Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function that returns a buffer containing the absolute path of its
argument and a semantic patch for its intended use. It avoids an extra
string copy to a static buffer.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later patch we want to react to only a subset of errors, defaulting
the rest to die as usual. Separate the block that takes care of dying
into its own function so we have easy access to it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This follows a93bedada (setup: add gentle version of read_gitfile,
2015-06-09), and assumes the same reasoning. resolve_git_dir is unsuited
for speculative calls, so we want to use the gentle version to find out
about potential errors.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do this by moving the existing documentation from
read-cache.c to cache.h.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up in the pathspec API.
* bw/pathspec-cleanup:
pathspec: rename prefix_pathspec to init_pathspec_item
pathspec: small readability changes
pathspec: create strip submodule slash helpers
pathspec: create parse_element_magic helper
pathspec: create parse_long_magic function
pathspec: create parse_short_magic function
pathspec: factor global magic into its own function
pathspec: simpler logic to prefix original pathspec elements
pathspec: always show mnemonic and name in unsupported_magic
pathspec: remove unused variable from unsupported_magic
pathspec: copy and free owned memory
pathspec: remove the deprecated get_pathspec function
ls-tree: convert show_recursive to use the pathspec struct interface
dir: convert fill_directory to use the pathspec struct interface
dir: remove struct path_simplify
mv: remove use of deprecated 'get_pathspec()'
"git grep" has been taught to optionally recurse into submodules.
* bw/grep-recurse-submodules:
grep: search history of moved submodules
grep: enable recurse-submodules to work on <tree> objects
grep: optionally recurse into submodules
grep: add submodules as a grep source type
submodules: load gitmodules file from commit sha1
submodules: add helper to determine if a submodule is initialized
submodules: add helper to determine if a submodule is populated
real_path: canonicalize directory separators in root parts
real_path: have callers use real_pathdup and strbuf_realpath
real_path: create real_pathdup
real_path: convert real_path_internal to strbuf_realpath
real_path: resolve symlinks by hand
It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.
However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.
The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.
We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().
Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve constness of the index_state parameter to the
'read_blob_data_from_index' function.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codeflow of setting NOATIME and CLOEXEC on file descriptors Git
opens has been simplified.
We may want to drop the tip one, but we'll see.
* jc/git-open-cloexec:
sha1_file: stop opening files with O_NOATIME
git_open_cloexec(): use fcntl(2) w/ FD_CLOEXEC fallback
git_open(): untangle possible NOATIME and CLOEXEC interactions
Now that all callers of the old 'get_pathspec' interface have been
migrated to use the new pathspec struct interface it can be removed
from the codebase.
Since there are no more users of the '_raw' field in the pathspec struct
it can also be removed. This patch also removes the old functionality
of modifying the const char **argv array that was passed into
parse_pathspec. Instead the constructed 'match' string (which is a
pathspec element with the prefix prepended) is only stored in its
corresponding pathspec_item entry.
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function that tries to create a file and any containing
directories in a way that is robust against races with other processes
that might be cleaning up empty directories at the same time.
The actual file creation is done by a callback function, which, if it
fails, should set errno to EISDIR or ENOENT according to the convention
of open(). raceproof_create_file() detects such failures, and
respectively either tries to delete empty directories that might be in
the way of the file or tries to create the containing directories. Then
it retries the callback function.
This function is not yet used.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The exit path for SCLD_EXISTS wasn't setting errno, which some callers
use to generate error messages for the user. Fix the problem and
document that the function sets errno correctly to help avoid similar
regressions in the future.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
teach submodules to load a '.gitmodules' file from a commit sha1. This
enables the population of the submodule_cache to be based on the state
of the '.gitmodules' file from a particular commit.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create real_pathdup which returns a caller owned string of the resolved
realpath based on the provide path.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the name of real_path_internal to strbuf_realpath. In addition
push the static strbuf up to its callers and instead take as a
parameter a pointer to a strbuf to use for the final result.
This change makes strbuf_realpath reentrant.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are three codepaths that use a variable whose name is
pack_compression_level to affect how objects and deltas sent to a
packfile is compressed. Unlike zlib_compression_level that controls
the loose object compression, however, this variable was static to
each of these codepaths. Two of them read the pack.compression
configuration variable, using core.compression as the default, and
one of them also allowed overriding it from the command line.
The other codepath in bulk-checkin did not pay any attention to the
configuration.
Unify the configuration parsing to git_default_config(), where we
implement the parsing of core.loosecompression and core.compression
and make the former override the latter, by moving code to parse
pack.compression and also allow core.compression to give default to
this variable.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects. But a few things have changed since then:
1. In June 2005, git learned about packfiles, which means
we would do a lot fewer atime updates (rather than one
per object access, we'd generally get one per packfile).
2. In late 2006, Linux learned about "relatime", which is
generally the default on modern installs. So
performance around atimes updates is a non-issue there
these days.
All the world isn't Linux, but as it turns out, Linux
is the only platform to implement O_NOATIME in the
first place.
So it's very unlikely that this code is helping anybody
these days.
Helped-by: Jeff King <peff@peff.net>
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git generally does not explicitly close file descriptors that were
open in the parent process when spawning a child process, but most
of the time the child does not want to access them. As Windows does
not allow removing or renaming a file that has a file descriptor
open, a slow-to-exit child can even break the parent process by
holding onto them. Use O_CLOEXEC flag to open files in various
codepaths.
* ls/git-open-cloexec:
read-cache: make sure file handles are not inherited by child processes
sha1_file: open window into packfiles with O_CLOEXEC
sha1_file: rename git_open_noatime() to git_open()