The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.
Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.
In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* jt/subprocess-handshake:
sub-process: refactor handshake to common function
Documentation: migrate sub-process docs to header
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
"git grep --recurse-submodules" has been reworked to give a more
consistent output across submodule boundary (and do its thing
without having to fork a separate process).
* bw/grep-recurse-submodules:
grep: recurse in-process using 'struct repository'
submodule: merge repo_read_gitmodules and gitmodules_config
submodule: check for unmerged .gitmodules outside of config parsing
submodule: check for unstaged .gitmodules outside of config parsing
submodule: remove fetch.recursesubmodules from submodule-config parsing
submodule: remove submodule.fetchjobs from submodule-config parsing
config: add config_from_gitmodules
cache.h: add GITMODULES_FILE macro
repository: have the_repository use the_index
repo_read_index: don't discard the index
read_info_alternates is not used from outside, so let's make it static.
We have to declare the function before link_alt_odb_entry instead of
moving the code around, link_alt_odb_entry calls read_info_alternates,
which in turn calls link_alt_odb_entry.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The filter-process interface learned to allow a process with long
latency give a "delayed" response.
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
Conversion from uchar[20] to struct object_id continues.
* bc/object-id:
sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ
sha1_name: convert GET_SHA1* flags to GET_OID*
sha1_name: convert get_sha1* to get_oid*
Convert remaining callers of get_sha1 to get_oid.
builtin/unpack-file: convert to struct object_id
bisect: convert bisect_checkout to struct object_id
builtin/update_ref: convert to struct object_id
sequencer: convert to struct object_id
remote: convert struct push_cas to struct object_id
submodule: convert submodule config lookup to use object_id
builtin/merge-tree: convert remaining caller of get_sha1 to object_id
builtin/fsck: convert remaining caller of get_sha1 to object_id
In 1a812f3a70 (hashcmp(): inline memcmp() by hand to
optimize, 2011-04-28), it was reported that an open-coded
loop outperformed memcmp() for comparing sha1s.
Discussion[1] a few years later in 2013 showed that this
depends on your libc's version of memcmp(). In particular,
glibc 2.13 optimized their memcmp around 2011. Here are
current timings with glibc 2.24 (best-of-five, on
linux.git):
[before this patch, open-coded]
$ time git rev-list --objects --all
real 0m35.357s
user 0m35.016s
sys 0m0.340s
[after this patch, memcmp]
real 0m32.930s
user 0m32.630s
sys 0m0.300s
Now that we've had 6 years for that version of glibc to
make its way onto people's machines, it's worth revisiting
our benchmarks and switching to memcmp().
It may be that there are other non-glibc systems where
memcmp() isn't as well optimized. But since our single data
point in favor of open-coding was on a now-ancient glibc, we
should probably assume the system memcmp is good unless
proven otherwise. We may end up with a SLOW_MEMCMP Makefile
knob, but we can hold off on that until we actually find
such a system in practice.
[1] https://public-inbox.org/git/20130318073229.GA5551@sigill.intra.peff.net/
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert grep to use 'struct repository' which enables recursing into
submodules to be handled in-process.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a macro to be used when specifying the '.gitmodules' file and
convert any existing hard coded '.gitmodules' file strings to use the
new macro.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We reject hostnames that start with a dash because they may
be confused for command-line options. Let's factor out that
notion into a helper function, as we'll use it in more
places. And while it's simple now, it's not clear if some
systems might need more complex logic to handle all cases.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ls/filter-process-delayed:
convert: add "status=delayed" to filter process protocol
convert: refactor capabilities negotiation
convert: move multiple file filter error handling to separate function
convert: put the flags field before the flag itself for consistent style
t0021: write "OUT <size>" only on success
t0021: make debug log file name configurable
t0021: keep filter log files on comparison
Convert the flags for get_oid_with_context and friends to use "OID"
instead of "SHA1" in their names.
This transform was made by running the following one-liner on the
affected files:
perl -pi -e 's/GET_SHA1/GET_OID/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Optimize "what are the object names already taken in an alternate
object database?" query that is used to derive the length of prefix
an object name is uniquely abbreviated to.
* rs/sha1-name-readdir-optim:
sha1_file: guard against invalid loose subdirectory numbers
sha1_file: let for_each_file_in_obj_subdir() handle subdir names
p4205: add perf test script for pretty log formats
sha1_name: cache readdir(3) results in find_short_object_filename()
Some `clean` / `smudge` filters may require a significant amount of
time to process a single blob (e.g. the Git LFS smudge filter might
perform network requests). During this process the Git checkout
operation is blocked and Git needs to wait until the filter is done to
continue with the checkout.
Teach the filter process protocol, introduced in edcc8581 ("convert: add
filter.<driver>.process option", 2016-10-16), to accept the status
"delayed" as response to a filter request. Upon this response Git
continues with the checkout operation. After the checkout operation Git
calls "finish_delayed_checkout" which queries the filter for remaining
blobs. If the filter is still working on the completion, then the filter
is expected to block. If the filter has completed all remaining blobs
then an empty response is expected.
Git has a multiple code paths that checkout a blob. Support delayed
checkouts only in `clone` (in unpack-trees.c) and `checkout` operations
for now. The optimization is most effective in these code paths as all
files of the tree are processed.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a new flags field to emit_diff_symbol, that will be used by
context lines for:
* white space rules that are applicable (The first 12 bits)
Take a note in cahe.c as well, when this ws rules are extended we have
to fix the bits in the flags field.
* how the rules are evaluated (actually this double encodes the sign
of the line, but the code is easier to keep this way, bits 13,14,15)
* if the line a blank line at EOF (bit 16)
The check if new lines need to be marked up as extra lines at the end of
file, is now done unconditionally. That should be ok, as
'new_blank_line_at_eof' has a quick early return.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve sha1_object_info_extended() by supporting additional
flags. This allows has_sha1_file_with_flags() to be modified to use
sha1_object_info_extended() in a subsequent patch.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Code clean-up.
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
Loose object subdirectories have hexadecimal names based on the first
byte of the hash of contained objects, thus their numerical
representation can range from 0 (0x00) to 255 (0xff). Change the type
of the corresponding variable in for_each_file_in_obj_subdir() and
associated callback functions to unsigned int and add a range check.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move all path related declarations from cache.h to a new path.h header
file. This makes cache.h smaller and makes it easier to add new path
related functions.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Migrate 'git_dir', 'git_common_dir', 'git_object_dir', 'git_index_file',
'git_graft_file', and 'namespace' to be stored in 'the_repository'.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under some circumstances (bogus GIT_DIR value or the discovered gitdir
is '.git') 'setup_git_directory()' won't initialize key repository
state. This leads to inconsistent state after running the setup code.
To account for this inconsistent state, lazy initialization is done once
a caller asks for the repository's gitdir or some other piece of
repository state. This is confusing and can be error prone.
Instead let's tighten the expected outcome of 'setup_git_directory()'
and ensure that it initializes repository state in all cases that would
have been handled by lazy initialization.
This also lets us drop the requirement to have 'have_git_dir()' check if
the environment variable GIT_DIR was set as that will be handled by the
end of the setup code.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bw/ls-files-sans-the-index:
ls-files: factor out tag calculation
ls-files: factor out debug info into a function
ls-files: convert show_files to take an index
ls-files: convert show_ce_entry to take an index
ls-files: convert prune_cache to take an index
ls-files: convert ce_excluded to take an index
ls-files: convert show_ru_info to take an index
ls-files: convert show_other_files to take an index
ls-files: convert show_killed_files to take an index
ls-files: convert write_eolinfo to take an index
ls-files: convert overlay_tree_on_cache to take an index
tree: convert read_tree to take an index parameter
convert: convert renormalize_buffer to take an index
convert: convert convert_to_git to take an index
convert: convert convert_to_git_filter_fd to take an index
convert: convert crlf_to_git to take an index
convert: convert get_cached_convert_stats_ascii to take an index
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
alias: use the early config machinery to expand aliases
t7006: demonstrate a problem with aliases in subdirectories
t1308: relax the test verifying that empty alias values are disallowed
help: use early config when autocorrecting aliases
config: report correct line number upon error
discover_git_directory(): avoid setting invalid git_dir
Read each loose object subdirectory at most once when looking for unique
abbreviated hashes. This speeds up commands like "git log --pretty=%h"
considerably, which previously caused one readdir(3) call for each
candidate, even for subdirectories that were visited before.
The new cache is kept until the program ends and never invalidated. The
same is already true for pack indexes. The inherent racy nature of
finding unique short hashes makes it still fit for this purpose -- a
conflicting new object may be added at any time. Tasks with higher
consistency requirements should not use it, though.
The cached object names are stored in an oid_array, which is quite
compact. The bitmap for remembering which subdir was already read is
stored as a char array, with one char per directory -- that's not quite
as compact, but really simple and incurs only an overhead equivalent to
11 hashes after all.
Suggested-by: Jeff King <peff@peff.net>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_object() and sha1_object_info_extended() both implement mechanisms
such as object replacement, retrying the packed store after failing to
find the object in the packed store then the loose store, and being able
to mark a packed object as bad and then retrying the whole process.
Consolidating these mechanisms would be a great help to maintainability.
Therefore, consolidate them by extending sha1_object_info_extended() to
support the functionality needed, and then modifying read_object() to
use sha1_object_info_extended().
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>