Deal with a few deprecation warning from cURL library.
* jk/curl-avoid-deprecated-api:
http: support CURLOPT_PROTOCOLS_STR
http: prefer CURLOPT_SEEKFUNCTION to CURLOPT_IOCTLFUNCTION
http-push: prefer CURLOPT_UPLOAD to CURLOPT_PUT
On platforms where `size_t` does not have the same width as `unsigned
long`, passing a pointer to the former when a pointer to the latter is
expected can lead to problems.
Windows and 32-bit Linux are among the affected platforms.
In this instance, we want to store the size of the blob that was read in
that variable. However, `read_blob_data_from_index()` passes that
pointer to `read_object_file()` which expects an `unsigned long *`.
Which means that on affected platforms, the variable is not fully
populated and part of its value is left uninitialized. (On Big-Endian
platforms, this problem would be even worse.)
The consequence is that depending on the uninitialized memory's
contents, we may erroneously reject perfectly fine attributes.
Let's address this by passing a pointer to a variable of the expected
data type.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The CURLOPT_PROTOCOLS (and matching CURLOPT_REDIR_PROTOCOLS) flag was
deprecated in curl 7.85.0, and using it generate compiler warnings as of
curl 7.87.0. The path forward is to use CURLOPT_PROTOCOLS_STR, but we
can't just do so unilaterally, as it was only introduced less than a
year ago in 7.85.0.
Until that version becomes ubiquitous, we have to either disable the
deprecation warning or conditionally use the "STR" variant on newer
versions of libcurl. This patch switches to the new variant, which is
nice for two reasons:
- we don't have to worry that silencing curl's deprecation warnings
might cause us to miss other more useful ones
- we'd eventually want to move to the new variant anyway, so this gets
us set up (albeit with some extra ugly boilerplate for the
conditional)
There are a lot of ways to split up the two cases. One way would be to
abstract the storage type (strbuf versus a long), how to append
(strbuf_addstr vs bitwise OR), how to initialize, which CURLOPT to use,
and so on. But the resulting code looks pretty magical:
GIT_CURL_PROTOCOL_TYPE allowed = GIT_CURL_PROTOCOL_TYPE_INIT;
if (...http is allowed...)
GIT_CURL_PROTOCOL_APPEND(&allowed, "http", CURLOPT_HTTP);
and you end up with more "#define GIT_CURL_PROTOCOL_TYPE" macros than
actual code.
On the other end of the spectrum, we could just implement two separate
functions, one that handles a string list and one that handles bits. But
then we end up repeating our list of protocols (http, https, ftp, ftp).
This patch takes the middle ground. The run-time code is always there to
handle both types, and we just choose which one to feed to curl.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The IOCTLFUNCTION option has been deprecated, and generates a compiler
warning in recent versions of curl. We can switch to using SEEKFUNCTION
instead. It was added in 2008 via curl 7.18.0; our INSTALL file already
indicates we require at least curl 7.19.4.
But there's one catch: curl says we should use CURL_SEEKFUNC_{OK,FAIL},
and those didn't arrive until 7.19.5. One workaround would be to use a
bare 0/1 here (or define our own macros). But let's just bump the
minimum required version to 7.19.5. That version is only a minor version
bump from our existing requirement, and is only a 2 month time bump for
versions that are almost 13 years old. So it's not likely that anybody
cares about the distinction.
Switching means we have to rewrite the ioctl functions into seek
functions. In some ways they are simpler (seeking is the only
operation), but in some ways more complex (the ioctl allowed only a full
rewind, but now we can seek to arbitrary offsets).
Curl will only ever use SEEK_SET (per their documentation), so I didn't
bother implementing anything else, since it would naturally be
completely untested. This seems unlikely to change, but I added an
assertion just in case.
Likewise, I doubt curl will ever try to seek outside of the buffer sizes
we've told it, but I erred on the defensive side here, rather than do an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two options do exactly the same thing, but the latter has been
deprecated and in recent versions of curl may produce a compiler
warning. Since the UPLOAD form is available everywhere (it was
introduced in the year 2000 by curl 7.1), we can just switch to it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On platforms where `size_t` does not have the same width as `unsigned
long`, passing a pointer to the former when a pointer to the latter is
expected can lead to problems.
Windows and 32-bit Linux are among the affected platforms.
In this instance, we want to store the size of the blob that was read in
that variable. However, `read_blob_data_from_index()` passes that
pointer to `read_object_file()` which expects an `unsigned long *`.
Which means that on affected platforms, the variable is not fully
populated and part of its value is left uninitialized. (On Big-Endian
platforms, this problem would be even worse.)
The consequence is that depending on the uninitialized memory's
contents, we may erroneously reject perfectly fine attributes.
Let's address this by passing a pointer to a variable of the expected
data type.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce an optional configuration to allow the trailing hash that
protects the index file from bit flipping.
* ds/omit-trailing-hash-in-index:
features: feature.manyFiles implies fast index writes
test-lib-functions: add helper for trailing hash
read-cache: add index.skipHash config option
hashfile: allow skipping the hash function
The logic to see if we are using the "cone" mode by checking the
sparsity patterns has been tightened to avoid mistaking a pattern
that names a single file as specifying a cone.
* ws/single-file-cone:
dir: check for single file cone patterns
"git diff --relative" did not mix well with "git diff --ext-diff",
which has been corrected.
* jk/ext-diff-with-relative:
diff: drop "name" parameter from prepare_temp_file()
diff: clean up external-diff argv setup
diff: use filespec path to set up tempfiles for ext-diff
Conditionally skip the pre-applypatch and applypatch-msg hooks when
applying patches with 'git am'.
* tr/am--no-verify:
am: allow passing --no-verify flag
Test fixes.
* es/t1509-root-fixes:
t1509: facilitate repeated script invocations
t1509: make "setup" test more robust
t1509: fix failing "root work tree" test due to owner-check
Commit b25562e63f (object-file: inline calls to read_object(),
2023-01-07) accidentally indented a conditional block with spaces
instead of a tab.
Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CI updates. We probably want a clean-up to move the long shell
script embedded in yaml file into a separate file, but that can
come later.
* cw/ci-whitespace:
ci (check-whitespace): move to actions/checkout@v3
ci (check-whitespace): add links to job output
ci (check-whitespace): suggest fixes for errors
Use `git diff --no-index` as a test_cmp on Windows.
We'd probably need to revisit "do we really want to, and have to,
lose CRLF vs LF?" later, at which time we may be able to further
clean this up by replacing "git diff --no-index" with "diff -u".
* js/drop-mingw-test-cmp:
tests(mingw): avoid very slow `mingw_test_cmp`
When the pack code was split into its own file[1], it got a copy of the
static read_object() function. But there's only one caller here, so we
could just inline it. And it's worth doing so, as the name read_object()
invites comparisons to the public read_object_file(), but the two don't
behave quite the same.
[1] The move happened over several commits, but the relevant one here is
f1d8130be0 (pack: move clear_delta_base_cache(), packed_object_info(),
unpack_entry(), 2017-08-18).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The only caller of read_object_file_extended() is the thin wrapper of
repo_read_object_file(). Instead of wrapping, let's just rename the
inner function and let people call it directly. This cleans up the
namespace and reduces confusion.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our sole caller always passes in "1", so we can just drop the parameter
entirely. Anybody who doesn't want this behavior could easily call
oid_object_info_extended() themselves, as we're just a thin wrapper
around it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The open_istream_incore() function is the only direct user of
read_object_file_extended(), and the only caller which unsets the
lookup_replace flag. Since read_object_file_extended() is now just a
thin wrapper around oid_object_info_extended(), let's inline the call.
That will let us simplify read_object_file_extended() in the next patch.
The inlined version here is a few more lines because of the query setup,
but it's much more flexible, since we can pass (or omit) any flags we
want.
Note the updated comment in the istream struct definition. It was
already slightly wrong (we never called read_object(); it has been
read_object_file_extended() since day one), but should now be accurate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since read_object() is these days just a thin wrapper around
oid_object_info_extended(), and since it only has two callers, let's
just inline those calls. This has a few positive outcomes:
- it's a net reduction in source code lines
- even though the callers end up with a few extra lines, they're now
more flexible and can use object_info flags directly. So no more
need to convert die_if_corrupt between parameter/flag, and we can
ask for lookup replacement with a flag rather than doing it
ourselves.
- there's one fewer function in an already crowded namespace (e.g.,
the difference between read_object() and read_object_file() was not
immediately obvious; now we only have one of them).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As with the previous patch, using skip_prefix() is more readable and
less error-prone than a raw strncmp(), because it avoids a
manually-computed length. These cases differ from the previous patch
that uses starts_with() because they care about the value after the
matched prefix.
We can convert these to use skip_prefix() by introducing an extra
variable to hold the out-pointer.
Note in the case in ws.c that to get rid of the magic number "9"
completely, we also switch out "len" for recomputing the pointer
difference. These are equivalent because "len" is always "ep - string".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's more readable to use starts_with() instead of strncmp() to match a
prefix, as the latter requires a manually-computed length, and has the
funny "matching is zero" return value common to cmp functions. This
patch converts several cases which were found with:
git grep 'strncmp(.*, [0-9]*)'
But note that it doesn't convert all such cases. There are several where
the magic length number is repeated elsewhere in the code, like:
/* handle "buf" which isn't NUL-terminated and might be too small */
if (len >= 3 && !strncmp(buf, "foo", 3))
or:
/* exact match for "foo", but within a larger string */
if (end - buf == 3 && !strncmp(buf, "foo", 3))
While it would not produce the wrong outcome to use starts_with() in
these cases, we'd still be left with one instance of "3". We're better
to leave them for now, as the repeated "3" makes it clear that the two
are linked (there may be other refactorings that handle both, but
they're out of scope for this patch).
A few things to note while reading the patch:
- all cases but one are trying to match, and so lose the extra "!".
The case in the first hunk of urlmatch.c is not-matching, and hence
gains a "!".
- the case in remote-fd.c is matching the beginning of "connect foo",
but we never look at str+8 to parse the "foo" part (which would make
this a candidate for skip_prefix(), not starts_with()). This seems
at first glance like a bug, but is a limitation of how remote-fd
works.
- the second hunk in urlmatch.c shows some cases adjacent to other
strncmp() calls that are left. These are of the "exact match within
a larger string" type, as described above.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix typos in code comments which repeat various words. Most of the
cases are simple in that they repeat a word that usually cannot be
repeated in a grammatically correct sentence. Just remove the
incorrectly duplicated word in these cases and rewrap text, if needed.
A tricky case is usage of "that that", which is sometimes grammatically
correct. However, an instance of this in "t7527-builtin-fsmonitor.sh"
doesn't need two words "that", because there is only one daemon being
discussed, so replace the second "that" with "the".
Reword code comment "entries exist on on-disk index" in function
update_one in file cache-tree.c, by replacing incorrect preposition "on"
with "in".
Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The recent addition of the index.skipHash config option allows index
writes to speed up by skipping the hash computation for the trailing
checksum. This is particularly critical for repositories with many files
at HEAD, so add this config option to two cases where users in that
scenario may opt-in to such behavior:
1. The feature.manyFiles config option enables some options that are
helpful for repositories with many files at HEAD.
2. 'scalar register' and 'scalar reconfigure' set config options that
optimize for large repositories.
In both of these cases, set index.skipHash=true to gain this
speedup. Add tests that demonstrate the proper way that
index.skipHash=true can override feature.manyFiles=true.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful to check that a file format with a trailing hash has a
specific hash in the final bytes of a written file. This is made more
apparent by recent changes that allow skipping the hash algorithm and
writing a null hash at the end of the file instead.
Add a new test_trailing_hash helper and use it in t1600 to verify that
index.skipHash=true really does skip the hash computation, since
'git fsck' does not actually verify the hash. This confirms that when
the config is disabled explicitly in a super project but enabled in a
submodule, then the use of repo_config_get_bool() loads config from the
correct repository in the case of 'git add'. There are other cases where
istate->repo is NULL and thus this config is loaded instead from
the_repository, but that's due to many different code paths initializing
index_state structs in their own way.
Keep the 'git fsck' call to ensure that any potential future change to
check the index hash does not cause an error in this case.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous change allowed skipping the hashing portion of the
hashwrite API, using it instead as a buffered write API. Disabling the
hashwrite can be particularly helpful when the write operation is in a
critical path.
One such critical path is the writing of the index. This operation is so
critical that the sparse index was created specifically to reduce the
size of the index to make these writes (and reads) faster.
This trade-off between file stability at rest and write-time performance
is not easy to balance. The index is an interesting case for a couple
reasons:
1. Writes block users. Writing the index takes place in many user-
blocking foreground operations. The speed improvement directly
impacts their use. Other file formats are typically written in the
background (commit-graph, multi-pack-index) or are super-critical to
correctness (pack-files).
2. Index files are short lived. It is rare that a user leaves an index
for a long time with many staged changes. Outside of staged changes,
the index can be completely destroyed and rewritten with minimal
impact to the user.
Following a similar approach to one used in the microsoft/git fork [1],
add a new config option (index.skipHash) that allows disabling this
hashing during the index write. The cost is that we can no longer
validate the contents for corruption-at-rest using the trailing hash.
[1] 21fed2d914
We load this config from the repository config given by istate->repo,
with a fallback to the_repository if it is not set.
While older Git versions will not recognize the null hash as a special
case, the file format itself is still being met in terms of its
structure. Using this null hash will still allow Git operations to
function across older versions.
The one exception is 'git fsck' which checks the hash of the index file.
This used to be a check on every index read, but was split out to just
the index in a33fc72fe9 (read-cache: force_verify_index_checksum,
2017-04-14) and released first in Git 2.13.0. Document the versions that
relaxed these restrictions, with the optimistic expectation that this
change will be included in Git 2.40.0.
Here, we disable this check if the trailing hash is all zeroes. We add a
warning to the config option that this may cause undesirable behavior
with older Git versions.
As a quick comparison, I tested 'git update-index --force-write' with
and without index.skipHash=true on a copy of the Linux kernel
repository.
Benchmark 1: with hash
Time (mean ± σ): 46.3 ms ± 13.8 ms [User: 34.3 ms, System: 11.9 ms]
Range (min … max): 34.3 ms … 79.1 ms 82 runs
Benchmark 2: without hash
Time (mean ± σ): 26.0 ms ± 7.9 ms [User: 11.8 ms, System: 14.2 ms]
Range (min … max): 16.3 ms … 42.0 ms 69 runs
Summary
'without hash' ran
1.78 ± 0.76 times faster than 'with hash'
These performance benefits are substantial enough to allow users the
ability to opt-in to this feature, even with the potential confusion
with older 'git fsck' versions.
Test this new config option, both at a command-line level and within a
submodule. The confirmation is currently limited to confirm that 'git
fsck' does not complain about the index. Future updates will make this
test more robust.
It is critical that this test is placed before the test_index_version
tests, since those tests obliterate the .git/config file and hence lose
the setting from GIT_TEST_DEFAULT_HASH, if set.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashfile API is useful for generating files that include a trailing
hash of the file's contents up to that point. Using such a hash is
helpful for verifying the file for corruption-at-rest, such as a faulty
drive causing flipped bits.
Git's index file includes this trailing hash, so it uses a 'struct
hashfile' to handle the I/O to the file. This was very convenient to
allow using the hashfile methods during these operations.
However, hashing the file contents during write comes at a performance
penalty. It's slower to hash the bytes on their way to the disk than
without that step. This problem is made worse by the replacement of
hardware-accelerated SHA1 computations with the software-based sha1dc
computation.
This write cost is significant, and the checksum capability is likely
not worth that cost for such a short-lived file. The index is rewritten
frequently and the only time the checksum is checked is during 'git
fsck'. Thus, it would be helpful to allow a user to opt-out of the hash
computation.
We first need to allow Git to opt-out of the hash computation in the
hashfile API. The buffered writes of the API are still helpful, so it
makes sense to make the change here.
Introduce a new 'skip_hash' option to 'struct hashfile'. When set, the
update_fn and final_fn members of the_hash_algo are skipped. When
finalizing the hashfile, the trailing hash is replaced with the null
hash.
This use of a trailing null hash would be desireable in either case,
since we do not want to special case a file format to have a different
length depending on whether it was hashed or not. When the final bytes
of a file are all zero, we can infer that it was written without
hashing, and thus that verification is not available as a check for file
consistency. This also means that we could easily toggle hashing for any
file format we desire.
A version of this patch has existed in the microsoft/git fork since
2017 [1] (the linked commit was rebased in 2018, but the original dates
back to January 2017). Here, the change to make the index use this fast
path is delayed until a later change.
[1] 21fed2d914
Co-authored-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prepare_temp_file() function takes a diff_filespec as well as a
filename. But it is almost certainly an error to pass in a name that
isn't the filespec's "path" parameter, since that is the only thing that
reliably tells us how to find the content (and indeed, this was the
source of a recently-fixed bug).
So let's drop the redundant "name" parameter and just use one->path
throughout the function. This simplifies the interface a little bit, and
makes it impossible for calling code to get it wrong.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the previous commit, setting up the tempfile for an external diff
uses df->path from the diff_filespec, rather than the logical name. This
means add_external_diff_name() does not need to take a "name" parameter
at all, and we can drop it. And that in turn lets us simplify the
conditional for handling renames (when the "other" name is non-NULL).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>