In 961b130d20 (branch: add --recurse-submodules option for branch
creation, 2022-01-28), a funny pattern was introduced where first some
struct is `xmalloc()`ed, then we resize an array whose element type is
the same struct, and then the first struct's contents are copied into
the last element of that array.
Crucially, the `xmalloc()`ed memory never gets released.
Let's avoid both the memory leak and the memory allocation dance
altogether by reallocating the array first and then filling in the last
array element through a pointer.
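To illustrate the shape of the change, here is a simplified sketch (not
the actual code; 'struct item', 'fill()', 'items', 'nr' and 'alloc' are
stand-ins for the real names in the branch code):

    /* before: the xmalloc()ed struct is copied and then leaked */
    struct item *tmp = xmalloc(sizeof(*tmp));
    fill(tmp);
    ALLOC_GROW(items, nr + 1, alloc);
    items[nr++] = *tmp;             /* 'tmp' is never free()d */

    /* after: grow the array first, then fill the last element in place */
    ALLOC_GROW(items, nr + 1, alloc);
    struct item *it = &items[nr++];
    fill(it);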
Reported by Coverity.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts 080ab56a46 (cache-tree: implement cache_tree_find_path(),
2022-05-23). The cache_tree_find_path() method was never actually called
in the topic that added it. I cannot find any reference to it in any of
my forks, so this appears to not be needed at the moment.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_mtimes_file() takes an mtimes parameter as its first argument, but
the only caller passes a NULL constant. Drop this parameter to simplify
the logic. This can be reverted if that parameter is needed in the future.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace a 'git repack --cruft -d' with the wrapper 'git gc --cruft' to
exercise some logic in builtin/gc.c that adds the '--cruft' option to
the underlying 'git repack' command.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The '--verbose' option reports what is being added and removed from the
index, but has not been tested up to this point. Augment the tests in
t2107 to check the '--verbose' option in some scenarios.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 5dccd9155f (t/perf: add iteration setup mechanism to perf-lib,
2022-04-04) modified the parameter parsing of test_wrapper() such that
the test title was no longer in $1, and is instead in $test_title_.
We correctly pass the new variable to the code which outputs the title
to the log, but missed the spot in test_wrapper() where the title is
written to the ".descr" file which is used to produce the final output
table. As a result, all of the titles are missing from that table (or
worse, using whatever was left in $1):
$ ./p0000-perf-lib-sanity.sh
[...]
Test this tree
------------------------------
0000.1: 0.01(0.01+0.00)
0000.2: 0.01(0.00+0.01)
0000.4: 0.00(0.00+0.00)
0000.5: true 0.00(0.00+0.00)
0000.7: 0.00(0.00+0.00)
0000.8: 0.00(0.00+0.00)
After this patch, we get the pre-5dccd9155f output:
Test this tree
--------------------------------------------------------------------------
0000.1: test_perf_default_repo works 0.00(0.00+0.00)
0000.2: test_checkout_worktree works 0.01(0.00+0.01)
0000.4: export a weird var 0.00(0.00+0.00)
0000.5: éḿíẗ ńöń-ÁŚĆÍÍ ćḧáŕáćẗéŕś 0.00(0.00+0.00)
0000.7: important variables available in subshells 0.00(0.00+0.00)
0000.8: test-lib-functions correctly loaded in subshells 0.00(0.00+0.00)
Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A bitmap file ends with a trailing checksum, but the bitmap-format
documentation contains no information about it.
Add a trailer section describing the trailing checksum to the
`Documentation/technical/bitmap-format.txt` file.
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The asciidoc-generated HTML for `Documentation/technical/bitmap-
format.txt` is broken, mainly because `-` is used for nested lists
(which is not allowed in asciidoc) instead of `*`.
Fix these lists and also reformat the file for better readability of
the HTML page.
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Documentation/Makefile does not include bitmap-format.txt, so no HTML
page is generated for it using asciidoc.
Teach Documentation/Makefile to also generate an HTML page for the
Documentation/technical/bitmap-format.txt file.
Signed-off-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we generate the list of promisor objects, we walk every pack with a
.promisor file and examine its objects for any links to other objects.
By default, for_each_packed_object() will go in pack .idx order.
This is the worst case with respect to our delta base cache. If we have
a delta chain of A->B->C->D, then visiting A may require reconstructing
both B and C, unless we also visited B recently, in which case we may
have cached its value. Because .idx order is based on sha1, it's random
with respect to the actual object contents and deltas, and thus we're
unlikely to get many cache hits.
If we instead traverse in pack order, then we get the optimal case:
packs are written to keep delta families together, and to place bases
before their children.
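The gist of the change is just an extra traversal flag; roughly (a
sketch only, simplified from the promisor setup in object.c, so the
exact call may differ):

    for_each_packed_object(add_promisor_object, &promisor_objects,
                           FOR_EACH_OBJECT_PROMISOR_ONLY |
                           FOR_EACH_OBJECT_PACK_ORDER);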
Even on a modest repository like git.git, this has a noticeable speedup
on p5600.4, which runs "fsck" on a partial clone with blob:none (so lots
of trees which need to be walked, and which delta well):
Test HEAD^ HEAD
-------------------------------------------------------
5600.4: 17.87(17.83+0.04) 15.42(15.35+0.06) -13.7%
On a larger repository like linux.git, the speedup is even more
pronounced:
Test HEAD^ HEAD
-----------------------------------------------------------
5600.4: 322.47(322.01+0.42) 186.41(185.76+0.63) -42.2%
Any other operations that call is_promisor_object(), like "rev-list
--exclude-promisor-objects", would similarly benefit, but the
invocations in p5600 don't actually trigger any such cases.
Note that we may pay a small price to build a rev-index in-memory to do
the pack-order traversal. But it's still a big net win, and even that
small cost goes away if you are using pack.writeReverseIndex.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fix more translation mistakes
* consistently translate "amend" as "enmendar"
* consistently translate "chunk" as "fragmento"
* consistently translate "prune" as "recortar" or "recorte"
* consistently translate "push" as "empujar" or "empuje"
* consistently translate "rephrase" as "refrasear" or "refraseo"
* consistently translate "squash" as "aplastar" or "aplastamiento"
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Various error messages that talk about the removal of
"--preserve-merges" in "rebase" have been strengthened, and "rebase
--abort" learned to get out of a state that was left by an earlier
use of the option.
* po/rebase-preserve-merges:
rebase: translate a die(preserve-merges) message
rebase: note `preserve` merges may be a pull config option
rebase: help users when dying with `preserve-merges`
rebase.c: state preserve-merges has been removed
"git revert" learns "--reference" option to use more human-readable
reference to the commit it reverts in the message template it
prepares for the user.
* jc/revert-show-parent-info:
revert: --reference should apply only to 'revert', not 'cherry-pick'
revert: optionally refer to commit in the "reference" format
Drop the dependency on gzip(1) and use our internal implementation to
create tar.gz and tgz files.
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
gzip(1) encodes the OS it runs on in the 10th byte of its output. It
uses the following OS_CODE values according to its tailor.h [1]:
0 - MS-DOS
3 - UNIX
5 - Atari ST
6 - OS/2
10 - TOPS-20
11 - Windows NT
The gzip.exe that comes with Git for Windows uses OS_CODE 3 for some
reason, so this value is used on practically all supported platforms
when generating tgz archives using gzip(1).
Zlib uses a bigger set of values according to its zutil.h [2], aligned
with section 4.4.2 of the ZIP specification, APPNOTE.txt [3]:
0 - MS-DOS
1 - Amiga
3 - UNIX
4 - VM/CMS
5 - Atari ST
6 - OS/2
7 - Macintosh
8 - Z-System
10 - Windows NT
11 - MVS (OS/390 - Z/OS)
13 - Acorn Risc
16 - BeOS
18 - OS/400
19 - OS X (Darwin)
Thus the internal gzip implementation in archive-tar.c sets different
OS_CODE header values on major platforms Windows and macOS. Git for
Windows uses its own zlib-based variant since v2.20.1 by default and
thus embeds OS_CODE 10 in tgz archives.
The tar archive for a commit is generated consistently on all systems
(by the same Git version). The OS_CODE in the gzip header does not
influence extraction. Avoid leaking OS information and make tgz
archives consistent and reproducible (with the same Git and libz
versions) by using OS_CODE 3 everywhere.
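For reference, the 10-byte gzip member header we end up writing with a
hard-coded OS_CODE of 3 has this layout (an illustrative sketch per RFC
1952, not the literal code in archive-tar.c):

    unsigned char gzip_header[10] = {
            0x1f, 0x8b,     /* magic */
            8,              /* CM: deflate */
            0,              /* FLG: no extra header fields */
            0, 0, 0, 0,     /* MTIME: zeroed for reproducible output */
            0,              /* XFL */
            3,              /* OS: 3 == Unix, on every platform */
    };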
At least on macOS 12.4 this produces the same output as gzip(1) for the
examples I tried:
# before
$ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 | shasum
3abbffb40b7c63cf9b7d91afc682f11682f80759 -
# with this patch
$ git -c tar.tgz.command='git archive gzip' archive --format=tgz v2.36.0 | shasum
dc6dc6ba9636d522799085d0d77ab6a110bcc141 -
$ git archive --format=tar v2.36.0 | gzip -cn | shasum
dc6dc6ba9636d522799085d0d77ab6a110bcc141 -
[1] https://git.savannah.gnu.org/cgit/gzip.git/tree/tailor.h
[2] https://github.com/madler/zlib/blob/master/zutil.h
[3] https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git uses zlib for its own object store, but calls gzip when creating tgz
archives. Add an option to perform the gzip compression for the latter
using zlib, without depending on the external gzip binary.
Plug it in by making write_block a function pointer and switching to a
compressing variant if the filter command has the magic value "git
archive gzip". Does that indirection slow down tar creation? Not
really, at least not in this test:
$ hyperfine -w3 -L rev HEAD,origin/main -p 'git checkout {rev} && make' \
'./git -C ../linux archive --format=tar HEAD # {rev}'
Benchmark #1: ./git -C ../linux archive --format=tar HEAD # HEAD
Time (mean ± σ): 4.044 s ± 0.007 s [User: 3.901 s, System: 0.137 s]
Range (min … max): 4.038 s … 4.059 s 10 runs
Benchmark #2: ./git -C ../linux archive --format=tar HEAD # origin/main
Time (mean ± σ): 4.047 s ± 0.009 s [User: 3.903 s, System: 0.138 s]
Range (min … max): 4.038 s … 4.066 s 10 runs
How does tgz creation perform?
$ hyperfine -w3 -L command 'gzip -cn','git archive gzip' \
'./git -c tar.tgz.command="{command}" -C ../linux archive --format=tgz HEAD'
Benchmark #1: ./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 20.404 s ± 0.006 s [User: 23.943 s, System: 0.401 s]
Range (min … max): 20.395 s … 20.414 s 10 runs
Benchmark #2: ./git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD
Time (mean ± σ): 23.807 s ± 0.023 s [User: 23.655 s, System: 0.145 s]
Range (min … max): 23.782 s … 23.857 s 10 runs
Summary
'./git -c tar.tgz.command="gzip -cn" -C ../linux archive --format=tgz HEAD' ran
1.17 ± 0.00 times faster than './git -c tar.tgz.command="git archive gzip" -C ../linux archive --format=tgz HEAD'
So the internal implementation takes 17% longer on the Linux repo, but
uses 2% less CPU time. That's because the external gzip can run in
parallel on its own processor, while the internal one works sequentially
and avoids the inter-process communication overhead.
What are the benefits? Only an internal sequential implementation can
offer this eco mode, and it allows avoiding the gzip(1) requirement.
This implementation uses the helper functions from our zlib.c instead of
the convenient gz* functions from zlib, because the latter don't give us
the control over the generated gzip header that the next patch requires.
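The write_block indirection mentioned above boils down to a pattern like
the following self-contained illustration (not the actual archive-tar.c
code; names and details are simplified):

    #include <stdio.h>
    #include <string.h>

    #define BLOCKSIZE 10240

    static void write_block_plain(const char *block)
    {
            fwrite(block, 1, BLOCKSIZE, stdout);
    }

    static void write_block_deflate(const char *block)
    {
            /* placeholder: the real variant runs the block through zlib
               before writing it out */
            fwrite(block, 1, BLOCKSIZE, stdout);
    }

    static void (*write_block)(const char *) = write_block_plain;

    static void choose_writer(const char *filter_command)
    {
            if (filter_command && !strcmp(filter_command, "git archive gzip"))
                    write_block = write_block_deflate;
    }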
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All tar archive writes have the same size and are done to the same file
descriptor. Move them to a common function, write_block(), to reduce
code duplication and make it easy to change the destination.
Original-patch-by: Rohit Ashiwal <rohit.ashiwal265@gmail.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The void pointer "data" in struct archiver is only used to store filter
commands to pass tar archives to, like gzip. Rename it accordingly and
also turn it into a char pointer to document the fact that it's a string
reference.
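After the change the struct looks roughly like this (a sketch of
archive.h; the member list may not be exact):

    struct archiver_args;

    struct archiver {
            const char *name;
            int (*write_archive)(const struct archiver *,
                                 struct archiver_args *);
            unsigned flags;
            char *filter_command;   /* was: void *data */
    };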
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mention all formats in the --format section, use backtick quoting for
literal values throughout, and clarify the description of the
configuration option.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add and use a LIBCURL prerequisite for tests added in
6dcbdc0d66 (remote: create fetch.credentialsInUrl config,
2022-06-06).
These tests would get as far as emitting a couple of the warnings we
were testing for, but would then die as we had no "git-remote-https"
program compiled.
It would be more consistent with other prerequisites (e.g. PERL for
NO_PERL) to name this "CURL", but since e9184b0789 (t5561: skip tests
if curl is not available, 2018-04-03) we've had that prerequisite
defined for checking whether we have the curl(1) program.
The existing "CURL" prerequisite is only used in one place, and we
should probably name it "CURL_PROGRAM", then rename "LIBCURL" to
"CURL" as a follow-up, but for now (pre-v2.37.0) let's aim for the
most minimal fix possible.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was found during l10n process by Jiang Xin.
Reported-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Fangyi Zhou <me@fangyi.io>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the "fetch.credentialsInUrl" configuration variable introduced
in 6dcbdc0d66 (remote: create fetch.credentialsInUrl config,
2022-06-06) to "transfer".
There are existing exceptions, but generally speaking the
"<namespace>.<var>" configuration should only apply to the command
described in the "namespace" (and its sub-commands, so e.g. "clone.*"
or "fetch.*" might also configure "git-remote-https").
But in the case of "fetch.credentialsInUrl" we've got a configuration
variable that configures the behavior of all of "clone", "push" and
"fetch"; someone adjusting "fetch.*" configuration won't expect to
have the behavior of "git push" altered, especially as we have the
pre-existing "{transfer,fetch,receive}.fsckObjects", which configures
different parts of the transfer dialog.
So let's move this configuration variable to the "transfer" namespace
before it's exposed in a release. We could add all of
"{transfer,fetch,pull}.credentialsInUrl" at some other time, but once
we have "fetch" configure "pull" such an arrangement would would be a
confusing mess, as we'd at least need to have "fetch" configure
"push" (but not the other way around), or change existing behavior.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Amend the documentation and release notes entry for the
"fetch.credentialsInUrl" feature added in 6dcbdc0d66 (remote: create
fetch.credentialsInUrl config, 2022-06-06): it currently doesn't
detect passwords in the `remote.<name>.pushurl` configuration. We
shouldn't lull users into a false sense of security, so we need to
mention that prominently.
This also elaborates and clarifies the "exposes the password in
multiple ways" part of the documentation. As noted in [1] a user
unfamiliar with git's implementation won't know what to make of that
scary claim, e.g. that git might hypothetically have novel git-specific
ways of exposing the configured credentials.
The reality is that this configuration is intended as an aid for users
who can't fully trust their OS's or system's security model, so let's
say that's what this is intended for, and mention the most common ways
passwords stored in configuration might inadvertently get exposed.
1. https://lore.kernel.org/git/220524.86ilpuvcqh.gmgdl@evledraar.gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The branch_checked_out() method populates a strmap linking a refname to
a worktree that has that branch checked out. While unlikely, it is
possible that a bug or filesystem manipulation could create a scenario
where the same ref is checked out in multiple places. Further, there are
some states in an interactive rebase where HEAD and REBASE_HEAD point to
the same ref, leading to multiple insertions into the strmap. In either
case, the strmap_put() method returns the old value which is leaked.
Update branch_checked_out() to consume that pointer and free it.
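The fix itself is small; a rough sketch (the variable names only
approximate the ones in branch.c):

    char *old = strmap_put(&current_checked_out_branches,
                           refname, xstrdup(wt->path));
    free(old);      /* non-NULL only if the ref was already seen */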
Add a test in t2407 that checks this erroneous case. The test "checks
itself" by first confirming that the filesystem manipulations it makes
trigger the branch_checked_out() logic, and then setting up similar
manipulations to make it look like there are multiple worktrees pointing
to the same ref.
While TEST_PASSES_SANITIZE_LEAK would be helpful to demonstrate the
leakage and prevent it in the future, t2407 uses helpers such as 'git
clone' that cause the test to fail under that mode.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the last current use of find_shared_symref() that can easily be
replaced by branch_checked_out(). The benefit of this switch is that the
code is a bit simpler, but also it is faster on repeated calls.
The remaining uses of find_shared_symref() are non-trivial to remove, so
we probably should not continue in that direction:
* builtin/notes.c uses find_shared_symref() with "NOTES_MERGE_REF"
instead of "HEAD", so it doesn't have an immediate analogue with
branch_checked_out(). Perhaps we should consider extending it to
include that symref in addition to HEAD, BISECT_HEAD, and
REBASE_HEAD.
* receive-pack.c checks to see if a worktree has a checkout for the ref
that is being updated. The tricky part is that it can actually decide
to update the worktree directly instead of just skipping the update.
This all depends on the receive.denyCurrentBranch config option. The
implementation currently cares about receiving the worktree in the
result, so the existing branch_checked_out() prototype is
insufficient. This is something to investigate later, though, since a
large number of refs could be updated at the same time and using the
strmap implementation of branch_checked_out() could be beneficial.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching refs from a remote, it is possible that the refspec will
cause us to overwrite a ref that is checked out in a worktree. The
existing logic in builtin/fetch.c uses a possibly-slow mechanism. Update
those sections to use the new, more efficient branch_checked_out()
helper.
These uses were not previously tested, so add a test case that can be
used for these kinds of collisions. There is only one test now, but more
tests will be added as other consumers of branch_checked_out() are
added.
Note that there are two uses in builtin/fetch.c, but only one of the
messages is tested. This is because the tested check is run before
completing the fetch, and the untested check is not reachable without
concurrent updates to the filesystem. Thus, it is beneficial to keep
that extra check for the sake of defense-in-depth. However, we should
not attempt to test the check, as doing so would be too complicated
to be worth the effort. This use in update_local_ref()
also requires a change in the error message because we no longer have
access to the worktree struct, only the path of the worktree. This error
is so rare that making a distinction between the two is not critical.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The branch_checked_out() helper was added by the previous change, but it
used an over-simplified view to check if a branch is checked out. It
only focused on the HEAD symref, but ignored whether a bisect or rebase
was happening.
Teach branch_checked_out() to check for these things, and also add tests
to ensure that we do not lose this functionality in the future.
Now that this test coverage exists, we can safely refactor
validate_new_branchname() to use branch_checked_out().
Note that we need to prepend "refs/heads/" to the 'state.branch' after
calling wt_status_check_*(). We also need to duplicate wt->path so the
value is not freed at the end of the call.
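The rebase handling is roughly of this shape (a sketch only; the bisect
case is analogous and the real code in branch.c differs in detail):

    struct wt_status_state state = { 0 };
    if (wt_status_check_rebase(wt, &state) && state.branch) {
            struct strbuf ref = STRBUF_INIT;
            strbuf_addf(&ref, "refs/heads/%s", state.branch);
            strmap_put(&current_checked_out_branches,
                       ref.buf, xstrdup(wt->path));
            strbuf_release(&ref);
            /* (the real code also releases 'state') */
    }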
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The validate_new_branchname() method contains a check to see if a branch
is checked out in any non-bare worktree. This is intended to prevent a
force push that will mess up an existing checkout. This helper is not
suitable for performing just that check, because the method will die()
when the branch is checked out instead of returning an error code.
Create a new branch_checked_out() helper that performs the most basic
form of this check. To ensure we can call branch_checked_out() in a loop
with good performance, do a single preparation step that iterates over
all worktrees and stores their current HEAD branches in a strmap. The
branch_checked_out() helper can then discover these branches using a
hash lookup.
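In other words, the helper ends up being little more than a
lazily-prepared hash lookup; roughly (a sketch, simplified from
branch.c):

    const char *branch_checked_out(const char *refname)
    {
            prepare_checked_out_branches();  /* fills the strmap once */
            return strmap_get(&current_checked_out_branches, refname);
    }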
This helper is currently missing some key functionality. Namely: it
doesn't look for active rebases or bisects, which mean that the branch is
"checked out" even though HEAD doesn't point to that ref. This
functionality will be added in a coming change.
We could use branch_checked_out() in validate_new_branchname(), but this
missing functionality would be a regression. However, we have no tests
that cover this case!
Add a new test script that will be expanded with these cross-worktree
ref updates. The current tests would still pass if we refactored
validate_new_branchname() to use this version of branch_checked_out().
The next change will fix that functionality and add the proper test
coverage.
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix an issue that existed before 0527ccb1b5 (add -i: default to the
built-in implementation, 2021-11-30), but which became the default
behavior with that change: we should not be marking tests that are
known to pass as "TODO" tests.
When GIT_TEST_ADD_I_USE_BUILTIN=1 was made the default we started
passing the tests added in 0f0fba2cc8 (t3701: add a test for advanced
split-hunk editing, 2019-12-06) and 1bf01040f0 (add -p: demonstrate
failure when running 'edit' after a split, 2015-04-16).
Thus we've been emitting this sort of output:
$ prove ./t3701-add-interactive.sh
./t3701-add-interactive.sh .. ok
All tests successful.
Test Summary Report
-------------------
./t3701-add-interactive.sh (Wstat: 0 Tests: 70 Failed: 0)
TODO passed: 45, 47
Files=1, Tests=70, 2 wallclock secs ( 0.03 usr 0.00 sys + 0.86 cusr 0.33 csys = 1.22 CPU)
Result: PASS
This isn't just cosmetic: due to issues with test_expect_failure
(see [1]) we could e.g. be hiding something as bad as a segfault in
the new implementation. It makes sense to catch that, especially
before we put out a release with the built-in "add -i", so let's
generalize the check we were already doing in 0527ccb1b5 with a new
"ADD_I_USE_BUILTIN" prerequisite.
1. https://lore.kernel.org/git/patch-1.7-4624abc2591-20220318T002951Z-avarab@gmail.com/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to log an error return from wait_or_whine() as process
termination of the waited child, which was incorrect.
* js/wait-or-whine-can-fail:
run-command: don't spam trace2_child_exit()
The sample watchman interface hook sometimes failed to produce a
correctly formatted JSON message, which has been corrected.
* sn/fsmonitor-missing-clock:
fsmonitor: query watchman with right valid json
Redundant copying (with index v3 and older) and possible over-reading
beyond the end of mmapped memory (with index v4) have been corrected.
* zh/read-cache-copy-name-entry-fix:
read-cache.c: reduce unnecessary cache entry name copying
"git show-ref --heads" (and "--tags") still iterated over all the
refs only to discard refs outside the specified area, which has
been corrected.
* tb/show-ref-optim:
builtin/show-ref.c: avoid over-iterating with --heads, --tags
The "fetch.credentialsInUrl" configuration variable controls what
happens when a URL with embedded login credential is used.
* ds/credentials-in-url:
remote: create fetch.credentialsInUrl config
Updating the graft information invalidates the list of parents of
in-core commit objects that used to be in the graft file.
* jt/unparse-commit-upon-graft-change:
commit,shallow: unparse commits if grafts changed
In Git 2.36 we revamped the way hooks are invoked. One change
that is end-user visible is that the output of a hook is no longer
directly connected to the standard output of "git" that spawns the
hook, which was noticed post release. This is getting corrected.
* ab/hooks-regression-fix:
hook API: fix v2.36.0 regression: hooks should be connected to a TTY
run-command: add an "ungroup" option to run_process_parallel()
"git -c diff.submodule=log range-diff" did not show anything for
submodules that changed in the ranges being compared, and
"git -c diff.submodule=diff range-diff" did not work correctly.
Fix this by unconditionally including the "--submodule=short"
output in the comparison.
* pb/range-diff-with-submodule:
range-diff: show submodule changes irrespective of diff.submodule
The two examples in the doc for 'git diff-index' were not updated when
the raw output format was changed in 81e50eabf0 ([PATCH] The diff-raw
format updates., 2005-05-21) (first example) and in b6d8f309d9 ([PATCH]
diff-raw format update take #2., 2005-05-23) and 7cb6ac1e4b (diff:
diff_aligned_abbrev: remove ellipsis after abbreviated SHA-1 value,
2017-12-03) (second example).
Update the output, inventing some characters to complete the source
hash in the second example. Also correct the destination mode in the
second example, which was wrongly '100664' since the addition of the
example in c64b9b8860 (Reference documentation for the core git
commands., 2005-05-05).
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Near the end of the "Raw output format" section, an example shows the
output of 'git diff-files' for a tracked file modified on disk but not
yet added to the index. However the wording is:
<sha1> is shown as all 0's if a file is new on the filesystem
and it is out of sync with the index.
which is confusing since it can be understood to mean that 'file' is a
new, still-untracked file, in which case 'git diff-files' does not care
about it at all.
When this example was introduced all the way back in c64b9b8860
(Reference documentation for the core git commands., 2005-05-05), 'old'
and 'new' referred to the two entities being compared, depending on the
command being used (diff-index, diff-tree or diff-files - which at the
time were diff-cache, diff-tree and show-diff). The wording used at the
time was:
<new-sha1> is shown as all 0's if new is a file on the
filesystem and it is out of sync with the cache.
This section was reworked in 81e50eabf0 ([PATCH] The diff-raw
format updates., 2005-05-21) and the mention of the meaning of 'new' and
'old' was removed. Then in f73ae1fc5d (Some typos and light editing of
various manpages, 2005-10-05), the wording was changed to what it is
now.
In addition, in b6d8f309d9 ([PATCH] diff-raw format update take #2.,
2005-05-23), the section was further reworked and did not use '<sha1>'
anymore, making the example the sole user of this token.
Rework the introductory sentence of the example to instead refer to
'sha1 for "dst"', which is what the text description above it uses, and
fix the wording so that we do not mention a "new file".
While at it, also tweak the wording used in the description of the raw
format to explicitly state that all 0's are used for the destination
hash if the working tree is out of sync with the index, instead of the
more vague "look at worktree".
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"dst" can legitimately be "0\{40\}" for a creation patch, e.g. when
the stat information is stale, but it falls into the "look at work
tree" case. The original description in b6d8f309 ([PATCH] diff-raw format
update take #2., 2005-05-23) forgot that deletion also makes the
"dst" 0* SHA-1.
Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make use of the stream_loose_object() function introduced in the
preceding commit to unpack large objects. Before this we'd need to
malloc() the size of the blob before unpacking it, which could cause
OOM with very large blobs.
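The decision point ends up looking roughly like this in
builtin/unpack-objects.c (a sketch; the helper names only approximate
the patch, and big_file_threshold holds the core.bigFileThreshold
value):

    static void unpack_non_delta_entry(enum object_type type,
                                       unsigned long size, unsigned nr)
    {
            if (type == OBJ_BLOB && size > big_file_threshold) {
                    /* feed the inflated data straight to
                     * stream_loose_object() instead of holding all of
                     * it in memory */
                    stream_blob(size, nr);
                    return;
            }
            write_object(nr, type, get_data(size), size);
    }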
We could use the new streaming interface to unpack all blobs, but
doing so would be much slower, as demonstrated e.g. with this
benchmark using git-hyperfine[0]:
rm -rf /tmp/scalar.git &&
git clone --bare https://github.com/Microsoft/scalar.git /tmp/scalar.git &&
mv /tmp/scalar.git/objects/pack/*.pack /tmp/scalar.git/my.pack &&
git hyperfine \
-r 2 --warmup 1 \
-L rev origin/master,HEAD -L v "10,512,1k,1m" \
-s 'make' \
-p 'git init --bare dest.git' \
-c 'rm -rf dest.git' \
'./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/scalar.git/my.pack'
With this change we'll perform worse in terms of speed at lower
core.bigFileThreshold settings, but we get lower memory use in return:
Summary
'./git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master' ran
1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.01 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.01 ± 0.02 times faster than './git -C dest.git -c core.bigFileThreshold=1m unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.02 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'origin/master'
1.09 ± 0.01 times faster than './git -C dest.git -c core.bigFileThreshold=1k unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.10 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
1.11 ± 0.00 times faster than './git -C dest.git -c core.bigFileThreshold=10 unpack-objects </tmp/scalar.git/my.pack' in 'HEAD'
A better benchmark to demonstrate the benefits of this change is the
following one, which creates an artificial repo with a 1, 25, 50, 75
and 100MB blob:
rm -rf /tmp/repo &&
git init /tmp/repo &&
(
cd /tmp/repo &&
for i in 1 25 50 75 100
do
dd if=/dev/urandom of=blob.$i count=$(($i*1024)) bs=1024
done &&
git add blob.* &&
git commit -mblobs &&
git gc &&
PACK=$(echo .git/objects/pack/pack-*.pack) &&
cp "$PACK" my.pack
) &&
git hyperfine \
--show-output \
-L rev origin/master,HEAD -L v "512,50m,100m" \
-s 'make' \
-p 'git init --bare dest.git' \
-c 'rm -rf dest.git' \
'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold={v} unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum'
Using this test we'll always use >100MB of memory on
origin/master (around ~105MB), but max out at e.g. ~55MB if we set
core.bigFileThreshold=50m.
The relevant "Maximum resident set size" lines were manually added
below the relevant benchmark:
'/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master' ran
Maximum resident set size (kbytes): 107080
1.02 ± 0.78 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
Maximum resident set size (kbytes): 106968
1.09 ± 0.79 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'origin/master'
Maximum resident set size (kbytes): 107032
1.42 ± 1.07 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=100m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 107072
1.83 ± 1.02 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=50m unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 55704
2.16 ± 1.19 times faster than '/usr/bin/time -v ./git -C dest.git -c core.bigFileThreshold=512 unpack-objects </tmp/repo/my.pack 2>&1 | grep Maximum' in 'HEAD'
Maximum resident set size (kbytes): 4564
This shows that if you have enough memory this new streaming method is
slower the lower you set the streaming threshold, but the benefit is
more bounded memory use.
An earlier version of this patch introduced a new
"core.bigFileStreamingThreshold" instead of re-using the existing
"core.bigFileThreshold" variable[1]. As noted in a detailed overview
of its users in [2] using it has several different meanings.
Still, we consider it good enough to simply re-use it. While it's
possible that someone might want to e.g. consider objects "small" for
the purposes of diffing but "big" for the purposes of writing them
such use-cases are probably too obscure to worry about. We can always
split up "core.bigFileThreshold" in the future if there's a need for
that.
0. https://github.com/avar/git-hyperfine/
1. https://lore.kernel.org/git/20211210103435.83656-1-chiyutianyi@gmail.com/
2. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Derrick Stolee <stolee@gmail.com>
Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
Signed-off-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The core.bigFileThreshold documentation has been largely unchanged
since 5eef828bc0 (fast-import: Stream very large blobs directly to
pack, 2010-02-01).
But since then this setting has been expanded to affect a lot more
than that description indicated, most notably in how "git diff" treats
large files; see 6bf3b81348 (diff --stat: mark any file larger than
core.bigfilethreshold binary, 2014-08-16).
In addition to that, numerous commands and APIs make use of a
streaming mode for files above this threshold.
So let's attempt to summarize 12 years of changes in behavior, which
can be seen with:
git log --oneline -Gbig_file_thre 5eef828bc03.. -- '*.c'
To do that, turn this into a bullet-point list. The summary Han Xin
produced in [1] helped a lot, but is a bit too detailed for
documentation aimed at users. Let's instead summarize how
user-observable behavior differs, and generally describe how we tend
to stream these files in various commands.
1. https://lore.kernel.org/git/20220120112114.47618-5-chiyutianyi@gmail.com/
Helped-by: Han Xin <chiyutianyi@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>