Commit Graph

67085 Commits

Author SHA1 Message Date
Derrick Stolee
0243930af4 sparse-index: partially expand directories
The expand_to_pattern_list() method expands sparse directory entries
to their list of contained files when either the pattern list is NULL or
the directory is contained in the new pattern list's cone mode patterns.

It is possible that the pattern list has a recursive match with a
directory 'A/B/C/' and so an existing sparse directory 'A/B/' would need
to be expanded. If there exists a directory 'A/B/D/', then that
directory should not be expanded and instead we can create a sparse
directory.

To implement this, we plug into the add_path_to_index() callback for the
call to read_tree_at(). Since we now need access to both the index we
are writing and the pattern list we are comparing, create a 'struct
modify_index_context' to use as a data transfer object. It is important
that we use the given pattern list since we will use this pattern list
to change the sparse-checkout patterns and cannot use
istate->sparse_checkout_patterns.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
2d443389fd sparse-checkout: --no-sparse-index needs a full index
When the --no-sparse-index option is supplied, the sparse-checkout
builtin should explicitly ask to expand a sparse index to a full one.
This is currently done implicitly due to the command_requires_full_index
protection, but that will be removed in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
080ab56a46 cache-tree: implement cache_tree_find_path()
Given a 'struct cache_tree', it may be beneficial to navigate directly
to a node within that corresponds to a given path name. Create
cache_tree_find_path() for this function. It returns NULL when no such
path exists.

The implementation is adapted from do_invalidate_path() which does a
similar search but also modifies the nodes it finds along the way. The
method could be implemented simply using tail-recursion, but this while
loop does the same thing.

This new method is not currently used, but will be in an upcoming
change.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
9fadb373dd sparse-index: introduce partially-sparse indexes
A future change will present a temporary, in-memory mode where the index
can both contain sparse directory entries but also not be completely
collapsed to the smallest possible sparse directories. This will be
necessary for modifying the sparse-checkout definition while using a
sparse index.

For now, convert the single-bit member 'sparse_index' in 'struct
index_state' to be a an 'enum sparse_index_mode' with three modes:

* INDEX_EXPANDED (0): No sparse directories exist. This is always the
  case for repositories that do not use cone-mode sparse-checkout.

* INDEX_COLLAPSED: Sparse directories may exist. Files outside the
  sparse-checkout cone are reduced to sparse directory entries whenever
  possible.

* INDEX_PARTIALLY_SPARSE: Sparse directories may exist. Some file
  entries outside the sparse-checkout cone may exist. Running
  convert_to_sparse() may further reduce those files to sparse directory
  entries.

The main reason to store this extra information is to allow
convert_to_sparse() to short-circuit when the index is already in
INDEX_EXPANDED mode but to actually do the necessary work when in
INDEX_PARTIALLY_SPARSE mode.

The INDEX_PARTIALLY_SPARSE mode will be used in an upcoming change.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
dce241b020 sparse-index: create expand_index()
This is the first change in a series to allow modifying the
sparse-checkout pattern set without expanding a sparse index to a full
one in the process. Here, we focus on the problem of expanding the
pattern set through a command like 'git sparse-checkout add <path>'
which needs to create new index entries for the paths now being written
to the worktree.

To achieve this, we need to be able to replace sparse directory entries
with their contained files and subdirectories. Once this is complete,
other code paths can discover those cache entries and write the
corresponding files to disk before committing the index.

We already have logic in ensure_full_index() that expands the index
entries, so we will use that as our base. Create a new method,
expand_index(), which takes a pattern list, but for now mostly ignores
it. The current implementation is only correct when the pattern list is
NULL as that does the same as ensure_full_index(). In fact,
ensure_full_index() is converted to a shim over expand_index().

A future update will actually implement expand_index() to its full
capabilities. For now, it is created and documented.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:21 -07:00
Derrick Stolee
8846847a14 t1092: stress test 'git sparse-checkout set'
The 'sparse-index contents' test checks that the sparse index has the
correct set of sparse directories in the index after modifying the cone
mode patterns using 'git sparse-checkout set'. Add to the coverage here
by adding more complicated scenarios that were not previously tested.

In order to check paths that do not exist at HEAD, we need to modify the
test_sparse_checkout_set helper slightly:

1. Add the --skip-checks argument to the 'set' command to avoid failures
   when passing paths that do not exist at HEAD.

2. When looking for the non-existence of sparse directories for the
   paths in $CONE_DIRS, allow the rev-list command to fail because the
   path does not exist at HEAD.

This allows us to add some interesting test cases.

Helped-by: Victoria Dye <vdye@github.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:20 -07:00
Derrick Stolee
baa73e2b75 t1092: refactor 'sparse-index contents' test
Before expanding this test with more involved cases, first extract the
repeated logic into a new test_sparse_checkout_set helper. This helper
checks that 'git sparse-checkout set ...' succeeds and then verifies
that certain directories have sparse directory entries in the sparse
index. It also verifies that the in-cone directories are _not_ sparse
directory entries in the sparse index.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-23 11:08:20 -07:00
Johannes Schindelin
3069f2a6f4 ci: call finalize_test_case_output a little later
We used to call that function already before printing the final verdict.
However, now that we added grouping to the GitHub workflow output, we
will want to include even that part in the collapsible group for that
test case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:57 -07:00
Johannes Schindelin
aeea0084a0 ci(github): mention where the full logs can be found
The full logs are contained in the `failed-tests-*.zip` artifacts that
are attached to the failed CI run. Since this is not immediately
obvious to the well-disposed reader, let's mention it explicitly.

Suggested-by: Victoria Dye <vdye@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
0068c82a13 ci: use --github-workflow-markup in the GitHub workflow
This makes the output easier to digest.

Note: since workflow output currently cannot contain any nested groups
(see https://github.com/actions/runner/issues/802 for details), we need
to remove the explicit grouping that would span the entirety of each
failed test script.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Victoria Dye
110e91150d ci(github): avoid printing test case preamble twice
We want to mark up the test case preamble when presenting test output in
Git's GitHub workflow. Let's suppress the non-marked-up version in that
case. Any information it would contain is included in the marked-up
variant already.

Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
448de909a7 ci(github): skip the logs of the successful test cases
In most instances, looking at the log of failed test cases is enough to
identify the problem.

In some (rare?) instances, a previous test case that was marked as
successful actually has information pertaining to a later test case that
fails.

To allow the page to load relatively quickly, let's only show the logs
of the failed test cases to be shown. The full logs are available for
download as artifacts, should a deeper investigation become necessary.

Co-authored-by: Victoria Dye <vdye@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
0f5ae593be ci: optionally mark up output in the GitHub workflow
A couple of commands exist to spruce up the output in GitHub workflows:
https://docs.github.com/en/actions/learn-github-actions/workflow-commands-for-github-actions

In addition to the `::group::<label>`/`::endgroup::` commands (which we
already use to structure the output of the build step better), we also
use `::error::`/`::notice::` to draw the attention to test failures and
to test cases that were expected to fail but didn't.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
dab73aebd8 ci/run-build-and-tests: add some structure to the GitHub workflow output
The current output of Git's GitHub workflow can be quite confusing,
especially for contributors new to the project.

To make it more helpful, let's introduce some collapsible grouping.
Initially, readers will see the high-level view of what actually
happened (did the build fail, or the test suite?). To drill down, the
respective group can be expanded.

Note: sadly, workflow output currently cannot contain any nested groups
(see https://github.com/actions/runner/issues/802 for details),
therefore we take pains to ensure to end any previous group before
starting a new one.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
08dccc8fc1 ci: make it easier to find failed tests' logs in the GitHub workflow
When investigating a test failure, the time that matters most is the
time it takes from getting aware of the failure to displaying the output
of the failing test case.

You currently have to know a lot of implementation details when
investigating test failures in the CI runs. The first step is easy: the
failed job is marked quite clearly, but when opening it, the failed step
is expanded, which in our case is the one running
`ci/run-build-and-tests.sh`. This step, most notably, only offers a
high-level view of what went wrong: it prints the output of `prove`
which merely tells the reader which test script failed.

The actually interesting part is in the detailed log of said failed
test script. But that log is shown in the CI run's step that runs
`ci/print-test-failures.sh`. And that step is _not_ expanded in the web
UI by default. It is even marked as "successful", which makes it very
easy to miss that there is useful information hidden in there.

Let's help the reader by showing the failed tests' detailed logs in the
step that is expanded automatically, i.e. directly after the test suite
failed.

This also helps the situation where the _build_ failed and the
`print-test-failures` step was executed under the assumption that the
_test suite_ failed, and consequently failed to find any failed tests.

An alternative way to implement this patch would be to source
`ci/print-test-failures.sh` in the `handle_test_failures` function to
show these logs. However, over the course of the next few commits, we
want to introduce some grouping which would be harder to achieve that
way (for example, we do want a leaner, and colored, preamble for each
failed test script, and it would be trickier to accommodate the lack of
nested groupings in GitHub workflows' output).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:56 -07:00
Johannes Schindelin
b95181cf82 ci/run-build-and-tests: take a more high-level view
In the web UI of GitHub workflows, failed runs are presented with the
job step that failed auto-expanded. In the current setup, this is not
helpful at all because that shows only the output of `prove`, which says
which test failed, but not in what way.

What would help understand the reader what went wrong is the verbose
test output of the failed test.

The logs of the failed runs do contain that verbose test output, but it
is shown in the _next_ step (which is marked as succeeding, and is
therefore _not_ auto-expanded). Anyone not intimately familiar with this
would completely miss the verbose test output, being left mostly
puzzled with the test failures.

We are about to show the failed test cases' output in the _same_ step,
so that the user has a much easier time to figure out what was going
wrong.

But first, we must partially revert the change that tried to improve the
CI runs by combining the `Makefile` targets to build into a single
`make` invocation. That might have sounded like a good idea at the time,
but it does make it rather impossible for the CI script to determine
whether the _build_ failed, or the _tests_. If the tests were run at
all, that is.

So let's go back to calling `make` for the build, and call `make test`
separately so that we can easily detect that _that_ invocation failed,
and react appropriately.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Johannes Schindelin
270ccd2a67 test(junit): avoid line feeds in XML attributes
In the test case's output, we do want newline characters, but in the XML
attributes we do not want them.

However, the `xml_attr_encode` function always adds a Line Feed at the
end (which are then encoded as `&#x0a;`, even for XML attributes.

This seems not to faze Azure Pipelines' XML parser, but it still is
incorrect, so let's fix it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Johannes Schindelin
78d5e4cfb4 tests: refactor --write-junit-xml code
The code writing JUnit XML is interspersed directly with all the code in
`t/test-lib.sh`, and it is therefore not only ill-separated, but
introducing yet another output format would make the situation even
worse.

Let's introduce an abstraction layer by hiding the JUnit XML code behind
four new functions that are supposed to be called before and after each
test and test case.

This is not just an academic exercise, refactoring for refactoring's
sake. We _actually_ want to introduce such a new output format, to
make it substantially easier to diagnose test failures in our GitHub
workflow, therefore we do need this refactoring.

This commit is best viewed with `git show --color-moved
--color-moved-ws=allow-indentation-change <commit>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Johannes Schindelin
863d6ceb52 ci: fix code style
In b92cb86ea1 (travis-ci: check that all build artifacts are
.gitignore-d, 2017-12-31), a function was introduced with a code style
that is different from the surrounding code: it added the opening curly
brace on its own line, when all the existing functions in the same file
cuddle that brace on the same line as the function name.

Let's make the code style consistent again.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-21 16:25:55 -07:00
Taylor Blau
3d89a8c118 Documentation/technical: add cruft-packs.txt
Create a technical document to explain cruft packs. It contains a brief
overview of the problem, some background, details on the implementation,
and a couple of alternative approaches not considered here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 22:31:21 -07:00
Junio C Hamano
f9b95943b6 First batch for 2.37
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 15:27:00 -07:00
Junio C Hamano
69e3d1e550 Merge branch 'cb/ci-make-p4-optional'
macOS CI jobs have been occasionally flaky due to tentative version
skew between perforce and the homebrew packager.  Instead of
failing the whole CI job, just let it skip the p4 tests when this
happens.

* cb/ci-make-p4-optional:
  ci: use https, not http to download binaries from perforce.com
  ci: reintroduce prevention from perforce being quarantined in macOS
  ci: avoid brew for installing perforce
  ci: make failure to find perforce more user friendly
2022-05-20 15:27:00 -07:00
Junio C Hamano
3af1df0415 Merge branch 'tk/p4-metadata-coding-strategies'
"git p4" updates.

* tk/p4-metadata-coding-strategies:
  git-p4: improve encoding handling to support inconsistent encodings
2022-05-20 15:27:00 -07:00
Junio C Hamano
e121c8c0d9 Merge branch 'ep/equals-null-cocci'
Merges up ep/maint-equals-null-cocci to the current codebase.

* ep/equals-null-cocci:
  tree-wide: apply equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano
538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano
acdeb10f91 Merge branch 'ds/sparse-colon-path'
"git show :<path>" learned to work better with the sparse-index
feature.

* ds/sparse-colon-path:
  rev-parse: integrate with sparse index
  object-name: diagnose trees in index properly
  object-name: reject trees found in the index
  show: integrate with the sparse index
  t1092: add compatibility tests for 'git show'
2022-05-20 15:26:58 -07:00
Junio C Hamano
5a9253cd45 Merge branch 'vd/sparse-stash'
Teach "git stash" to work better with sparse index entries.

* vd/sparse-stash:
  unpack-trees: preserve index sparsity
  stash: apply stash using 'merge_ort_nonrecursive()'
  read-cache: set sparsity when index is new
  sparse-index: expose 'is_sparse_index_allowed()'
  stash: integrate with sparse index
  stash: expand sparse-checkout compatibility testing
2022-05-20 15:26:58 -07:00
Junio C Hamano
945b9f2c31 Merge branch 'cd/bisect-messages-from-pre-flight-states'
"git bisect" was too silent before it is ready to start computing
the actual bisection, which has been corrected.

* cd/bisect-messages-from-pre-flight-states:
  bisect: output bisect setup status in bisect log
  bisect: output state before we are ready to compute bisection
2022-05-20 15:26:58 -07:00
Junio C Hamano
9a7176d9fb Merge branch 'jc/update-ozlabs-url'
* jc/update-ozlabs-url:
  SubmittingPatches: use more stable git.ozlabs.org URL
2022-05-20 15:26:58 -07:00
Junio C Hamano
ed54e1b31a Merge branch 'gc/pull-recurse-submodules'
"git pull" without "--recurse-submodules=<arg>" made
submodule.recurse take precedence over fetch.recurseSubmodules by
mistake, which has been corrected.

* gc/pull-recurse-submodules:
  pull: do not let submodule.recurse override fetch.recurseSubmodules
2022-05-20 15:26:57 -07:00
Junio C Hamano
1dff6dc016 Merge branch 'mg/detect-compiler-in-c-locale'
Build procedure fixup.

* mg/detect-compiler-in-c-locale:
  detect-compiler: make detection independent of locale
2022-05-20 15:26:56 -07:00
Junio C Hamano
3ab732864a Merge branch 'js/trace2-doc-fixes'
Trace2 documentation updates.

* js/trace2-doc-fixes:
  trace2 docs: add missing full stop
  trace2 docs: clarify what `varargs` is all about
  trace2 docs: fix a JSON formatted example
  trace2 docs: surround more terms in backticks
  trace2 docs: "printf" is not an English word
  trace2 docs: a couple of grammar fixes
2022-05-20 15:26:56 -07:00
Junio C Hamano
6f24da652c Merge branch 'mv/log-since-as-filter'
"git log --since=X" will stop traversal upon seeing a commit that
is older than X, but there may be commits behind it that is younger
than X when the commit was created with a faulty clock.  A new
option is added to keep digging without stopping, and instead
filter out commits with timestamp older than X.

* mv/log-since-as-filter:
  log: "--since-as-filter" option is a non-terminating "--since" variant
2022-05-20 15:26:56 -07:00
Junio C Hamano
2e969751ec Merge branch 'rs/external-diff-tempfile'
The temporary files fed to external diff command are now generated
inside a new temporary directory under the same basename.

* rs/external-diff-tempfile:
  diff: use mks_tempfile_dt()
  tempfile: add mks_tempfile_dt()
2022-05-20 15:26:55 -07:00
Junio C Hamano
586f23705c Merge branch 'kf/p4-multiple-remotes'
"git p4" update.

* kf/p4-multiple-remotes:
  git-p4: fix issue with multiple perforce remotes
2022-05-20 15:26:55 -07:00
Junio C Hamano
af3a3205d1 Merge branch 'tk/p4-with-explicity-sync'
"git p4" update.

* tk/p4-with-explicity-sync:
  git-p4: support explicit sync of arbitrary existing git-p4 refs
2022-05-20 15:26:55 -07:00
Junio C Hamano
804ec0301f Merge branch 'tk/p4-utf8-bom'
"git p4" update.

* tk/p4-utf8-bom:
  git-p4: preserve utf8 BOM when importing from p4 to git
2022-05-20 15:26:54 -07:00
Junio C Hamano
2e55151800 Merge branch 'cg/vscode-with-gdb'
VS code configuration updates.

* cg/vscode-with-gdb:
  contrib/vscode/: debugging with VS Code and gdb
2022-05-20 15:26:54 -07:00
Junio C Hamano
bdba04d4d0 Merge branch 'sa/t1011-use-helpers'
A GSoC practice.

* sa/t1011-use-helpers:
  t1011: replace test -f with test_path_is_file
2022-05-20 15:26:54 -07:00
Junio C Hamano
6b3d47a960 Merge branch 'km/t3501-use-test-helpers'
Test script updates.

* km/t3501-use-test-helpers:
  t3501: remove test -f and stop ignoring git <cmd> exit code
2022-05-20 15:26:54 -07:00
Junio C Hamano
ee0241bd22 Merge branch 'pb/submodule-recurse-mode-enum'
Small code clean-up.

* pb/submodule-recurse-mode-enum:
  submodule.h: use a named enum for RECURSE_SUBMODULES_*
2022-05-20 15:26:53 -07:00
Junio C Hamano
0a88638b0b Merge branch 'ah/convert-warning-message'
Update a few end-user facing messages around eol conversion.

* ah/convert-warning-message:
  convert: clarify line ending conversion warning
2022-05-20 15:26:53 -07:00
Junio C Hamano
87d6bec2c8 Merge branch 'gf/unused-includes'
Remove unused includes.

* gf/unused-includes:
  apply.c: remove unnecessary include
  serve.c: remove unnecessary include
2022-05-20 15:26:53 -07:00
Junio C Hamano
4976f244f3 Merge branch 'gf/shorthand-version-and-help'
"git -v" and "git -h" are now understood as "git --version" and
"git --help".

* gf/shorthand-version-and-help:
  cli: add -v and -h shorthands
2022-05-20 15:26:53 -07:00
Junio C Hamano
796388bebd Merge branch 'rs/t7812-pcre2-ws-bug-test'
A test to ensure workaround for an earlier pcre2 bug does work.

* rs/t7812-pcre2-ws-bug-test:
  t7812: test PCRE2 whitespace bug
2022-05-20 15:26:52 -07:00
Junio C Hamano
f5203a4220 Merge branch 'ds/do-not-call-bug-on-bad-refs'
Code clean-up.

* ds/do-not-call-bug-on-bad-refs:
  clone: die() instead of BUG() on bad refs
2022-05-20 15:26:52 -07:00
Junio C Hamano
1256a25ecd Merge branch 'sg/safe-directory-tests-and-docs'
New tests for the safe.directory mechanism.

* sg/safe-directory-tests-and-docs:
  safe.directory: document and check that it's ignored in the environment
  t0033-safe-directory: check when 'safe.directory' is ignored
  t0033-safe-directory: check the error message without matching the trash dir
2022-05-20 15:26:52 -07:00
Taylor Blau
66731ff921 builtin/repack.c: ensure that names is sorted
The previous patch demonstrates a scenario where the list of packs
written by `pack-objects` (and stored in the `names` string_list) is
out-of-order, and can thus cause us to delete packs we shouldn't.

This patch resolves that bug by ensuring that `names` is sorted in all
cases, not just when

    delete_redundant && pack_everything & ALL_INTO_ONE

is true.

Because we did sort `names` in that case (which, prior to `--geometric`
repacks, was the only time we would actually delete packs, this is only
a bug for `--geometric` repacks.

It would be sufficient to only sort `names` when `delete_redundant` is
set to a non-zero value. But sorting a small list of strings is cheap,
and it is defensive against future calls to `string_list_has_string()`
on this list.

Co-discovered-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 13:54:44 -07:00
Taylor Blau
aab7bea14f t7703: demonstrate object corruption with pack.packSizeLimit
When doing a `--geometric=<d>` repack, `git repack` determines a
splitting point among packs ordered by their object count such that:

  - each pack above the split has at least `<d>` times as many objects
    as the next-largest pack by object count, and
  - the first pack above the split has at least `<d>` times as many
    object as the sum of all packs below the split line combined

`git repack` then creates a pack containing all of the objects contained
in packs below the split line by running `git pack-objects
--stdin-packs` underneath. Once packs are moved into place, then any
packs below the split line are removed, since their objects were just
combined into a new pack.

But `git repack` tries to be careful to avoid removing a pack that it
just wrote, by checking:

    struct packed_git *p = geometry->pack[i];
    if (string_list_has_string(&names, hash_to_hex(p->hash)))
      continue;

in the `delete_redundant` and `geometric` conditional towards the end of
`cmd_repack`.

But it's possible to trick `git repack` into not recognizing a pack that
it just wrote when `names` is out-of-order (which violates
`string_list_has_string()`'s assumption that the list is sorted and thus
binary search-able).

When this happens in just the right circumstances, it is possible to
remove a pack that we just wrote, leading to object corruption.

Luckily, this is quite difficult to provoke in practice (for a couple of
reasons):

  - we ordinarily write just one pack, so `names` usually contains just
    one entry, and is thus sorted
  - when we do write more than one pack (e.g., due to `--max-pack-size`)
    we have to: (a) write a pack identical to one that already
    exists, (b) have that pack be below the split line, and (c) have
    the set of packs written by `pack-objects` occur in an order which
    tricks `string_list_has_string()`.

Demonstrate the above scenario in a failing test, which causes `git
repack --geometric` to write a pack which occurs below the split line,
_and_ fail to recognize that it wrote that pack.

The following patch will fix this bug.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 13:42:40 -07:00
Victoria Dye
4b5a808bb9 repack: respect --keep-pack with geometric repack
Update 'repack' to ignore packs named on the command line with the
'--keep-pack' option. Specifically, modify 'init_pack_geometry()' to treat
command line-kept packs the same way it treats packs with an on-disk '.keep'
file (that is, skip the pack and do not include it in the 'geometry'
structure).

Without this handling, a '--keep-pack' pack would be included in the
'geometry' structure. If the pack is *before* the geometry split line (with
at least one other pack and/or loose objects present), 'repack' assumes the
pack's contents are "rolled up" into another pack via 'pack-objects'.
However, because the internally-invoked 'pack-objects' properly excludes
'--keep-pack' objects, any new pack it creates will not contain the kept
objects. Finally, 'repack' deletes the '--keep-pack' as "redundant" (since
it assumes 'pack-objects' created a new pack with its contents), resulting
in possible object loss and repository corruption.

Add a test ensuring that '--keep-pack' packs are now appropriately handled.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Victoria Dye <vdye@github.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-20 12:56:29 -07:00