Commit Graph

56265 Commits

Author SHA1 Message Date
Junio C Hamano
5d5c46b28c Merge branch 'ds/object-info-for-prefetch-fix'
Code cleanup and futureproof.

* ds/object-info-for-prefetch-fix:
  sha1-file: split OBJECT_INFO_FOR_PREFETCH
2019-06-17 10:15:14 -07:00
Johannes Schindelin
cc8d872e69 t3404: fix a typo
This one slipped through the review of a9279c6785 (sequencer: do not
squash 'reword' commits when we hit conflicts, 2018-06-19).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-14 12:30:23 -07:00
Junio C Hamano
0aae918dd9 The first batch after 2.22
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 13:23:03 -07:00
Junio C Hamano
c510261154 Merge branch 'pw/rebase-edit-message-for-replayed-merge'
A "merge -c" instruction during "git rebase --rebase-merges" should
give the user a chance to edit the log message, even when there is
otherwise no need to create a new merge and replace the existing
one (i.e. fast-forward instead), but did not.  Which has been
corrected.

* pw/rebase-edit-message-for-replayed-merge:
  rebase -r: always reword merge -c
2019-06-13 13:19:43 -07:00
Junio C Hamano
51d6c0f015 Merge branch 'ab/deprecate-R-for-dynpath'
The way of specifying the path to find dynamic libraries at runtime
has been simplified.  The old default to pass -R/path/to/dir has been
replaced with the new default to pass -Wl,-rpath,/path/to/dir,
which is the more recent GCC uses.  Those who need to build with an
old GCC can still use "CC_LD_DYNPATH=-R"

* ab/deprecate-R-for-dynpath:
  Makefile: remove the NO_R_TO_GCC_LINKER flag
2019-06-13 13:19:43 -07:00
Junio C Hamano
2a983b227d Merge branch 'mh/import-transport-fd-fix'
The ownership rule for the file descriptor to fast-import remote
backend was mixed up, leading to unrelated file descriptor getting
closed, which has been fixed.

* mh/import-transport-fd-fix:
  Use xmmap_gently instead of xmmap in use_pack
  dup() the input fd for fast-import used for remote helpers
2019-06-13 13:19:43 -07:00
Junio C Hamano
813a3a2ab7 Merge branch 'ew/update-server-info'
"git update-server-info" learned not to rewrite the file with the
same contents.

* ew/update-server-info:
  update-server-info: avoid needless overwrites
2019-06-13 13:19:42 -07:00
Junio C Hamano
8d32d2552e Merge branch 'jk/help-unknown-ref-fix'
Improve the code to show args with potential typo that cannot be
interpreted as a commit-ish.

* jk/help-unknown-ref-fix:
  help_unknown_ref(): check for refname ambiguity
  help_unknown_ref(): duplicate collected refnames
2019-06-13 13:19:42 -07:00
Junio C Hamano
e91f65d0e2 Merge branch 'dl/format-patch-notes-config'
"git format-patch" learns a configuration to set the default for
its --notes=<ref> option.

* dl/format-patch-notes-config:
  format-patch: teach format.notes config option
  git-format-patch.txt: document --no-notes option
2019-06-13 13:19:42 -07:00
Junio C Hamano
c4a38d161c Merge branch 'nd/merge-quit'
"git merge" learned "--quit" option that cleans up the in-progress
merge while leaving the working tree and the index still in a mess.

* nd/merge-quit:
  merge: add --quit
  merge: remove drop_save() in favor of remove_merge_branch_state()
2019-06-13 13:19:41 -07:00
Junio C Hamano
89d1b573d7 Merge branch 'ab/fail-prereqs-in-test'
Developer support to emulate unsatisfied prerequisites in tests to
ensure that the remainer of the tests still succeeds when tests
with prerequisites are skipped.

* ab/fail-prereqs-in-test:
  tests: add a special setup where prerequisites fail
2019-06-13 13:19:41 -07:00
Junio C Hamano
000bce0ee4 Merge branch 'nd/corrupt-worktrees'
"git worktree add" used to fail when another worktree connected to
the same repository was corrupt, which has been corrected.

* nd/corrupt-worktrees:
  worktree add: be tolerant of corrupt worktrees
2019-06-13 13:19:41 -07:00
Junio C Hamano
ed7f8acbaa Merge branch 'js/rebase-cleanup'
Update supporting parts of "git rebase" to remove code that should
no longer be used.

* js/rebase-cleanup:
  rebase: fold git-rebase--common into the -p backend
  sequencer: the `am` and `rebase--interactive` scripts are gone
  .gitignore: there is no longer a built-in `git-rebase--interactive`
  t3400: stop referring to the scripted rebase
  Drop unused git-rebase--am.sh
2019-06-13 13:19:40 -07:00
Junio C Hamano
0d107b1989 Merge branch 'nd/worktree-name-sanitization'
In recent versions of Git, per-worktree refs are exposed in
refs/worktrees/<wtname>/ hierarchy, which means that worktree names
must be a valid refname component.  The code now sanitizes the names
given to worktrees, to make sure these refs are well-formed.

* nd/worktree-name-sanitization:
  worktree add: sanitize worktree names
2019-06-13 13:19:40 -07:00
Junio C Hamano
66dc7b68e4 Merge branch 'en/fast-export-encoding'
The "git fast-export/import" pair has been taught to handle commits
with log messages in encoding other than UTF-8 better.

* en/fast-export-encoding:
  fast-export: do automatic reencoding of commit messages only if requested
  fast-export: differentiate between explicitly UTF-8 and implicitly UTF-8
  fast-export: avoid stripping encoding header if we cannot reencode
  fast-import: support 'encoding' commit header
  t9350: fix encoding test to actually test reencoding
2019-06-13 13:19:40 -07:00
Junio C Hamano
c0e78f7e46 Merge branch 'jk/unused-params-final-batch'
* jk/unused-params-final-batch:
  verify-commit: simplify parameters to run_gpg_verify()
  show-branch: drop unused parameter from show_independent()
  rev-list: drop unused void pointer from finish_commit()
  remove_all_fetch_refspecs(): drop unused "remote" parameter
  receive-pack: drop unused "commands" from prepare_shallow_update()
  pack-objects: drop unused rev_info parameters
  name-rev: drop unused parameters from is_better_name()
  mktree: drop unused length parameter
  wt-status: drop unused status parameter
  read-cache: drop unused parameter from threaded load
  clone: drop dest parameter from copy_alternates()
  submodule: drop unused prefix parameter from some functions
  builtin: consistently pass cmd_* prefix to parse_options
  cmd_{read,write}_tree: rename "unused" variable that is used
2019-06-13 13:19:34 -07:00
Junio C Hamano
8202d12fca Merge branch 'sb/format-patch-base-patch-id-fix'
The "--base" option of "format-patch" computed the patch-ids for
prerequisite patches in an unstable way, which has been updated to
compute in a way that is compatible with "git patch-id --stable".

* sb/format-patch-base-patch-id-fix:
  format-patch: make --base patch-id output stable
  format-patch: inform user that patch-id generation is unstable
2019-06-13 13:18:46 -07:00
Junio C Hamano
cf3269fba8 Merge branch 'nd/init-relative-template-fix'
A relative pathname given to "git init --template=<path> <repo>"
ought to be relative to the directory "git init" gets invoked in,
but it instead was made relative to the repository, which has been
corrected.

* nd/init-relative-template-fix:
  init: make --template path relative to $CWD
2019-06-13 13:18:46 -07:00
Junio C Hamano
86d2271f06 Merge branch 'ab/send-email-transferencoding-fix'
Since "git send-email" learned to take 'auto' as the value for the
transfer-encoding, it by mistake stopped honoring the values given
to the configuration variables sendemail.transferencoding and/or
sendemail.<ident>.transferencoding.  This has been corrected to
(finally) redoing the order of setting the default, reading the
configuration and command line options.

* ab/send-email-transferencoding-fix:
  send-email: fix regression in sendemail.identity parsing
  send-email: document --no-[to|cc|bcc]
  send-email: fix broken transferEncoding tests
  send-email: remove cargo-culted multi-patch pattern in tests
  send-email: do defaults -> config -> getopt in that order
  send-email: rename the @bcclist variable for consistency
  send-email: move the read_config() function above getopts
2019-06-13 13:18:46 -07:00
René Scharfe
568a05c5ec cleanup: fix possible overflow errors in binary search, part 2
Calculating the sum of two array indexes to find the midpoint between
them can overflow, i.e. code like this is unsafe for big arrays:

	mid = (first + last) >> 1;

Make sure the intermediate value stays within the boundaries instead,
like this:

	mid = first + ((last - first) >> 1);

The loop condition of the binary search makes sure that 'last' is
always greater than 'first', so this is safe as long as 'first' is
not negative.  And that can be verified easily using the pre-context
of each change, except for name-hash.c, so add an assertion to that
effect there.

The unsafe calculations were found with:

	git grep '(.*+.*) *>> *1'

This is a continuation of 19716b21a4 (cleanup: fix possible overflow
errors in binary search, 2017-10-08).

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 11:28:53 -07:00
Phillip Wood
2bd69b9024 add -p: fix checkout -p with pathological context
Commit fecc6f3a68 ("add -p: adjust offsets of subsequent hunks when one is
skipped", 2018-03-01) fixed adding hunks in the correct place when a
previous hunk has been skipped. However it did not address patches that
are applied in reverse. In that case we need to adjust the pre-image
offset so that when apply reverses the patch the post-image offset is
adjusted correctly. We subtract rather than add the delta as the patch
is reversed (the easiest way to think about it is to consider a hunk of
deletions that is skipped - in that case we want to reduce offset so we
need to subtract).

Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 10:00:30 -07:00
Johannes Schindelin
9dae4fe79f config: avoid calling labs() on too-large data type
The `labs()` function operates, as the initial `l` suggests, on `long`
parameters. However, in `config.c` we tried to use it on values of type
`intmax_t`.

This problem was found by GCC v9.x.

To fix it, let's just "unroll" the function (i.e. negate the value if it
is negative).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 09:34:17 -07:00
Johannes Schindelin
4fa42df972 winansi: simplify loading the GetCurrentConsoleFontEx() function
We introduced helper macros to simplify loading functions dynamically.
Might just as well use them.

This also side-steps a compiler warning when building with GCC v8.x: it
would complain about casting between incompatible function pointers.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 09:34:16 -07:00
Johannes Schindelin
08e0450690 kwset: allow building with GCC 8
The kwset functionality makes use of the obstack code, which expects to
be handed a function that can allocate large chunks of data. It expects
that function to accept a `size` parameter of type `long`.

This upsets GCC 8 on Windows, because `long` does not have the same
bit size as `size_t` there.

Now, the proper thing to do would be to switch to `size_t`. But this
would make us deviate from the "upstream" code even further, making it
hard to synchronize with newer versions, and also it would be quite
involved because that `long` type is so invasive in that code.

Let's punt, and instead provide a super small wrapper around
`xmalloc()`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 09:34:16 -07:00
Johannes Schindelin
8d4679fe09 poll (mingw): allow compiling with GCC 8 and DEVELOPER=1
The return type of the `GetProcAddress()` function is `FARPROC` which
evaluates to `long long int (*)()`, i.e. it cannot be cast to the
correct function signature by GCC 8.

To work around that, we first cast to `void *` and go on with our merry
lives.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-13 09:34:16 -07:00
Johannes Sixt
7e6d6f7610 mergetool: use shell variable magic instead of awk
git-mergetool spawns an enormous amount of processes. For this reason,
the test script, t7610, is exceptionally slow, in particular, on
Windows. Most of the processes are invocations of git. There are
also some that can be replaced with shell builtins. Avoid repeated
calls of `git ls-files` and `awk`.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 13:20:56 -07:00
Johannes Sixt
8b01465510 mergetool: dissect strings with shell variable magic instead of expr
git-mergetool spawns an enormous amount of processes. For this reason,
the test script, t7610, is exceptionally slow, in particular, on
Windows. Most of the processes are invocations of git. There are
also some that can be replaced with shell builtins. Do so with `expr`.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 13:20:56 -07:00
Johannes Sixt
e10dffd067 t7610-mergetool: use test_cmp instead of test $(cat file) = $txt
Fix that anti-pattern by a sequence of echo and test_cmp.

The patch was generated with this command:

   sed -i -e '/test.*(cat/s/^\(\t*\)test "..cat \(.*\))" = \(".*"\)\(.*\)/\1echo \3 >expect \&\&\n\1test_cmp expect \2\4/' t7610-mergetool.sh

This helps on Windows, where test_cmp avoids spawning a process when
there is no difference.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 13:20:56 -07:00
Derrick Stolee
2d511cfc0b packfile: rename close_all_packs to close_object_store
The close_all_packs() method is now responsible for more than just pack-files.
It also closes the commit-graph and the multi-pack-index. Rename the function
to be more descriptive of its larger role. The name also fits because the
input parameter is a raw_object_store.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:33:54 -07:00
Derrick Stolee
5472c32c37 packfile: close commit-graph in close_all_packs
The close_all_packs() method is used to close all read handles to
pack-files and the multi-pack-index before running 'git gc --auto'.
This is particularly important on the Windows platform, where read
handles block any writes to those files. Replacing one of these
files with a rename() will fail in this situation.

The commit-graph also performs a rename, so is susceptable to this
problem. We are careful to close the commit-graph before writing,
but that doesn't work when a 'git fetch' (or similar) process runs
'git gc --auto' which may write a commit-graph.

Here, close the commit-graph as part of close_all_packs().

Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:33:54 -07:00
Derrick Stolee
c3a3a964b2 commit-graph: use raw_object_store when closing
The close_commit_graph() method took a repository struct, but then
only uses the raw_object_store within. Change the function prototype
to make the method more flexible.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:33:54 -07:00
Derrick Stolee
238def57fe commit-graph: extract write_commit_graph_file()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract write_commit_graph_file() that takes all of the information
in the context struct and writes the data to a commit-graph file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:54 -07:00
Derrick Stolee
f998d54226 commit-graph: extract copy_oids_to_commits()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract copy_oids_to_commits(), which fills the commits list
with the distinct commits from the oids list. During this loop,
it also counts the number of "extra" edges from octopus merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:54 -07:00
Derrick Stolee
014e3440f5 commit-graph: extract count_distinct_commits()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract count_distinct_commits(), which sorts the oids list, then
iterates through to find duplicates.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:54 -07:00
Derrick Stolee
b2c8306052 commit-graph: extract fill_oids_from_all_packs()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract fill_oids_from_all_packs() that reads all pack-files
for commits and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:54 -07:00
Derrick Stolee
4c9efe850d commit-graph: extract fill_oids_from_commit_hex()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

Extract fill_oids_from_commit_hex() that reads the given commit
id list and fille the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:54 -07:00
Derrick Stolee
ef5b83f2cf commit-graph: extract fill_oids_from_packs()
The write_commit_graph() method is too complex, so we are
extracting helper functions one by one.

This extracts fill_oids_from_packs() that reads the given
pack-file list and fills the oid list in the context.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Derrick Stolee
c9905beade commit-graph: create write_commit_graph_context
The write_commit_graph() method is too large and complex. To simplify
it, we should extract several helper functions. However, we will risk
repeating a lot of declarations related to progress incidators and
object id or commit lists.

Create a new write_commit_graph_context struct that contains the
core data structures used in this process. Replace the other local
variables with the values inside the context object. Following this
change, we will start to lift code segments wholesale out of the
write_commit_graph() method and into helper functions.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Derrick Stolee
10bd0be173 commit-graph: remove Future Work section
The commit-graph feature began with a long list of planned
benefits, most of which are now complete. The future work
section has only a few items left.

As for making more algorithms aware of generation numbers,
some are only waiting for generation number v2 to ensure the
performance matches the existing behavior using commit date.

It is unlikely that we will ever send a commit-graph file
as part of the protocol, since we would need to verify the
data, and that is expensive. If we want to start trusting
remote content, then that item can be investigated again.

While there is more work to be done on the feature, having
a section of the docs devoted to a TODO list is wasteful and
hard to keep up-to-date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Derrick Stolee
5af8039452 commit-graph: collapse parameters into flags
The write_commit_graph() and write_commit_graph_reachable() methods
currently take two boolean parameters: 'append' and 'report_progress'.
As we update these methods, adding more parameters this way becomes
cluttered and hard to maintain.

Collapse these parameters into a 'flags' parameter, and adjust the
callers to provide flags as necessary.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Derrick Stolee
e103f7276f commit-graph: return with errors during write
The write_commit_graph() method uses die() to report failure and
exit when confronted with an unexpected condition. This use of
die() in a library function is incorrect and is now replaced by
error() statements and an int return type. Return zero on success
and a negative value on failure.

Now that we use 'goto cleanup' to jump to the terminal condition
on an error, we have new paths that could lead to uninitialized
values. New initializers are added to correct for this.

The builtins 'commit-graph', 'gc', and 'commit' call these methods,
so update them to check the return value. Test that 'git commit-graph
write' returns a proper error code when hitting a failure condition
in write_commit_graph().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 11:20:53 -07:00
Ævar Arnfjörð Bjarmason
3efa1c6b33 Revert "test-lib: whitelist GIT_TR2_* in the environment"
This reverts my commit c1ee5796dc ("test-lib: whitelist GIT_TR2_* in
the environment", 2019-03-30), which is now redundant.

Since e4b75d6a1d ("trace2: rename environment variables to
GIT_TRACE2*", 2019-05-19) the GIT_TRACE2* variables match the existing
GIT_TRACE* pattern added in 95a1d12e9b ("tests: scrub environment of
GIT_* variables", 2011-03-15), so we no longer need to list TR2 here.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 10:51:13 -07:00
Nguyễn Thái Ngọc Duy
69702523af completion: do not cache if --git-completion-helper fails
"git <cmd> --git-completion-helper" could fail if the command checks for
a repo before parse_options(). If the result is cached, later on when
the user moves to a worktree with repo, tab completion will still fail.

Avoid this by detecting errors and not cache the completion output. We
can try again and hopefully succeed next time (e.g. when a repo is
found).

Of course if --git-completion-helper fails permanently because of other
reasons (*), this will slow down completion. But I don't see any better
option to handle that case.

(*) one of those cases is if __gitcomp_builtin is called on a command
  that does not support --git-completion-helper. And we do have a
  generic call

    __git_complete_common "$command"

  but this case is protected with __git_support_parseopt_helper so we're
  good.

Reported-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-12 10:36:46 -07:00
Jonathan Tan
810e19322d t5616: cover case of client having delta base
When fetching into a partial clone, Git first prefetches missing
REF_DELTA bases from the promisor remote. (This feature was introduced
in [1].) But as can be seen in a recent test coverage report [2], the
case in which a REF_DELTA base is already present is not covered by
tests.

Extend the tests slightly to cover this case.

[1] 8a30a1efd1 ("index-pack: prefetch missing REF_DELTA bases",
2019-05-15).
[2] https://public-inbox.org/git/396091fc-5572-19a5-4f18-61c258590dd5@gmail.com/

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 14:29:09 -07:00
Jonathan Tan
5718c53d0a t5616: use correct flag to check object is missing
If we want to check whether an object is missing, the correct flag to
pass to rev-list is --ignore-missing; --exclude-promisor-objects will
exclude any object that came from the promisor remote, whether it is
present or missing. Use the correct flag.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 14:29:08 -07:00
Derrick Stolee
b526d8cbbb t5319-multi-pack-index.sh: test batch size zero
The 'git multi-pack-index repack' command can take a batch size of
zero, which creates a new pack-file containing all objects in the
multi-pack-index. The first 'repack' command will create one new
pack-file, and an 'expire' command after that will delete the old
pack-files, as they no longer contain any referenced objects in the
multi-pack-index.

We must remove the .keep file that was added in the previous test
in order to expire that pack-file.

Also test that a 'repack' will do nothing if there is only one
pack-file.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:41 -07:00
Derrick Stolee
10bfa3f7f5 midx: add test that 'expire' respects .keep files
The 'git multi-pack-index expire' subcommand may delete packs that
are not needed from the perspective of the multi-pack-index. If
a pack has a .keep file, then we should not delete that pack. Add
a test that ensures we preserve a pack that would otherwise be
expired. First, create a new pack that contains every object in
the repo, then add it to the multi-pack-index. Then create a .keep
file for a pack starting with "a-pack" that was added in the
previous test. Finally, expire and verify that the pack remains
and the other packs were expired.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Derrick Stolee
d2743315d4 multi-pack-index: test expire while adding packs
During development of the multi-pack-index expire subcommand, a
version went out that improperly computed the pack order if a new
pack was introduced while other packs were being removed. Part of
the subtlety of the bug involved the new pack being placed before
other packs that already existed in the multi-pack-index.

Add a test to t5319-multi-pack-index.sh that catches this issue.
The test adds new packs that cause another pack to be expired, and
creates new packs that are lexicographically sorted before and
after the existing packs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Derrick Stolee
ce1e4a105b midx: implement midx_repack()
To repack with a non-zero batch-size, first sort all pack-files by
their modified time. Second, walk those pack-files from oldest
to newest, compute their expected size, and add the packs to a list
if they are smaller than the given batch-size. Stop when the total
expected size is at least the batch size.

If the batch size is zero, select all packs in the multi-pack-index.

Finally, collect the objects from the multi-pack-index that are in
the selected packs and send them to 'git pack-objects'. Write a new
multi-pack-index that includes the new pack.

Using a batch size of zero is very similar to a standard 'git repack'
command, except that we do not delete the old packs and instead rely
on the new multi-pack-index to prevent new processes from reading the
old packs. This does not disrupt other Git processes that are currently
reading the old packs based on the old multi-pack-index.

While first designing a 'git multi-pack-index repack' operation, I
started by collecting the batches based on the actual size of the
objects instead of the size of the pack-files. This allows repacking
a large pack-file that has very few referencd objects. However, this
came at a significant cost of parsing pack-files instead of simply
reading the multi-pack-index and getting the file information for
the pack-files. The "expected size" version provides similar
behavior, but could skip a pack-file if the average object size is
much larger than the actual size of the referenced objects, or
can create a large pack if the actual size of the referenced objects
is larger than the expected size.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00
Derrick Stolee
2af890bb28 multi-pack-index: prepare 'repack' subcommand
In an environment where the multi-pack-index is useful, it is due
to many pack-files and an inability to repack the object store
into a single pack-file. However, it is likely that many of these
pack-files are rather small, and could be repacked into a slightly
larger pack-file without too much effort. It may also be important
to ensure the object store is highly available and the repack
operation does not interrupt concurrent git commands.

Introduce a 'repack' subcommand to 'git multi-pack-index' that
takes a '--batch-size' option. The subcommand will inspect the
multi-pack-index for referenced pack-files whose size is smaller
than the batch size, until collecting a list of pack-files whose
sizes sum to larger than the batch size. Then, a new pack-file
will be created containing the objects from those pack-files that
are referenced by the multi-pack-index. The resulting pack is
likely to actually be smaller than the batch size due to
compression and the fact that there may be objects in the pack-
files that have duplicate copies in other pack-files.

The current change introduces the command-line arguments, and we
add a test that ensures we parse these options properly. Since
we specify a small batch size, we will guarantee that future
implementations do not change the list of pack-files.

In addition, we hard-code the modified times of the packs in
the pack directory to ensure the list of packs sorted by modified
time matches the order if sorted by size (ascending). This will
be important in a future test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-11 10:34:40 -07:00