Before writing a list of objects and their offsets to a multi-pack-index,
we need to collect the list of objects contained in the packfiles. There
may be multiple copies of some objects, so this list must be deduplicated.
It is possible to artificially get into a state where there are many
duplicate copies of objects. That can create high memory pressure if we
are to create a list of all objects before de-duplication. To reduce
this memory pressure without a significant performance drop,
automatically group objects by the first byte of their object id. Use
the IDX fanout tables to group the data, copy to a local array, then
sort.
Copy only the de-duplicated entries. Select the duplicate based on the
most-recent modified time of a packfile containing the object.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index needs to track which packfiles it indexes. Store
these in our first required chunk. Since filenames are not well
structured, add padding to keep good alignment in later chunks.
Modify the 'git multi-pack-index read' subcommand to output the
existence of the pack-file name chunk. Modify t5319-multi-pack-index.sh
to reflect this new output and the new expected number of chunks.
Defense in depth: A pattern we are using in the multi-pack-index feature
is to verify the data as we write it. We want to ensure we never write
invalid data to the multi-pack-index. There are many checks that verify
that the values we are writing fit the format definitions. This mainly
helps developers while working on the feature, but it can also identify
issues that only appear when dealing with very large data sets. These
large sets are hard to encode into test cases.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When constructing a multi-pack-index file for a given object directory,
read the files within the enclosed pack directory and find matches that
end with ".idx" and find the correct paired packfile using
add_packed_git().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of sharing the pack directory listing with the
multi-pack-index, generalize prepare_packed_git_one() into
for_each_file_in_pack_dir().
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we build the multi-pack-index file format, we want to test the format
on real repositories. Add tests that create repository data including
multiple packfiles with both version 1 and version 2 formats.
The current 'git multi-pack-index write' command will always write the
same file with no "real" data. This will be expanded in future commits,
along with the test expectations.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create a new multi_pack_index struct for loading multi-pack-indexes into
memory. Create a test-tool builtin for reading basic information about
that multi-pack-index to verify the correct data is written.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we begin writing the multi-pack-index format to disk, start with
the basics: the 12-byte header and the 20-byte checksum footer. Start
with these basics so we can add the rest of the format in small
increments.
As we implement the format, we will use a technique to check that our
computed offsets within the multi-pack-index file match what we are
actually writing. Each method that writes to the hashfile will return
the number of bytes written, and we will track that those values match
our expectations.
Currently, write_midx_header() returns 12, but is not checked. We will
check the return value in a later commit.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In anticipation of writing multi-pack-indexes, add a skeleton
'git multi-pack-index write' subcommand and send the options to a
write_midx_file() method. Also create a skeleton test script that
tests the 'write' subcommand.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new 'git multi-pack-index' builtin will be the plumbing access
for writing, reading, and checking multi-pack-index files. The
initial implementation is a no-op.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The multi-pack-index feature generalizes the existing pack-index
feature by indexing objects across multiple pack-files.
Describe the basic file format, using a 12-byte header followed by
a lookup table for a list of "chunks" which will be described later.
The file ends with a footer containing a checksum using the hash
algorithm.
The header allows later versions to create breaking changes by
advancing the version number. We can also change the hash algorithm
using a different version value.
We will add the individual chunk format information as we introduce
the code that writes that information.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running t7400 in a shell you observe more output than expected:
...
ok 8 - setup - hide init subdirectory
ok 9 - setup - repository to add submodules to
ok 10 - submodule add
[master (root-commit) d79ce16] one
Author: A U Thor <author@example.com>
1 file changed, 1 insertion(+)
create mode 100644 one.t
ok 11 - redirected submodule add does not show progress
ok 12 - redirected submodule add --progress does show progress
ok 13 - submodule add to .gitignored path fails
...
Fix the output by encapsulating the setup code in test_expect_success
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When testing a reworded root commit, ensure that the squash-onto commit
which is created and amended is still the root commit.
Suggested-by: Phillip Wood <phillip.wood@talktalk.net>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"make NO_ICONV=NoThanks" did not override NEEDS_LIBICONV
(i.e. linkage of -lintl, -liconv, etc. that are platform-specific
tweaks), which has been corrected.
* es/make-no-iconv:
Makefile: make NO_ICONV really mean "no iconv"
A regression to "rebase -i --root" introduced during this cycle has
been fixed.
* js/rebase-i-root-fix:
rebase --root: fix amending root commit messages
rebase --root: demonstrate a bug while amending root commit messages
The code to read compressed bitmap was not careful to avoid reading
past the end of the file, which has been corrected.
* jk/ewah-bounds-check:
ewah: adjust callers of ewah_read_mmap()
ewah_read_mmap: bounds-check mmap reads
Update the Korean translation and change the team leader to Gwan-gyeong
Mun.
Signed-off-by: Gwan-gyeong Mun <elongbug@gmail.com>
Signed-off-by: Changwoo Ryu <cwryu@debian.org>
Reviewed-by: Gwan-gyeong Mun <elongbug@gmail.com>
Newly added codepath in merge-recursive had potential buffer
overrun, which has been fixed.
* en/rename-directory-detection:
merge-recursive: use xstrdup() instead of fixed buffer
Make zlib inflate codepath more robust against versions of zlib
that clobber unused portion of outbuf.
* jl/zlib-restore-nul-termination:
packfile: correct zlib buffer handling
"git p4" updates.
* ld/git-p4-updates:
git-p4: auto-size the block
git-p4: narrow the scope of exceptions caught when parsing an int
git-p4: raise exceptions from p4CmdList based on error from p4 server
git-p4: better error reporting when p4 fails
git-p4: add option to disable syncing of p4/master with p4
git-p4: disable-rebase: allow setting this via configuration
git-p4: add options --commit and --disable-rebase
Paths can be longer than PATH_MAX. Avoid a buffer overrun in
check_dir_renamed() by using xstrdup() to make a private copy safely.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was not "newer versions of bash" but newer versions of
bash-completion that made commit 085e2ee0e6 (completion: load
completion file for external subcommand, 2018-04-29) both necessary
and possible.
Update the corresponding RelNotes entry accordingly.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Three tests in 't7406-submodule-update' contain broken &&-chains, but
since they are all in subshells, chain-lint couldn't notice them.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code path that triggered that "BUG" really does not want to run
without an explicit commit message. In the case where we want to amend a
commit message, we have an *implicit* commit message, though: the one of
the commit to amend. Therefore, this code path should not even be
entered.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When splitting a repository, running `git rebase -i --root` to reword
the initial commit, Git dies with
BUG: sequencer.c:795: root commit without message.
Signed-off-by: Todd Zullinger <tmz@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value of ewah_read_mmap() is now an ssize_t,
since we could (in theory) process up to 32GB of data. This
would never happen in practice, but a corrupt or malicious
.bitmap or index file could convince us to do so.
Let's make sure that we don't stuff the value into an int,
which would cause us to incorrectly move our pointer
forward. We'd always move too little, since negative values
are used for reporting errors. So the worst case is just
that we end up reporting a corrupt file, not an
out-of-bounds read.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The on-disk ewah format tells us how big the ewah data is,
and we blindly read that much from the buffer without
considering whether the mmap'd data is long enough, which
can lead to out-of-bound reads.
Let's make sure we have data available before reading it,
both for the ewah header/footer as well as for the bit data
itself. In particular:
- keep our ptr/len pair in sync as we move through the
buffer, and check it before each read
- check the size for integer overflow (this should be
impossible on 64-bit, as the size is given as a 32-bit
count of 8-byte words, but is possible on a 32-bit
system)
- return the number of bytes read as an ssize_t instead of
an int, again to prevent integer overflow
- compute the return value using a pointer difference;
this should yield the same result as the existing code,
but makes it more obvious that we got our computations
right
The included test is far from comprehensive, as it just
picks a static point at which to truncate the generated
bitmap. But in practice this will hit in the middle of an
ewah and make sure we're at least exercising this code.
Reported-by: Luat Nguyen <root@l4w.io>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Support for the --set-upstream option was removed in 52668846ea
(builtin/branch: stop supporting the "--set-upstream" option,
2017-08-17). The change did not completely remove the command
due to an issue noted in the commit's log message.
So, a test was added to ensure that a command which uses the
'--set-upstream' option fails instead of silently acting as an alias
for the '--set-upstream-to' option due to option parsing features.
To avoid confusion, clarify that the option is disabled intentionally
in the corresponding test description.
The test is expected to be around as long as we intentionally fail
on seeing the '--set-upstream' option which in turn we expect to
do for a period of time after which we can be sure that existing
users of '--set-upstream' are aware that the option is no
longer supported.
Signed-off-by: Kaartic Sivaraam <kaartic.sivaraam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>