Add a repository argument to allow the callers of lookup_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to free struct bitmap_index instances, and use it where
needed (except when rebuild_existing_bitmaps() is used, since it creates
references to the bitmaps within the struct bitmap_index passed to it).
Note that the hashes field in struct bitmap_index is not freed because
it points to another field within the same struct. The documentation for
that field has been updated to clarify that.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Remove the bitmap_git global variable. Instead, generate on demand an
instance of struct bitmap_index for code that needs to access it.
This allows us significant control over the lifetime of instances of
struct bitmap_index. In particular, packs can now be closed without
worrying if an unnecessarily long-lived "pack" field in struct
bitmap_index still points to it.
The bitmap API is also clearer in that we need to first obtain a struct
bitmap_index, then we use it.
This patch raises two potential issues: (1) memory for the struct
bitmap_index is allocated without being freed, and (2)
prepare_bitmap_git() and prepare_bitmap_walk() can reuse a previously
loaded bitmap. For (1), this will be dealt with in a subsequent patch in
this patch set that also deals with freeing the contents of the struct
bitmap_index (which were not freed previously, because they have global
scope). For (2), current bitmap users only load the bitmap once at most
(note that pack-objects can use bitmaps or write bitmaps, but not both
at the same time), so support for reuse has no effect - and future users
can pass around the struct bitmap_index * obtained if they need to do 2
or more things with the same bitmap.
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (42 commits)
merge-one-file: compute empty blob object ID
add--interactive: compute the empty tree value
Update shell scripts to compute empty tree object ID
sha1_file: only expose empty object constants through git_hash_algo
dir: use the_hash_algo for empty blob object ID
sequencer: use the_hash_algo for empty tree object ID
cache-tree: use is_empty_tree_oid
sha1_file: convert cached object code to struct object_id
builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
submodule: convert several uses of EMPTY_TREE_SHA1_HEX
sequencer: convert one use of EMPTY_TREE_SHA1_HEX
merge: convert empty tree constant to the_hash_algo
builtin/merge: switch tree functions to use object_id
builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
sha1-file: add functions for hex empty tree and blob OIDs
builtin/receive-pack: avoid hard-coded constants for push certs
diff: specify abbreviation size in terms of the_hash_algo
upload-pack: replace use of several hard-coded constants
...
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.
* js/use-bug-macro:
BUG_exit_code: fix sparse "symbol not declared" warning
Convert remaining die*(BUG) messages
Replace all die("BUG: ...") calls by BUG() ones
run-command: use BUG() to report bugs, not die()
test-tool: help verifying BUG() code paths
The only thing these commands need is extra parseopt flag which can be
passed in by OPT_SET_INT_F() and it is a bit more compact than full
struct initialization.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.
* nd/pack-objects-pack-struct:
ci: exercise the whole test suite with uncommon code in pack-objects
pack-objects: reorder members to shrink struct object_entry
pack-objects: shrink delta_size field in struct object_entry
pack-objects: shrink size field in struct object_entry
pack-objects: clarify the use of object_entry::size
pack-objects: don't check size when the object is bad
pack-objects: shrink z_delta_size field in struct object_entry
pack-objects: refer to delta objects by index instead of pointer
pack-objects: move in_pack out of struct object_entry
pack-objects: move in_pack_pos out of struct object_entry
pack-objects: use bitfield for object_entry::depth
pack-objects: use bitfield for object_entry::dfs_state
pack-objects: turn type and in_pack_type to bitfields
pack-objects: a bit of document about struct object_entry
read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).
* sb/oid-object-info:
cache.h: allow oid_object_info to handle arbitrary repositories
packfile: add repository argument to cache_or_unpack_entry
packfile: add repository argument to unpack_entry
packfile: add repository argument to read_object
packfile: add repository argument to packed_object_info
packfile: add repository argument to packed_to_object_type
packfile: add repository argument to retry_bad_packed_offset
cache.h: add repository argument to oid_object_info
cache.h: add repository argument to oid_object_info_extended
"git gc" in a large repository takes a lot of time as it considers
to repack all objects into one pack by default. The command has
been taught to pretend as if the largest existing packfile is
marked with ".keep" so that it is left untouched while objects in
other packs and loose ones are repacked.
* nd/repack-keep-pack:
pack-objects: show some progress when counting kept objects
gc --auto: exclude base pack if not enough mem to "repack -ad"
gc: handle a corner case in gc.bigPackThreshold
gc: add gc.bigPackThreshold config
gc: add --keep-largest-pack option
repack: add --keep-pack option
t7700: have closing quote of a test at the beginning of line
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow callers of register_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.
* ds/commit-graph:
commit-graph: implement "--append" option
commit-graph: build graph from starting commits
commit-graph: read only from specific pack-indexes
commit: integrate commit graph with commit parsing
commit-graph: close under reachability
commit-graph: add core.commitGraph setting
commit-graph: implement git commit-graph read
commit-graph: implement git-commit-graph write
commit-graph: implement write_commit_graph()
commit-graph: create git-commit-graph builtin
graph: add commit graph design document
commit-graph: add format document
csum-file: refactor finalize_hashfile() method
csum-file: rename hashclose() to finalize_hashfile()
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).
The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.
Let's just convert all remaining ones in one fell swoop.
This trick was performed by this invocation:
sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These options are added in [1] [2] [3]. All these depend on running
rev-list internally which is normally true since they are always used
with "--all --objects" which implies --revs. But let's keep this
dependency explicit.
While at there, add documentation for them. These are mostly used
internally by git-repack. But it's still good to not chase down the
right commit message to know how they work.
[1] ca11b212eb (let pack-objects do the writing of unreachable objects
as loose objects - 2008-05-14)
[2] 08cdfb1337 (pack-objects --keep-unreachable - 2007-09-16)
[3] e26a8c4721 (repack: extend --keep-unreachable to loose objects -
2016-06-13)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using hard-coded instances of the constant 20, use
the_hash_algo to look up the correct constant.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the various functions for freshening objects and
has_loose_object_nonlocal to use struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow callers of packed_object_info to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only show progress when there are new objects to be packed. But
when --keep-pack is specified on the base pack, we will exclude most
of objects. This makes 'pack-objects' stay silent for a long time
while the counting phase is going.
Let's show some progress whenever we visit an object instead. The old
"Counting objects" is renamed to "Enumerating objects" and a new
progress "Counting objects" line is added.
This new "Counting objects" line should progress pretty quick when the
system is beefy. But when the system is under pressure, the reading
object header done in this phase could be slow and showing progress is
an improvement over staying silent in the current code.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.
Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.
gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).
This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.
A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.
In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.
But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.
If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.
Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.
Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.
Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).
Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like
make test GIT_TEST_OE_SIZE=4
which forces this code on all objects larger than 3 bytes.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.
"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.
This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.
Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].
[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.
This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.
If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.
The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running
make test GIT_TEST_FULL_IN_PACK_ARRAY=1
to take advantage of other special case tests.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.
A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.
But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger
fatal: unable to get type of object ...
Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.
[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)
[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.
Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.
* sb/object-store: (27 commits)
sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
sha1_file: allow map_sha1_file to handle arbitrary repositories
sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
sha1_file: allow open_sha1_file to handle arbitrary repositories
sha1_file: allow stat_sha1_file to handle arbitrary repositories
sha1_file: allow sha1_file_name to handle arbitrary repositories
sha1_file: add repository argument to sha1_loose_object_info
sha1_file: add repository argument to map_sha1_file
sha1_file: add repository argument to map_sha1_file_1
sha1_file: add repository argument to open_sha1_file
sha1_file: add repository argument to stat_sha1_file
sha1_file: add repository argument to sha1_file_name
sha1_file: allow prepare_alt_odb to handle arbitrary repositories
sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
sha1_file: add repository argument to prepare_alt_odb
sha1_file: add repository argument to link_alt_odb_entries
sha1_file: add repository argument to read_info_alternates
sha1_file: add repository argument to link_alt_odb_entry
sha1_file: add raw_object_store argument to alt_odb_usable
pack: move approximate object count to object store
...
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (36 commits)
convert: convert to struct object_id
sha1_file: introduce a constant for max header length
Convert lookup_replace_object to struct object_id
sha1_file: convert read_sha1_file to struct object_id
sha1_file: convert read_object_with_reference to object_id
tree-walk: convert tree entry functions to object_id
streaming: convert istream internals to struct object_id
tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
builtin/notes: convert static functions to object_id
builtin/fmt-merge-msg: convert remaining code to object_id
sha1_file: convert sha1_object_info* to object_id
Convert remaining callers of sha1_object_info_extended to object_id
packfile: convert unpack_entry to struct object_id
sha1_file: convert retry_bad_packed_offset to struct object_id
sha1_file: convert assert_sha1_type to object_id
builtin/mktree: convert to struct object_id
streaming: convert open_istream to use struct object_id
sha1_file: convert check_sha1_signature to struct object_id
sha1_file: convert read_loose_object to use struct object_id
builtin/index-pack: convert struct ref_delta_entry to object_id
...
If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.
Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from previous behavior since it would be written if either
CSUM_CLOSE or CSUM_FSYNC is provided.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.
Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The reason callers have to call this is to make sure either packed_git
or packed_git_mru pointers are initialized since we don't do that by
default. Sometimes it's hard to see this connection between where the
function is called and where packed_git pointer is used (sometimes in
separate functions).
Keep this dependency internal because now all access to packed_git and
packed_git_mru must go through get_xxx() wrappers.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow prepare_packed_git callers to be
more specific about which repository to handle. See commit "sha1_file:
add repository argument to link_alt_odb_entry" for an explanation of
the #define trick.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.
[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file. Do the same for read_sha1_file_extended.
Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id. Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)
@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names. Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:
@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)
@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)
@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)
@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the "flags" is shared, it's a good idea to keep track of who
uses what bit. When we need to use more flags in library code, we can
be sure it won't be re-used for another purpose by some caller.
While at there, fix the location of "5" (should be in a different
column than "4" two lines down)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More abstraction of hash function from the codepath.
* bc/hash-algo:
hash: update obsolete reference to SHA1_HEADER
bulk-checkin: abstract SHA-1 usage
csum-file: abstract uses of SHA-1
csum-file: rename sha1file to hashfile
read-cache: abstract away uses of SHA-1
pack-write: switch various SHA-1 values to abstract forms
pack-check: convert various uses of SHA-1 to abstract forms
fast-import: switch various uses of SHA-1 to the_hash_algo
sha1_file: switch uses of SHA-1 to the_hash_algo
builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
builtin/index-pack: improve hash function abstraction
hash: create union for hash context allocation
hash: move SHA-1 macros to hash.h
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.
* gs/retire-mru:
mru: Replace mru.[ch] with list.h implementation
The first step to getting rid of mru API and using the
doubly-linked list API directly instead.
* ot/mru-on-list:
mru: use double-linked list from list.h
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.
* jh/fsck-promisors:
gc: do not repack promisor packfiles
rev-list: support termination at promisor objects
sha1_file: support lazily fetching missing objects
introduce fetch-object: fetch one promisor object
index-pack: refactor writing of .keep files
fsck: support promisor objects as CLI argument
fsck: support referenced promisor objects
fsck: support refs pointing to promisor objects
fsck: introduce partialclone extension
extension.partialclone: introduce partial clone extension
Rename struct sha1file to struct hashfile, along with all of its related
functions.
The transformation in this commit was made by global search-and-replace.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the definition and declaration of force_object_loose to
struct object_id and adjust usage of this function.
Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.
* jh/object-filtering:
rev-list: support --no-filter argument
list-objects-filter-options: support --no-filter
list-objects-filter-options: fix 'keword' typo in comment
pack-objects: add list-objects filtering
rev-list: add list-objects filtering support
list-objects: filter objects in traverse_commit_list
oidset: add iterator methods to oidset
oidmap: add oidmap iterator methods
dir: allow exclusions from blob in addition to file
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach pack-objects to use the filtering provided by the
traverse_commit_list_filtered() interface to omit unwanted
objects from the resulting packfile.
Filtering requires the use of the "--stdout" option.
Add t5317 test.
In the future, we will introduce a "partial clone" mechanism
wherein an object in a repo, obtained from a remote, may
reference a missing object that can be dynamically fetched from
that remote once needed. This "partial clone" mechanism will
have a way, sometimes slow, of determining if a missing link
is one of the links expected to be produced by this mechanism.
This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert peel_ref (and its corresponding backend) to struct object_id.
This transformation was done with an update to the declaration,
definition, comments, and test helper and the following semantic patch:
@@
expression E1, E2;
@@
- peel_ref(E1, E2.hash)
+ peel_ref(E1, &E2)
@@
expression E1, E2;
@@
- peel_ref(E1, E2->hash)
+ peel_ref(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one of the last unconverted callers to peel_ref. While we're
fixing that, convert the rest of the file, since it will need to be
converted at some point anyway.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert traverse_bitmap_commit_list and the callbacks it takes to use a
pointer to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:
mid = (min + max) / 2;
Instead, use the overflow-safe version:
mid = min + (max - min) / 2;
This translation is safe since the operation occurs inside a loop
conditioned on "min < max". The included changes were found using
the following git grep:
git grep '/ *2;' '*.c'
Making this cleanup will prevent future review friction when a new
binary search is contructed based on existing code.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify mru.[ch] and related code by reusing the double-linked list
implementation from list.h instead of a custom one.
This commit is an intermediate step. Our final goal is to get rid of
mru.[ch] at all and inline all logic.
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored by: Jeff King <peff@peff.net>
Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The MRU cache that keeps track of recently used packs is represented
using two global variables:
struct mru packed_git_mru_storage;
struct mru *packed_git_mru = &packed_git_mru_storage;
Callers never assign to the packed_git_mru pointer, though, so we can
simplify by eliminating it and using &packed_git_mru_storage (renamed
to &packed_git_mru) directly. This variable is only used by the
packfile subsystem, making this a relatively uninvasive change (and
any new unadapted callers would trigger a compile error).
Noticed while moving these globals to the object_store struct.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When checking the conditional of "while (me->remaining)", we did not
hold the lock. Calling find_deltas would still be safe, since it checks
"remaining" (after taking the lock) and is able to handle all values. In
fact, this could (currently) not trigger any bug: a bug could happen if
`remaining` transitioning from zero to non-zero races with the evaluation
of the while-condition, but these are always separated by the
data_ready-mechanism.
Make sure we have the lock when we read `remaining`. This does mean we
release it just so that find_deltas can take it immediately again. We
could tweak the contract so that the lock should be taken before calling
find_deltas, but let's defer that until someone can actually show that
"unlock+lock" has a measurable negative impact.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If done_pbase_paths is NULL then done_pbase_paths_num must be zero and
done_pbase_path_pos() returns -1 without accessing the array, so the
check is not necessary.
If the invariant was violated then the check would make sure we keep
on going and allocate the necessary amount of memory in the next
ALLOC_GROW call. That sounds nice, but all array entries except for
one would contain garbage data.
If the invariant was violated without the check we'd get a segfault in
done_pbase_path_pos(), i.e. an observable crash, alerting us of the
presence of a bug.
Currently there is no such bug: Only the functions check_pbase_path()
and cleanup_preferred_base() change pointer and counter, and both make
sure to keep them in sync. Get rid of the check anyway to allow us to
see if later changes introduce such a defect, and to simplify the code.
Detected by Coverity Scan.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Simplify the code for moving members inside of an array and make it more
robust by using the helper macro MOVE_ARRAY. It calculates the size
based on the specified number of elements for us and supports NULL
pointers when that number is zero. Raw memmove(3) calls with NULL can
cause the compiler to (over-eagerly) optimize out later NULL checks.
This patch was generated with contrib/coccinelle/array.cocci and spatch
(Coccinelle).
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.
* ab/free-and-null:
*.[ch] refactoring: make use of the FREE_AND_NULL() macro
coccinelle: make use of the "expression" FREE_AND_NULL() rule
coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
coccinelle: make use of the "type" FREE_AND_NULL() rule
coccinelle: add a rule to make "type" code use FREE_AND_NULL()
git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.
* bw/config-h:
config: don't implicitly use gitdir or commondir
config: respect commondir
setup: teach discover_git_directory to respect the commondir
config: don't include config.h by default
config: remove git_config_iter
config: create config.h
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop including config.h by default in cache.h. Instead only include
config.h in those files which require use of the config system.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"pack-objects" can stream a slice of an existing packfile out when
the pack bitmap can tell that the reachable objects are all needed
in the output, without inspecting individual objects. This
strategy however would not work well when "--local" and other
options are in use, and need to be disabled.
* jk/disable-pack-reuse-when-broken:
t5310: fix "; do" style
pack-objects: disable pack reuse for object-selection options
The internal implementation of "git grep" has seen some clean-up.
* ab/grep-preparatory-cleanup: (31 commits)
grep: assert that threading is enabled when calling grep_{lock,unlock}
grep: given --threads with NO_PTHREADS=YesPlease, warn
pack-objects: fix buggy warning about threads
pack-objects & index-pack: add test for --threads warning
test-lib: add a PTHREADS prerequisite
grep: move is_fixed() earlier to avoid forward declaration
grep: change internal *pcre* variable & function names to be *pcre1*
grep: change the internal PCRE macro names to be PCRE1
grep: factor test for \0 in grep patterns into a function
grep: remove redundant regflags assignments
grep: catch a missing enum in switch statement
perf: add a comparison test of log --grep regex engines with -F
perf: add a comparison test of log --grep regex engines
perf: add a comparison test of grep regex engines with -F
perf: add a comparison test of grep regex engines
perf: emit progress output when unpacking & building
perf: add a GIT_PERF_MAKE_COMMAND for when *_MAKE_OPTS won't do
grep: add tests to fix blind spots with \0 patterns
grep: prepare for testing binary regexes containing rx metacharacters
grep: add a test helper function for less verbose -f \0 tests
...
"pack-objects" can stream a slice of an existing packfile out when
the pack bitmap can tell that the reachable objects are all needed
in the output, without inspecting individual objects. This
strategy however would not work well when "--local" and other
options are in use, and need to be disabled.
* jk/disable-pack-reuse-when-broken:
t5310: fix "; do" style
pack-objects: disable pack reuse for object-selection options
Conversion from uchar[20] to struct object_id continues.
* bc/object-id: (53 commits)
object: convert parse_object* to take struct object_id
tree: convert parse_tree_indirect to struct object_id
sequencer: convert do_recursive_merge to struct object_id
diff-lib: convert do_diff_cache to struct object_id
builtin/ls-tree: convert to struct object_id
merge: convert checkout_fast_forward to struct object_id
sequencer: convert fast_forward_to to struct object_id
builtin/ls-files: convert overlay_tree_on_cache to object_id
builtin/read-tree: convert to struct object_id
sha1_name: convert internals of peel_onion to object_id
upload-pack: convert remaining parse_object callers to object_id
revision: convert remaining parse_object callers to object_id
revision: rename add_pending_sha1 to add_pending_oid
http-push: convert process_ls_object and descendants to object_id
refs/files-backend: convert many internals to struct object_id
refs: convert struct ref_update to use struct object_id
ref-filter: convert some static functions to struct object_id
Convert struct ref_array_item to struct object_id
Convert the verify_pack callback to struct object_id
Convert lookup_tag to struct object_id
...
Fix a buggy warning about threads under NO_PTHREADS=YesPlease. Due to
re-using the delta_search_threads variable for both the state of the
"pack.threads" config & the --threads option, setting "pack.threads"
but not supplying --threads would trigger the warning for both
"pack.threads" & --threads.
Solve this bug by resetting the delta_search_threads variable in
git_pack_config(), it might then be set by --threads again and be
subsequently warned about, as the test I'm changing here asserts.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If certain options like --honor-pack-keep, --local, or
--incremental are used with pack-objects, then we need to
feed each potential object to want_object_in_pack() to see
if it should be filtered out. But when the bitmap
reuse_packfile optimization is in effect, we do not call
that function at all, and in fact skip adding the objects to
the to_pack list entirely. This means we have a bug: for
certain requests we will silently ignore those options and
include objects in that pack that should not be there.
The problem has been present since the inception of the
pack-reuse code in 6b8fda2db (pack-objects: use bitmaps when
packing objects, 2013-12-21), but it was unlikely to come up
in practice. These options are generally used for on-disk
packing, not transfer packs (which go to stdout), but we've
never allowed pack reuse for non-stdout packs (until
645c432d6, we did not even use bitmaps, which the reuse
optimization relies on; after that, we explicitly turned it
off when not packing to stdout).
We can fix this by just disabling the reuse_packfile
optimization when the options are in use. In theory we could
teach the pack-reuse code to satisfy these checks, but it's
not worth the complexity. The purpose of the optimization is
to keep the amount of per-object work we do to a minimum.
But these options inherently require us to search for other
copies of each object, drowning out any benefit of the
pack-reuse optimization. But note that the optimizations
from 56dfeb626 (pack-objects: compute local/ignore_pack_keep
early, 2016-07-29) happen before pack-reuse, meaning that
specifying "--honor-pack-keep" in a repository with no .keep
files can still follow the fast path.
There are tests in t5310 that check these options with
bitmaps and --stdout, but they didn't catch the bug, and
it's hard to adapt them to do so.
One problem is that they don't use --delta-base-offset;
without that option, we always disable the reuse
optimization entirely. It would be fine to add it in (it
actually makes the test more realistic), but that still
isn't quite enough.
The other problem is that the reuse code is very picky; it
only kicks in when it can reuse most of a pack, starting
from the first byte. So we'd have to start from a fully
repacked and bitmapped state to trigger it. But the tests
for these options use a much more subtle state; they want to
be sure that the want_object_in_pack() code is allowing some
objects but not others. Doing a full repack runs counter to
that.
So this patch adds new tests at the end of the script which
create the fully-packed state and make sure that each option
is not fooled by reusable pack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert lookup_tag to take a pointer to struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct pack_idx_entry to use struct object_id by changing the
definition and applying the following semantic patch, plus the standard
object_id transforms:
@@
struct pack_idx_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct pack_idx_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert register_shallow and unregister_shallow to take struct
object_id. register_shallow is a caller of lookup_commit, which we will
convert later. It doesn't make sense for the registration and
unregistration functions to have incompatible interfaces, so convert
them both.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).
So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.
By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.
As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from unsigned char [40] to struct object_id continues.
* bc/object-id:
Documentation: update and rename api-sha1-array.txt
Rename sha1_array to oid_array
Convert sha1_array_for_each_unique and for_each_abbrev to object_id
Convert sha1_array_lookup to take struct object_id
Convert remaining callers of sha1_array_lookup to object_id
Make sha1_array_append take a struct object_id *
sha1-array: convert internal storage for struct sha1_array to object_id
builtin/pull: convert to struct object_id
submodule: convert check_for_new_submodule_commits to object_id
sha1_name: convert disambiguate_hint_fn to take object_id
sha1_name: convert struct disambiguate_state to object_id
test-sha1-array: convert most code to struct object_id
parse-options-cb: convert sha1_array_append caller to struct object_id
fsck: convert init_skiplist to struct object_id
builtin/receive-pack: convert portions to struct object_id
builtin/pull: convert portions to struct object_id
builtin/diff: convert to struct object_id
Convert GIT_SHA1_RAWSZ used for allocation to GIT_MAX_RAWSZ
Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Define new hash-size constants for allocating memory
Since this structure handles an array of object IDs, rename it to struct
oid_array. Also rename the accessor functions and the initialization
constant.
This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:
perl -pi -E 's/struct sha1_array/struct oid_array/g'
perl -pi -E 's/\bsha1_array_/oid_array_/g'
perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert this function by changing the declaration and definition and
applying the following semantic patch to update the callers:
@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2.hash)
+ sha1_array_lookup(E1, &E2)
@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2->hash)
+ sha1_array_lookup(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a very small number of callers which don't already use struct
object_id. Convert them.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the callers to pass struct object_id by changing the function
declaration and definition and applying the following semantic patch:
@@
expression E1, E2;
@@
- sha1_array_append(E1, E2.hash)
+ sha1_array_append(E1, &E2)
@@
expression E1, E2;
@@
- sha1_array_append(E1, E2->hash)
+ sha1_array_append(E1, E2)
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* jk/fast-import-cleanup:
pack.h: define largest possible encoded object size
encode_in_pack_object_header: respect output buffer length
fast-import: use xsnprintf for formatting headers
fast-import: use xsnprintf for writing sha1s
Several callers use fixed buffers for storing the pack
object header, and they've picked 10 as a magic number. This
is reasonable, since it handles objects up to 2^67. But
let's give them a constant so it's clear that the number
isn't pulled out of thin air.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The encode_in_pack_object_header() writes a variable-length
header to an output buffer, but it doesn't actually know
long the buffer is. At first glance, this looks like it
might be possible to overflow.
In practice, this is probably impossible. The smallest
buffer we use is 10 bytes, which would hold the header for
an object up to 2^67 bytes. Obviously we're not likely to
see such an object, but we might worry that an object could
lie about its size (causing us to overflow before we realize
it does not actually have that many bytes). But the argument
is passed as a uintmax_t. Even on systems that have __int128
available, uintmax_t is typically restricted to 64-bit by
the ABI.
So it's unlikely that a system exists where this could be
exploited. Still, it's easy enough to use a normal out/len
pair and make sure we don't write too far. That protects the
hypothetical 128-bit system, makes it harder for callers to
accidentally specify a too-small buffer, and makes the
resulting code easier to audit.
Note that the one caller in fast-import tried to catch such
a case, but did so _after_ the call (at which point we'd
have already overflowed!). This check can now go away.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>