Commit Graph

342 Commits

Author SHA1 Message Date
Jeff King
2fa233a554 pack-objects: handle island check for "external" delta base
Two recent topics, jk/pack-delta-reuse-with-bitmap and
cc/delta-islands, can have a funny interaction. When
checking if we can reuse an on-disk delta, the first topic
allows base_entry to be NULL when we find an object that's
not in the packing list. But the latter topic introduces a
call to in_same_island(), which needs to look at
base_entry->idx.oid. When these two features are used
together, we might try to dereference a NULL base_entry.

In practice, this doesn't really happen. We'd generally only
use delta islands when packing to disk, since the whole
point is to optimize the pack for serving fetches later. And
the new delta-reuse code relies on having used reachability
bitmaps to determine the set of objects, which we would
typically only do when serving an actual fetch.

However, it is technically possible to combine these
features. And even without doing so, building with
"SANITIZE=address,undefined" will cause t5310.46 to
complain.  Even though that test does not have delta islands
enabled, we still take the address of the NULL entry to pass
to in_same_island(). That function then promptly returns
without dereferencing the value when it sees that islands
are not enabled, but it's enough to trigger a sanitizer
error.

The solution is straight-forward: when both features are
used together, we should pass the oid of the found base to
in_same_island().

This is tricky to do inside a single "if" statement. And
after the merge in f3504ea3dd (Merge branch
'cc/delta-islands', 2018-09-17), that "if" condition is
already getting pretty unwieldy. So this patch moves the
logic into a helper function, where we can easily use
multiple return paths. The result is a bit longer, but the
logic should be much easier to follow.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-19 12:03:35 -07:00
Junio C Hamano
769af0fd9e Merge branch 'jk/cocci'
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.

* jk/cocci:
  show_dirstat: simplify same-content check
  read-cache: use oideq() in ce_compare functions
  convert hashmap comparison functions to oideq()
  convert "hashcmp() != 0" to "!hasheq()"
  convert "oidcmp() != 0" to "!oideq()"
  convert "hashcmp() == 0" to hasheq()
  convert "oidcmp() == 0" to oideq()
  introduce hasheq() and oideq()
  coccinelle: use <...> for function exclusion
2018-09-17 13:53:57 -07:00
Junio C Hamano
f3504ea3dd Merge branch 'cc/delta-islands'
Lift code from GitHub to restrict delta computation so that an
object that exists in one fork is not made into a delta against
another object that does not appear in the same forked repository.

* cc/delta-islands:
  pack-objects: move 'layer' into 'struct packing_data'
  pack-objects: move tree_depth into 'struct packing_data'
  t5320: tests for delta islands
  repack: add delta-islands support
  pack-objects: add delta-islands support
  pack-objects: refactor code into compute_layer_order()
  Add delta-islands.{c,h}
2018-09-17 13:53:55 -07:00
Junio C Hamano
3ebdef2e1b Merge branch 'jk/pack-delta-reuse-with-bitmap'
When creating a thin pack, which allows objects to be made into a
delta against another object that is not in the resulting pack but
is known to be present on the receiving end, the code learned to
take advantage of the reachability bitmap; this allows the server
to send a delta against a base beyond the "boundary" commit.

* jk/pack-delta-reuse-with-bitmap:
  pack-objects: reuse on-disk deltas for thin "have" objects
  pack-bitmap: save "have" bitmap from walk
  t/perf: add perf tests for fetches from a bitmapped server
  t/perf: add infrastructure for measuring sizes
  t/perf: factor out percent calculations
  t/perf: factor boilerplate out of test_perf
2018-09-17 13:53:53 -07:00
Junio C Hamano
49f210fd52 Merge branch 'ds/multi-pack-index'
When there are too many packfiles in a repository (which is not
recommended), looking up an object in these would require
consulting many pack .idx files; a new mechanism to have a single
file that consolidates all of these .idx files is introduced.

* ds/multi-pack-index: (32 commits)
  pack-objects: consider packs in multi-pack-index
  midx: test a few commands that use get_all_packs
  treewide: use get_all_packs
  packfile: add all_packs list
  midx: fix bug that skips midx with alternates
  midx: stop reporting garbage
  midx: mark bad packed objects
  multi-pack-index: store local property
  multi-pack-index: provide more helpful usage info
  midx: clear midx on repack
  packfile: skip loading index if in multi-pack-index
  midx: prevent duplicate packfile loads
  midx: use midx in approximate_object_count
  midx: use existing midx when writing new one
  midx: use midx in abbreviation calculations
  midx: read objects from multi-pack-index
  config: create core.multiPackIndex setting
  midx: write object offsets
  midx: write object id fanout chunk
  midx: write object ids in a chunk
  ...
2018-09-17 13:53:50 -07:00
Torsten Bögershausen
d64324cb60 Make git_check_attr() a void function
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:15:34 -07:00
Jeff King
4a7e27e957 convert "oidcmp() == 0" to oideq()
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.

The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).

This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.

I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Junio C Hamano
29d9e3e2c4 Merge branch 'nd/pack-deltify-regression-fix'
In a recent update in 2.18 era, "git pack-objects" started
producing a larger than necessary packfiles by missing
opportunities to use large deltas.

* nd/pack-deltify-regression-fix:
  pack-objects: fix performance issues on packing large deltas
2018-08-22 11:17:05 -07:00
Jeff King
6a1e32d532 pack-objects: reuse on-disk deltas for thin "have" objects
When we serve a fetch, we pass the "wants" and "haves" from
the fetch negotiation to pack-objects. That tells us not
only which objects we need to send, but we also use the
boundary commits as "preferred bases": their trees and blobs
are candidates for delta bases, both for reusing on-disk
deltas and for finding new ones.

However, this misses some opportunities. Modulo some special
cases like shallow or partial clones, we know that every
object reachable from the "haves" could be a preferred base.
We don't use all of them for two reasons:

  1. It's expensive to traverse the whole history and
     enumerate all of the objects the other side has.

  2. The delta search is expensive, so we want to keep the
     number of candidate bases sane. The boundary commits
     are the most likely to work.

When we have reachability bitmaps, though, reason 1 no
longer applies. We can efficiently compute the set of
reachable objects on the other side (and in fact already did
so as part of the bitmap set-difference to get the list of
interesting objects). And using this set conveniently
covers the shallow and partial cases, since we have to
disable the use of bitmaps for those anyway.

The second reason argues against using these bases in the
search for new deltas. But there's one case where we can use
this information for free: when we have an existing on-disk
delta that we're considering reusing, we can do so if we
know the other side has the base object. This in fact saves
time during the delta search, because it's one less delta we
have to compute.

And that's exactly what this patch does: when we're
considering whether to reuse an on-disk delta, if bitmaps
tell us the other side has the object (and we're making a
thin-pack), then we reuse it.

Here are the results on p5311 using linux.git, which
simulates a client fetching after `N` days since their last
fetch:

 Test                         origin              HEAD
 --------------------------------------------------------------------------
 5311.3: server   (1 days)    0.27(0.27+0.04)     0.12(0.09+0.03) -55.6%
 5311.4: size     (1 days)               0.9M              237.0K -73.7%
 5311.5: client   (1 days)    0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
 5311.7: server   (2 days)    0.34(0.42+0.04)     0.13(0.10+0.03) -61.8%
 5311.8: size     (2 days)               1.5M              347.7K -76.5%
 5311.9: client   (2 days)    0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
 5311.11: server   (4 days)   0.56(0.77+0.08)     0.13(0.10+0.02) -76.8%
 5311.12: size     (4 days)              2.8M              566.6K -79.8%
 5311.13: client   (4 days)   0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
 5311.15: server   (8 days)   0.97(1.39+0.11)     0.30(0.25+0.05) -69.1%
 5311.16: size     (8 days)              4.3M                1.0M -76.0%
 5311.17: client   (8 days)   0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
 5311.19: server  (16 days)   1.52(2.51+0.12)     0.30(0.26+0.03) -80.3%
 5311.20: size    (16 days)              8.0M                2.0M -74.5%
 5311.21: client  (16 days)   0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
 5311.23: server  (32 days)   2.40(4.44+0.20)     0.31(0.26+0.04) -87.1%
 5311.24: size    (32 days)             14.1M                4.1M -70.9%
 5311.25: client  (32 days)   0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
 5311.27: server  (64 days)   11.76(26.57+0.29)   0.55(0.50+0.08) -95.3%
 5311.28: size    (64 days)             89.4M               47.4M -47.0%
 5311.29: client  (64 days)   5.71(9.31+0.27)     15.20(15.20+0.32) +166.2%
 5311.31: server (128 days)   16.15(36.87+0.40)   0.91(0.82+0.14) -94.4%
 5311.32: size   (128 days)            134.8M              100.4M -25.5%
 5311.33: client (128 days)   9.42(16.86+0.49)    25.34(25.80+0.46) +169.0%

In all cases we save CPU time on the server (sometimes
significant) and the resulting pack is smaller. We do spend
more CPU time on the client side, because it has to
reconstruct more deltas. But that's the right tradeoff to
make, since clients tend to outnumber servers. It just means
the thin pack mechanism is doing its job.

From the user's perspective, the end-to-end time of the
operation will generally be faster. E.g., in the 128-day
case, we saved 15s on the server at a cost of 16s on the
client. Since the resulting pack is 34MB smaller, this is a
net win if the network speed is less than 270Mbit/s. And
that's actually the worst case. The 64-day case saves just
over 11s at a cost of just under 11s. So it's a slight win
at any network speed, and the 40MB saved is pure bonus. That
trend continues for the smaller fetches.

The implementation itself is mostly straightforward, with
the new logic going into check_object(). But there are two
tricky bits.

The first is that check_object() needs access to the
relevant information (the thin flag and bitmap result). We
can do this by pushing these into program-lifetime globals.

The second is that the rest of the code assumes that any
reused delta will point to another "struct object_entry" as
its base. But of course the case we are interested in here
is the one where don't have such an entry!

I looked at a number of options that didn't quite work:

 - we could use a flag to signal a reused delta, but it's
   not a single bit. We have to actually store the oid of
   the base, which is normally done by pointing to the
   existing object_entry. And we'd have to modify all the
   code which looks at deltas.

 - we could add the reused bases to the end of the existing
   object_entry array. While this does create some extra
   work as later stages consider the extra entries, it's
   actually not too bad (we're not sending them, so they
   don't cost much in the delta search, and at most we'd
   have 2*N of them).

   But there's a more subtle problem. Adding to the existing
   array means we might need to grow it with realloc, which
   could move the earlier entries around. While many of the
   references to other entries are done by integer index,
   some (including ones on the stack) use pointers, which
   would become invalidated.

   This isn't insurmountable, but it would require quite a
   bit of refactoring (and it's hard to know that you've got
   it all, since it may work _most_ of the time and then
   fail subtly based on memory allocation patterns).

 - we could allocate a new one-off entry for the base. In
   fact, this is what an earlier version of this patch did.
   However, since the refactoring brought in by ad635e82d6
   (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23),
   the delta_idx code requires that both entries be in the
   main packing list.

So taking all of those options into account, what I ended up
with is a separate list of "external bases" that are not
part of the main packing list. Each delta entry that points
to an external base has a single-bit flag to do so; we have a
little breathing room in the bitfield section of
object_entry.

This lets us limit the change primarily to the oe_delta()
and oe_set_delta_ext() functions. And as a bonus, most of
the rest of the code does not consider these dummy entries
at all, saving both runtime CPU and code complexity.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 12:45:49 -07:00
Derrick Stolee
6a22d52126 pack-objects: consider packs in multi-pack-index
When running 'git pack-objects --local', we want to avoid packing
objects that are in an alternate. Currently, we check for these
objects using the packed_git_mru list, which excludes the pack-files
covered by a multi-pack-index.

Add a new iteration over the multi-pack-indexes to find these
copies and mark them as unwanted.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Derrick Stolee
454ea2e4d7 treewide: use get_all_packs
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Junio C Hamano
dc0f6f9e1d Merge branch 'nd/no-the-index'
The more library-ish parts of the codebase learned to work on the
in-core index-state instance that is passed in by their callers,
instead of always working on the singleton "the_index" instance.

* nd/no-the-index: (24 commits)
  blame.c: remove implicit dependency on the_index
  apply.c: remove implicit dependency on the_index
  apply.c: make init_apply_state() take a struct repository
  apply.c: pass struct apply_state to more functions
  resolve-undo.c: use the right index instead of the_index
  archive-*.c: use the right repository
  archive.c: avoid access to the_index
  grep: use the right index instead of the_index
  attr: remove index from git_attr_set_direction()
  entry.c: use the right index instead of the_index
  submodule.c: use the right index instead of the_index
  pathspec.c: use the right index instead of the_index
  unpack-trees: avoid the_index in verify_absent()
  unpack-trees: convert clear_ce_flags* to avoid the_index
  unpack-trees: don't shadow global var the_index
  unpack-trees: add a note about path invalidation
  unpack-trees: remove 'extern' on function declaration
  ls-files: correct index argument to get_convert_attr_ascii()
  preload-index.c: use the right index instead of the_index
  dir.c: remove an implicit dependency on the_index in pathspec code
  ...
2018-08-20 11:33:53 -07:00
Junio C Hamano
8963bb0c2d Merge branch 'rs/parse-opt-lithelp'
The parse-options machinery learned to refrain from enclosing
placeholder string inside a "<bra" and "ket>" pair automatically
without PARSE_OPT_LITERAL_ARGHELP.  Existing help text for option
arguments that are not formatted correctly have been identified and
fixed.

* rs/parse-opt-lithelp:
  parse-options: automatically infer PARSE_OPT_LITERAL_ARGHELP
  shortlog: correct option help for -w
  send-pack: specify --force-with-lease argument help explicitly
  pack-objects: specify --index-version argument help explicitly
  difftool: remove angular brackets from argument help
  add, update-index: fix --chmod argument help
  push: use PARSE_OPT_LITERAL_ARGHELP instead of unbalanced brackets
2018-08-17 13:09:56 -07:00
Christian Couder
fe0ac2fb7f pack-objects: move 'layer' into 'struct packing_data'
This reduces the size of 'struct object_entry' from 88 bytes
to 80 and therefore makes packing objects more efficient.

For example on a Linux repo with 12M objects,
`git pack-objects --all` needs extra 96MB memory even if the
layer feature is not used.

Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16 10:56:44 -07:00
Christian Couder
108f530385 pack-objects: move tree_depth into 'struct packing_data'
This reduces the size of 'struct object_entry' and therefore
makes packing objects more efficient.

This also renames cmp_tree_depth() into tree_depth_compare(),
as it is more modern to have the name of the compare functions
end with "compare".

Helped-by: Jeff King <peff@peff.net>
Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16 10:56:44 -07:00
Jeff King
28b8a73080 pack-objects: add delta-islands support
Implement support for delta islands in git pack-objects
and document how delta islands work in
"Documentation/git-pack-objects.txt" and Documentation/config.txt.

This allows users to setup delta islands in their config and
get the benefit of less disk usage while cloning and fetching
is still quite fast and not much more CPU intensive.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16 10:51:17 -07:00
Christian Couder
f64ba53ee7 pack-objects: refactor code into compute_layer_order()
In a following commit, as we will use delta islands, we will
have to compute the write order for different layers, not just
for one.

Let's prepare for that by refactoring the code that will be
used to compute the write order for a given layer into a new
compute_layer_order() function.

This will make it easier to see and understand what the
following changes are doing.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-16 10:51:17 -07:00
Junio C Hamano
2e2c24f82a Merge branch 'nd/pack-objects-threading-doc'
Doc fix.

* nd/pack-objects-threading-doc:
  pack-objects: document about thread synchronization
2018-08-15 15:08:27 -07:00
Junio C Hamano
4bea8485e3 Merge branch 'nd/i18n'
Many more strings are prepared for l10n.

* nd/i18n: (23 commits)
  transport-helper.c: mark more strings for translation
  transport.c: mark more strings for translation
  sha1-file.c: mark more strings for translation
  sequencer.c: mark more strings for translation
  replace-object.c: mark more strings for translation
  refspec.c: mark more strings for translation
  refs.c: mark more strings for translation
  pkt-line.c: mark more strings for translation
  object.c: mark more strings for translation
  exec-cmd.c: mark more strings for translation
  environment.c: mark more strings for translation
  dir.c: mark more strings for translation
  convert.c: mark more strings for translation
  connect.c: mark more strings for translation
  config.c: mark more strings for translation
  commit-graph.c: mark more strings for translation
  builtin/replace.c: mark more strings for translation
  builtin/pack-objects.c: mark more strings for translation
  builtin/grep.c: mark strings for translation
  builtin/config.c: mark more strings for translation
  ...
2018-08-15 15:08:23 -07:00
Junio C Hamano
1689c22c1c Merge branch 'jk/core-use-replace-refs'
A new configuration variable core.usereplacerefs has been added,
primarily to help server installations that want to ignore the
replace mechanism altogether.

* jk/core-use-replace-refs:
  add core.usereplacerefs config option
  check_replace_refs: rename to read_replace_refs
  check_replace_refs: fix outdated comment
2018-08-15 15:08:23 -07:00
Nguyễn Thái Ngọc Duy
7a400a2c02 attr: remove an implicit dependency on the_index
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.

There is one ugly temporary workaround added in attr.c that needs some
more explanation.

Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.

The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
René Scharfe
5f0df44cd7 parse-options: automatically infer PARSE_OPT_LITERAL_ARGHELP
Parseopt wraps argument help strings in a pair of angular brackets by
default, to tell users that they need to replace it with an actual
value.  This is useful in most cases, because most option arguments
are indeed single values of a certain type.  The option
PARSE_OPT_LITERAL_ARGHELP needs to be used in option definitions with
arguments that have multiple parts or are literal strings.

Stop adding these angular brackets if special characters are present,
as they indicate that we don't deal with a simple placeholder.  This
simplifies the code a bit and makes defining special options slightly
easier.

Remove the flag PARSE_OPT_LITERAL_ARGHELP in the cases where the new
and more cautious handling suffices.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-03 08:36:20 -07:00
René Scharfe
cbd23de8bb pack-objects: specify --index-version argument help explicitly
Wrap both placeholders in the argument help string in angular brackets
to signal that users needs replace them with some actual value.  Use the
flag PARSE_OPT_LITERAL_ARGHELP to prevent parseopt from adding another
pair.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-03 08:36:20 -07:00
Junio C Hamano
3a2a1dc170 Merge branch 'sb/object-store-lookup'
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.

* sb/object-store-lookup: (32 commits)
  commit.c: allow lookup_commit_reference to handle arbitrary repositories
  commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
  tag.c: allow deref_tag to handle arbitrary repositories
  object.c: allow parse_object to handle arbitrary repositories
  object.c: allow parse_object_buffer to handle arbitrary repositories
  commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
  commit.c: allow set_commit_buffer to handle arbitrary repositories
  commit.c: migrate the commit buffer to the parsed object store
  commit-slabs: remove realloc counter outside of slab struct
  commit.c: allow parse_commit_buffer to handle arbitrary repositories
  tag: allow parse_tag_buffer to handle arbitrary repositories
  tag: allow lookup_tag to handle arbitrary repositories
  commit: allow lookup_commit to handle arbitrary repositories
  tree: allow lookup_tree to handle arbitrary repositories
  blob: allow lookup_blob to handle arbitrary repositories
  object: allow lookup_object to handle arbitrary repositories
  object: allow object_as_type to handle arbitrary repositories
  tag: add repository argument to deref_tag
  tag: add repository argument to parse_tag_buffer
  tag: add repository argument to lookup_tag
  ...
2018-08-02 15:30:42 -07:00
Nguyễn Thái Ngọc Duy
ffbd51cc60 pack-objects: document about thread synchronization
These extra comments should be make it easier to understand how to use
locks in pack-objects delta search code. For reference, see

8ecce684a3 (basic threaded delta search - 2007-09-06)
384b32c09b (pack-objects: fix threaded load balancing - 2007-12-08)
50f22ada52 (threaded pack-objects: Use condition... - 2007-12-16)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-30 11:28:30 -07:00
Nguyễn Thái Ngọc Duy
f616db6a5c builtin/pack-objects.c: mark more strings for translation
Most of these are straight forward. GETTEXT_POISON does catch the last
string in cmd_pack_objects(), but since this is --progress output, it's
not supposed to be machine-readable.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:09 -07:00
Nguyễn Thái Ngọc Duy
1a07e59c3e Update messages in preparation for i18n
Many messages will be marked for translation in the following
commits. This commit updates some of them to be more consistent and
reduce diff noise in those commits. Changes are

- keep the first letter of die(), error() and warning() in lowercase
- no full stop in die(), error() or warning() if it's single sentence
  messages
- indentation
- some messages are turned to BUG(), or prefixed with "BUG:" and will
  not be marked for i18n
- some messages are improved to give more information
- some messages are broken down by sentence to be i18n friendly
  (on the same token, combine multiple warning() into one big string)
- the trailing \n is converted to printf_ln if possible, or deleted
  if not redundant
- errno_errno() is used instead of explicit strerror()

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 11:19:09 -07:00
Nguyễn Thái Ngọc Duy
9ac3f0e5b3 pack-objects: fix performance issues on packing large deltas
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

    Version  Pack (MB)  MaxRSS(kB)  Time (s)
    -------  ---------  ----------  --------
     2.17.0     5498     43513628    2494.85
     2.18.0    10531     40449596    4168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
  existing pack files as much

- better in terms of pack size because the pack heuristics is back to
  2.17.0 time, we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.

The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.

Reported-by: Elijah Newren <newren@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 10:21:29 -07:00
Jeff King
6ebd1cafe2 check_replace_refs: rename to read_replace_refs
This was added as a NEEDSWORK in c3c36d7de2 (replace-object:
check_replace_refs is safe in multi repo environment, 2018-04-11),
waiting for a calmer period. Since doing so now doesn't conflict
with anything in 'pu', it seems as good a time as any.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18 15:45:14 -07:00
Junio C Hamano
ad7b8a7c5a Merge branch 'jt/remove-pack-bitmap-global'
The effort to move globals to per-repository in-core structure
continues.

* jt/remove-pack-bitmap-global:
  pack-bitmap: add free function
  pack-bitmap: remove bitmap_git global variable
2018-07-18 12:20:30 -07:00
Junio C Hamano
00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Stefan Beller
ce71efb713 tag: add repository argument to lookup_tag
Add a repository argument to allow the callers of lookup_tag
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:39 -07:00
Junio C Hamano
b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Jonathan Tan
f3c23db2d7 pack-bitmap: add free function
Add a function to free struct bitmap_index instances, and use it where
needed (except when rebuild_existing_bitmaps() is used, since it creates
references to the bitmaps within the struct bitmap_index passed to it).

Note that the hashes field in struct bitmap_index is not freed because
it points to another field within the same struct. The documentation for
that field has been updated to clarify that.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-21 12:22:48 -07:00
Jonathan Tan
3ae5fa0768 pack-bitmap: remove bitmap_git global variable
Remove the bitmap_git global variable. Instead, generate on demand an
instance of struct bitmap_index for code that needs to access it.

This allows us significant control over the lifetime of instances of
struct bitmap_index. In particular, packs can now be closed without
worrying if an unnecessarily long-lived "pack" field in struct
bitmap_index still points to it.

The bitmap API is also clearer in that we need to first obtain a struct
bitmap_index, then we use it.

This patch raises two potential issues: (1) memory for the struct
bitmap_index is allocated without being freed, and (2)
prepare_bitmap_git() and prepare_bitmap_walk() can reuse a previously
loaded bitmap. For (1), this will be dealt with in a subsequent patch in
this patch set that also deals with freeing the contents of the struct
bitmap_index (which were not freed previously, because they have global
scope). For (2), current bitmap users only load the bitmap once at most
(note that pack-objects can use bitmaps or write bitmaps, but not both
at the same time), so support for reuse has no effect - and future users
can pass around the struct bitmap_index * obtained if they need to do 2
or more things with the same bitmap.

Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-21 11:17:39 -07:00
Junio C Hamano
e1149fd7d9 Merge branch 'nd/use-opt-int-set-f'
Code simplification.

* nd/use-opt-int-set-f:
  Use OPT_SET_INT_F() for cmdline option specification
2018-06-01 15:06:38 +09:00
Junio C Hamano
42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano
50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Nguyễn Thái Ngọc Duy
3e4a67b47d Use OPT_SET_INT_F() for cmdline option specification
The only thing these commands need is extra parseopt flag which can be
passed in by OPT_SET_INT_F() and it is a bit more compact than full
struct initialization.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-24 16:12:29 +09:00
Junio C Hamano
5a97e7be88 Merge branch 'nd/pack-unreachable-objects-doc'
Doc update.

* nd/pack-unreachable-objects-doc:
  pack-objects: validation and documentation about unreachable options
2018-05-23 14:38:24 +09:00
Junio C Hamano
ad635e82d6 Merge branch 'nd/pack-objects-pack-struct'
"git pack-objects" needs to allocate tons of "struct object_entry"
while doing its work, and shrinking its size helps the performance
quite a bit.

* nd/pack-objects-pack-struct:
  ci: exercise the whole test suite with uncommon code in pack-objects
  pack-objects: reorder members to shrink struct object_entry
  pack-objects: shrink delta_size field in struct object_entry
  pack-objects: shrink size field in struct object_entry
  pack-objects: clarify the use of object_entry::size
  pack-objects: don't check size when the object is bad
  pack-objects: shrink z_delta_size field in struct object_entry
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: move in_pack out of struct object_entry
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: use bitfield for object_entry::depth
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: a bit of document about struct object_entry
  read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
2018-05-23 14:38:19 +09:00
Junio C Hamano
fcb6df3254 Merge branch 'sb/oid-object-info'
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).

* sb/oid-object-info:
  cache.h: allow oid_object_info to handle arbitrary repositories
  packfile: add repository argument to cache_or_unpack_entry
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to read_object
  packfile: add repository argument to packed_object_info
  packfile: add repository argument to packed_to_object_type
  packfile: add repository argument to retry_bad_packed_offset
  cache.h: add repository argument to oid_object_info
  cache.h: add repository argument to oid_object_info_extended
2018-05-23 14:38:16 +09:00
Junio C Hamano
30b015bffe Merge branch 'nd/repack-keep-pack'
"git gc" in a large repository takes a lot of time as it considers
to repack all objects into one pack by default.  The command has
been taught to pretend as if the largest existing packfile is
marked with ".keep" so that it is left untouched while objects in
other packs and loose ones are repacked.

* nd/repack-keep-pack:
  pack-objects: show some progress when counting kept objects
  gc --auto: exclude base pack if not enough mem to "repack -ad"
  gc: handle a corner case in gc.bigPackThreshold
  gc: add gc.bigPackThreshold config
  gc: add --keep-largest-pack option
  repack: add --keep-pack option
  t7700: have closing quote of a test at the beginning of line
2018-05-23 14:38:14 +09:00
Stefan Beller
c88134870e shallow: add repository argument to is_repository_shallow
Add a repository argument to allow callers of is_repository_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller
19143f139d shallow: add repository argument to register_shallow
Add a repository argument to allow callers of register_shallow
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Junio C Hamano
b10edb2df5 Merge branch 'ds/commit-graph'
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.

* ds/commit-graph:
  commit-graph: implement "--append" option
  commit-graph: build graph from starting commits
  commit-graph: read only from specific pack-indexes
  commit: integrate commit graph with commit parsing
  commit-graph: close under reachability
  commit-graph: add core.commitGraph setting
  commit-graph: implement git commit-graph read
  commit-graph: implement git-commit-graph write
  commit-graph: implement write_commit_graph()
  commit-graph: create git-commit-graph builtin
  graph: add commit graph design document
  commit-graph: add format document
  csum-file: refactor finalize_hashfile() method
  csum-file: rename hashclose() to finalize_hashfile()
2018-05-08 15:59:20 +09:00
Johannes Schindelin
033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
Nguyễn Thái Ngọc Duy
58bd77b66a pack-objects: validation and documentation about unreachable options
These options are added in [1] [2] [3]. All these depend on running
rev-list internally which is normally true since they are always used
with "--all --objects" which implies --revs. But let's keep this
dependency explicit.

While at there, add documentation for them. These are mostly used
internally by git-repack. But it's still good to not chase down the
right commit message to know how they work.

[1] ca11b212eb (let pack-objects do the writing of unreachable objects
    as loose objects - 2008-05-14)
[2] 08cdfb1337 (pack-objects --keep-unreachable - 2007-09-16)
[3] e26a8c4721 (repack: extend --keep-unreachable to loose objects -
    2016-06-13)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 18:49:32 +09:00
brian m. carlson
411791009b pack-objects: abstract away hash algorithm
Instead of using hard-coded instances of the constant 20, use
the_hash_algo to look up the correct constant.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:50 +09:00
brian m. carlson
6862ebbfcb sha1-file: convert freshen functions to object_id
Convert the various functions for freshening objects and
has_loose_object_nonlocal to use struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:49 +09:00
Jonathan Nieder
720aaa1a74 packfile: add repository argument to packed_object_info
Add a repository argument to allow callers of packed_object_info to be
more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Stefan Beller
0df8e96566 cache.h: add repository argument to oid_object_info
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Nguyễn Thái Ngọc Duy
5af050437a pack-objects: show some progress when counting kept objects
We only show progress when there are new objects to be packed. But
when --keep-pack is specified on the base pack, we will exclude most
of objects. This makes 'pack-objects' stay silent for a long time
while the counting phase is going.

Let's show some progress whenever we visit an object instead. The old
"Counting objects" is renamed to "Enumerating objects" and a new
progress "Counting objects" line is added.

This new "Counting objects" line should progress pretty quick when the
system is beefy. But when the system is under pressure, the reading
object header done in this phase could be slow and showing progress is
an improvement over staying silent in the current code.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy
9806f5a7bf gc --auto: exclude base pack if not enough mem to "repack -ad"
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy
ed7e5fc3a2 repack: add --keep-pack option
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.

A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.

In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.

But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy
0aca34e826 pack-objects: shrink delta_size field in struct object_entry
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.

If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.

Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:59 +09:00
Nguyễn Thái Ngọc Duy
ac77d0c370 pack-objects: shrink size field in struct object_entry
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.

Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.

Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).

Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like

    make test GIT_TEST_OE_SIZE=4

which forces this code on all objects larger than 3 bytes.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:59 +09:00
Nguyễn Thái Ngọc Duy
27a7d0679f pack-objects: clarify the use of object_entry::size
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.

"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:59 +09:00
Nguyễn Thái Ngọc Duy
660b373542 pack-objects: don't check size when the object is bad
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.

This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
0cb3c1427a pack-objects: shrink z_delta_size field in struct object_entry
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
898eba5e63 pack-objects: refer to delta objects by index instead of pointer
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.

Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].

[1] This means we can only index 2^32-2 objects even though nr_objects
    could contain 2^32-1 objects. It should not be a problem in
    practice because when we grow objects[], nr_alloc would probably
    blow up long before nr_objects hits the wall.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
43fa44fa3b pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.

This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.

If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.

The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running

    make test GIT_TEST_FULL_IN_PACK_ARRAY=1

to take advantage of other special case tests.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
06af3bba41 pack-objects: move in_pack_pos out of struct object_entry
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
b5c0cbd808 pack-objects: use bitfield for object_entry::depth
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
0c6804ab4e pack-objects: use bitfield for object_entry::dfs_state
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Nguyễn Thái Ngọc Duy
fd9b1baef8 pack-objects: turn type and in_pack_type to bitfields
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.

A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.

But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger

    fatal: unable to get type of object ...

Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.

[1] See 5c49c11686 (pack-objects: better check_object() performances -
    2007-04-16)

[2] 21666f1aae (convert object type handling from a string to a number
    - 2007-02-26)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
Junio C Hamano
3a1ec60c43 Merge branch 'sb/packfiles-in-repository'
Refactoring of the internal global data structure continues.

* sb/packfiles-in-repository:
  packfile: keep prepare_packed_git() private
  packfile: allow find_pack_entry to handle arbitrary repositories
  packfile: add repository argument to find_pack_entry
  packfile: allow reprepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_one to handle arbitrary repositories
  packfile: add repository argument to reprepare_packed_git
  packfile: add repository argument to prepare_packed_git
  packfile: add repository argument to prepare_packed_git_one
  packfile: allow install_packed_git to handle arbitrary repositories
  packfile: allow rearrange_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-11 13:09:55 +09:00
Junio C Hamano
cf0b1793ea Merge branch 'sb/object-store'
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.

Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.

* sb/object-store: (27 commits)
  sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: add repository argument to map_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to sha1_file_name
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add raw_object_store argument to alt_odb_usable
  pack: move approximate object count to object store
  ...
2018-04-11 13:09:55 +09:00
Junio C Hamano
a5bbc29994 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (36 commits)
  convert: convert to struct object_id
  sha1_file: introduce a constant for max header length
  Convert lookup_replace_object to struct object_id
  sha1_file: convert read_sha1_file to struct object_id
  sha1_file: convert read_object_with_reference to object_id
  tree-walk: convert tree entry functions to object_id
  streaming: convert istream internals to struct object_id
  tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
  builtin/notes: convert static functions to object_id
  builtin/fmt-merge-msg: convert remaining code to object_id
  sha1_file: convert sha1_object_info* to object_id
  Convert remaining callers of sha1_object_info_extended to object_id
  packfile: convert unpack_entry to struct object_id
  sha1_file: convert retry_bad_packed_offset to struct object_id
  sha1_file: convert assert_sha1_type to object_id
  builtin/mktree: convert to struct object_id
  streaming: convert open_istream to use struct object_id
  sha1_file: convert check_sha1_signature to struct object_id
  sha1_file: convert read_loose_object to use struct object_id
  builtin/index-pack: convert struct ref_delta_entry to object_id
  ...
2018-04-10 08:25:45 +09:00
Derrick Stolee
cfe83216e4 csum-file: refactor finalize_hashfile() method
If we want to use a hashfile on the temporary file for a lockfile, then
we need finalize_hashfile() to fully write the trailing hash but also keep
the file descriptor open.

Do this by adding a new CSUM_HASH_IN_STREAM flag along with a functional
change that checks this flag before writing the checksum to the stream.
This differs from previous behavior since it would be written if either
CSUM_CLOSE or CSUM_FSYNC is provided.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:30 -07:00
Derrick Stolee
f2af9f5e02 csum-file: rename hashclose() to finalize_hashfile()
The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:30 -07:00
Nguyễn Thái Ngọc Duy
464416a2ea packfile: keep prepare_packed_git() private
The reason callers have to call this is to make sure either packed_git
or packed_git_mru pointers are initialized since we don't do that by
default. Sometimes it's hard to see this connection between where the
function is called and where packed_git pointer is used (sometimes in
separate functions).

Keep this dependency internal because now all access to packed_git and
packed_git_mru must go through get_xxx() wrappers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Stefan Beller
6fdb4e9f5a packfile: add repository argument to prepare_packed_git
Add a repository argument to allow prepare_packed_git callers to be
more specific about which repository to handle. See commit "sha1_file:
add repository argument to link_alt_odb_entry" for an explanation of
the #define trick.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Stefan Beller
a80d72db2a object-store: move packed_git and packed_git_mru to object store
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.

[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:46 -07:00
Junio C Hamano
99321e327b Merge branch 'nd/object-allocation-comments'
Code doc update.

* nd/object-allocation-comments:
  object.h: realign object flag allocation comment
  object.h: update flag allocation comment
2018-03-14 12:01:06 -07:00
brian m. carlson
b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
02f0547eaa sha1_file: convert read_object_with_reference to object_id
Convert read_object_with_reference to take pointers to struct object_id.
Update the internals of the function accordingly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson
abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson
ef7b5195f1 streaming: convert open_istream to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
Junio C Hamano
169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Nguyễn Thái Ngọc Duy
95308d64ce object.h: update flag allocation comment
Since the "flags" is shared, it's a good idea to keep track of who
uses what bit. When we need to use more flags in library code, we can
be sure it won't be re-used for another purpose by some caller.

While at there, fix the location of "5" (should be in a different
column than "4" two lines down)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06 11:41:21 -08:00
Junio C Hamano
0fd90daba8 Merge branch 'bc/hash-algo'
More abstraction of hash function from the codepath.

* bc/hash-algo:
  hash: update obsolete reference to SHA1_HEADER
  bulk-checkin: abstract SHA-1 usage
  csum-file: abstract uses of SHA-1
  csum-file: rename sha1file to hashfile
  read-cache: abstract away uses of SHA-1
  pack-write: switch various SHA-1 values to abstract forms
  pack-check: convert various uses of SHA-1 to abstract forms
  fast-import: switch various uses of SHA-1 to the_hash_algo
  sha1_file: switch uses of SHA-1 to the_hash_algo
  builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
  builtin/index-pack: improve hash function abstraction
  hash: create union for hash context allocation
  hash: move SHA-1 macros to hash.h
2018-02-15 14:55:47 -08:00
Junio C Hamano
8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Brandon Williams
095b3b2c04 pack-objects: rename 'this' variables
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano
867622398f Merge branch 'gs/retire-mru'
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.

* gs/retire-mru:
  mru: Replace mru.[ch] with list.h implementation
2018-02-13 13:39:06 -08:00
Junio C Hamano
afc8aa3fbf Merge branch 'ot/mru-on-list'
The first step to getting rid of mru API and using the
doubly-linked list API directly instead.

* ot/mru-on-list:
  mru: use double-linked list from list.h
2018-02-13 13:39:05 -08:00
Junio C Hamano
f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
brian m. carlson
98a3beab6a csum-file: rename sha1file to hashfile
Rename struct sha1file to struct hashfile, along with all of its related
functions.

The transformation in this commit was made by global search-and-replace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Patryk Obara
4bdb70a4f7 sha1_file: convert force_object_loose to object_id
Convert the definition and declaration of force_object_loose to
struct object_id and adjust usage of this function.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Gargi Sharma
ec2dd32c70 mru: Replace mru.[ch] with list.h implementation
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.

Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-24 09:52:16 -08:00
Junio C Hamano
61061abba7 Merge branch 'jh/object-filtering'
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.

* jh/object-filtering:
  rev-list: support --no-filter argument
  list-objects-filter-options: support --no-filter
  list-objects-filter-options: fix 'keword' typo in comment
  pack-objects: add list-objects filtering
  rev-list: add list-objects filtering support
  list-objects: filter objects in traverse_commit_list
  oidset: add iterator methods to oidset
  oidmap: add oidmap iterator methods
  dir: allow exclusions from blob in addition to file
2017-12-27 11:16:21 -08:00
Jonathan Tan
0c16cd499d gc: do not repack promisor packfiles
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jeff Hostetler
9535ce7337 pack-objects: add list-objects filtering
Teach pack-objects to use the filtering provided by the
traverse_commit_list_filtered() interface to omit unwanted
objects from the resulting packfile.

Filtering requires the use of the "--stdout" option.

Add t5317 test.

In the future, we will introduce a "partial clone" mechanism
wherein an object in a repo, obtained from a remote, may
reference a missing object that can be dynamically fetched from
that remote once needed.  This "partial clone" mechanism will
have a way, sometimes slow, of determining if a missing link
is one of the links expected to be produced by this mechanism.

This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:57 +09:00
brian m. carlson
b420d90980 refs: convert peel_ref to struct object_id
Convert peel_ref (and its corresponding backend) to struct object_id.

This transformation was done with an update to the declaration,
definition, comments, and test helper and the following semantic patch:

@@
expression E1, E2;
@@
- peel_ref(E1, E2.hash)
+ peel_ref(E1, &E2)

@@
expression E1, E2;
@@
- peel_ref(E1, E2->hash)
+ peel_ref(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
brian m. carlson
188960b4d6 builtin/pack-objects: convert to struct object_id
This is one of the last unconverted callers to peel_ref.  While we're
fixing that, convert the rest of the file, since it will need to be
converted at some point anyway.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
brian m. carlson
206649672e pack-bitmap: convert traverse_bitmap_commit_list to object_id
Convert traverse_bitmap_commit_list and the callbacks it takes to use a
pointer to struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
Derrick Stolee
19716b21a4 cleanup: fix possible overflow errors in binary search
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:

	mid = (min + max) / 2;

Instead, use the overflow-safe version:

	mid = min + (max - min) / 2;

This translation is safe since the operation occurs inside a loop
conditioned on "min < max". The included changes were found using
the following git grep:

	git grep '/ *2;' '*.c'

Making this cleanup will prevent future review friction when a new
binary search is contructed based on existing code.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:57:24 +09:00
Olga Telezhnaya
8865859dfc mru: use double-linked list from list.h
Simplify mru.[ch] and related code by reusing the double-linked list
implementation from list.h instead of a custom one.
This commit is an intermediate step. Our final goal is to get rid of
mru.[ch] at all and inline all logic.

Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored by: Jeff King <peff@peff.net>
Signed-off-by: Olga Telezhnaia <olyatelezhnaya@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-01 17:30:26 +09:00
Junio C Hamano
14a8168e2f Merge branch 'rj/no-sign-compare'
Many codepaths have been updated to squelch -Wsign-compare
warnings.

* rj/no-sign-compare:
  ALLOC_GROW: avoid -Wsign-compare warnings
  cache.h: hex2chr() - avoid -Wsign-compare warnings
  commit-slab.h: avoid -Wsign-compare warnings
  git-compat-util.h: xsize_t() - avoid -Wsign-compare warnings
2017-09-29 11:23:42 +09:00
Ramsay Jones
071bcaab64 ALLOC_GROW: avoid -Wsign-compare warnings
Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 13:21:11 +09:00