Commit Graph

76 Commits

Author SHA1 Message Date
Junio C Hamano
61061abba7 Merge branch 'jh/object-filtering'
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.

* jh/object-filtering:
  rev-list: support --no-filter argument
  list-objects-filter-options: support --no-filter
  list-objects-filter-options: fix 'keword' typo in comment
  pack-objects: add list-objects filtering
  rev-list: add list-objects filtering support
  list-objects: filter objects in traverse_commit_list
  oidset: add iterator methods to oidset
  oidmap: add oidmap iterator methods
  dir: allow exclusions from blob in addition to file
2017-12-27 11:16:21 -08:00
Jeff Hostetler
f4371a883f rev-list: support --no-filter argument
Teach rev-list to support --no-filter to override a
previous --filter=<filter_spec> argument.  This is
to be consistent with commands that use OPT_PARSE
macros.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:44:37 -08:00
Jeff Hostetler
caf3827e2f rev-list: add list-objects filtering support
Teach rev-list to use the filtering provided by the
traverse_commit_list_filtered() interface to omit
unwanted objects from the result.

Object filtering is only allowed when one of the "--objects*"
options are used.

When the "--filter-print-omitted" option is used, the omitted
objects are printed at the end.  These are marked with a "~".
This option can be combined with "--quiet" to get a list of
just the omitted objects.

Add t6112 test.

In the future, we will introduce a "partial clone" mechanism
wherein an object in a repo, obtained from a remote, may
reference a missing object that can be dynamically fetched from
that remote once needed.  This "partial clone" mechanism will
have a way, sometimes slow, of determining if a missing link
is one of the links expected to be produced by this mechanism.

This patch introduces handling of missing objects to help
debugging and development of the "partial clone" mechanism,
and once the mechanism is implemented, for a power user to
perform operations that are missing-object aware without
incurring the cost of checking if a missing link is expected.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:57 +09:00
Junio C Hamano
f116163171 Merge branch 'ma/bisect-leakfix'
Leak fixes.

* ma/bisect-leakfix:
  bisect: fix memory leak when returning best element
  bisect: fix off-by-one error in `best_bisection_sorted()`
  bisect: fix memory leak in `find_bisection()`
  bisect: change calling-convention of `find_bisection()`
2017-11-15 12:14:28 +09:00
Junio C Hamano
8cc633286a Merge branch 'bw/diff-opt-impl-to-bitfields'
A single-word "unsigned flags" in the diff options is being split
into a structure with many bitfields.

* bw/diff-opt-impl-to-bitfields:
  diff: make struct diff_flags members lowercase
  diff: remove DIFF_OPT_CLR macro
  diff: remove DIFF_OPT_SET macro
  diff: remove DIFF_OPT_TST macro
  diff: remove touched flags
  diff: add flag to indicate textconv was set via cmdline
  diff: convert flags to be stored in bitfields
  add, reset: use DIFF_OPT_SET macro to set a diff flag
2017-11-09 14:31:27 +09:00
Martin Ågren
24d707f636 bisect: change calling-convention of find_bisection()
This function takes a commit list and returns a commit list. The
returned list is built by modifying the original list. Thus the caller
should not use the original list again (and after the next commit fixes
a memory leak, it must not).

Change the function signature so that it takes a **list and has void
return type. That should make it harder to misuse this function.

While we're here, document this function.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-06 10:15:29 +09:00
Brandon Williams
0d1e0e7801 diff: make struct diff_flags members lowercase
Now that the flags stored in struct diff_flags are being accessed
directly and not through macros, change all struct members from being
uppercase to lowercase.
This conversion is done using the following semantic patch:

	@@
	expression E;
	@@
	- E.RECURSIVE
	+ E.recursive

	@@
	expression E;
	@@
	- E.TREE_IN_RECURSIVE
	+ E.tree_in_recursive

	@@
	expression E;
	@@
	- E.BINARY
	+ E.binary

	@@
	expression E;
	@@
	- E.TEXT
	+ E.text

	@@
	expression E;
	@@
	- E.FULL_INDEX
	+ E.full_index

	@@
	expression E;
	@@
	- E.SILENT_ON_REMOVE
	+ E.silent_on_remove

	@@
	expression E;
	@@
	- E.FIND_COPIES_HARDER
	+ E.find_copies_harder

	@@
	expression E;
	@@
	- E.FOLLOW_RENAMES
	+ E.follow_renames

	@@
	expression E;
	@@
	- E.RENAME_EMPTY
	+ E.rename_empty

	@@
	expression E;
	@@
	- E.HAS_CHANGES
	+ E.has_changes

	@@
	expression E;
	@@
	- E.QUICK
	+ E.quick

	@@
	expression E;
	@@
	- E.NO_INDEX
	+ E.no_index

	@@
	expression E;
	@@
	- E.ALLOW_EXTERNAL
	+ E.allow_external

	@@
	expression E;
	@@
	- E.EXIT_WITH_STATUS
	+ E.exit_with_status

	@@
	expression E;
	@@
	- E.REVERSE_DIFF
	+ E.reverse_diff

	@@
	expression E;
	@@
	- E.CHECK_FAILED
	+ E.check_failed

	@@
	expression E;
	@@
	- E.RELATIVE_NAME
	+ E.relative_name

	@@
	expression E;
	@@
	- E.IGNORE_SUBMODULES
	+ E.ignore_submodules

	@@
	expression E;
	@@
	- E.DIRSTAT_CUMULATIVE
	+ E.dirstat_cumulative

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_FILE
	+ E.dirstat_by_file

	@@
	expression E;
	@@
	- E.ALLOW_TEXTCONV
	+ E.allow_textconv

	@@
	expression E;
	@@
	- E.TEXTCONV_SET_VIA_CMDLINE
	+ E.textconv_set_via_cmdline

	@@
	expression E;
	@@
	- E.DIFF_FROM_CONTENTS
	+ E.diff_from_contents

	@@
	expression E;
	@@
	- E.DIRTY_SUBMODULES
	+ E.dirty_submodules

	@@
	expression E;
	@@
	- E.IGNORE_UNTRACKED_IN_SUBMODULES
	+ E.ignore_untracked_in_submodules

	@@
	expression E;
	@@
	- E.IGNORE_DIRTY_SUBMODULES
	+ E.ignore_dirty_submodules

	@@
	expression E;
	@@
	- E.OVERRIDE_SUBMODULE_CONFIG
	+ E.override_submodule_config

	@@
	expression E;
	@@
	- E.DIRSTAT_BY_LINE
	+ E.dirstat_by_line

	@@
	expression E;
	@@
	- E.FUNCCONTEXT
	+ E.funccontext

	@@
	expression E;
	@@
	- E.PICKAXE_IGNORE_CASE
	+ E.pickaxe_ignore_case

	@@
	expression E;
	@@
	- E.DEFAULT_FOLLOW_RENAMES
	+ E.default_follow_renames

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:51:40 +09:00
Brandon Williams
3b69daed86 diff: remove DIFF_OPT_TST macro
Remove the `DIFF_OPT_TST` macro and instead access the flags directly.
This conversion is done using the following semantic patch:

	@@
	expression E;
	identifier fld;
	@@
	- DIFF_OPT_TST(&E, fld)
	+ E.flags.fld

	@@
	type T;
	T *ptr;
	identifier fld;
	@@
	- DIFF_OPT_TST(ptr, fld)
	+ ptr->flags.fld

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-01 11:50:03 +09:00
brian m. carlson
206649672e pack-bitmap: convert traverse_bitmap_commit_list to object_id
Convert traverse_bitmap_commit_list and the callbacks it takes to use a
pointer to struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-16 11:05:51 +09:00
Junio C Hamano
d33a433236 Merge branch 'jc/simplify-progress'
The API to start showing progress meter after a short delay has
been simplified.

* jc/simplify-progress:
  progress: simplify "delayed" progress API
2017-08-24 10:20:02 -07:00
Junio C Hamano
8aade107dd progress: simplify "delayed" progress API
We used to expose the full power of the delayed progress API to the
callers, so that they can specify, not just the message to show and
expected total amount of work that is used to compute the percentage
of work performed so far, the percent-threshold parameter P and the
delay-seconds parameter N.  The progress meter starts to show at N
seconds into the operation only if we have not yet completed P per-cent
of the total work.

Most callers used either (0%, 2s) or (50%, 1s) as (P, N), but there
are oddballs that chose more random-looking values like 95%.

For a smoother workload, (50%, 1s) would allow us to start showing
the progress meter earlier than (0%, 2s), while keeping the chance
of not showing progress meter for long running operation the same as
the latter.  For a task that would take 2s or more to complete, it
is likely that less than half of it would complete within the first
second, if the workload is smooth.  But for a spiky workload whose
earlier part is easier, such a setting is likely to fail to show the
progress meter entirely and (0%, 2s) is more appropriate.

But that is merely a theory.  Realistically, it is of dubious value
to ask each codepath to carefully consider smoothness of their
workload and specify their own setting by passing two extra
parameters.  Let's simplify the API by dropping both parameters and
have everybody use (0%, 2s).

Oh, by the way, the percent-threshold parameter and the structure
member were consistently misspelled, which also is now fixed ;-)

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-19 14:01:34 -07:00
Junio C Hamano
8fbaf0b13b Merge branch 'jk/rev-list-empty-input'
"git log --tag=no-such-tag" showed log starting from HEAD, which
has been fixed---it now shows nothing.

* jk/rev-list-empty-input:
  revision: do not fallback to default when rev_input_given is set
  rev-list: don't show usage when we see empty ref patterns
  revision: add rev_input_given flag
  t6018: flesh out empty input/output rev-list tests
2017-08-11 13:27:07 -07:00
Junio C Hamano
3ab01ac3f7 Merge branch 'jk/reflog-walk'
Numerous bugs in walking of reflogs via "log -g" and friends have
been fixed.

* jk/reflog-walk:
  reflog-walk: apply --since/--until to reflog dates
  reflog-walk: stop using fake parents
  rev-list: check reflog_info before showing usage
  get_revision_1(): replace do-while with an early return
  log: do not free parents when walking reflog
  log: clarify comment about reflog cycles
  revision: disallow reflog walking with revs->limited
  t1414: document some reflog-walk oddities
2017-08-11 13:27:01 -07:00
Jeff King
0159ba3226 rev-list: don't show usage when we see empty ref patterns
If the user gives us no starting point for a traversal, we
want to complain with our normal usage message. But if they
tried to do so with "--all" or "--glob", but that happened
not to match any refs, the usage message isn't helpful. We
should just give them the empty output they asked for
instead.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-02 15:45:21 -07:00
Jeff King
d75dfb1089 rev-list: pass diffopt->use_colors through to pretty-print
When rev-list pretty-prints a commit, it creates a new
pretty_print_context and copies items from the rev_info
struct. We don't currently copy the "use_color" field,
though. Nobody seems to have noticed because the only part
of pretty.c that cares is the %C(auto,...) placeholder, and
presumably not many people use that with the rev-list
plumbing (as opposed to with git-log).

It will become more noticeable in a future patch, though,
when we start treating all user-format colors as auto-colors
(in which case it would become impossible to format colors
with rev-list, even with --color=always).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13 12:42:51 -07:00
Jeff King
7f97de5ee1 rev-list: check reflog_info before showing usage
When git-rev-list sees no pending commits, it shows a usage
message. This works even when reflog-walking is requested,
because the reflog-walk code currently puts the reflog tips
into the pending queue.

In preparation for refactoring the reflog-walk code, let's
explicitly check whether we have any reflogs to walk. For
now this is a noop, but the existing reflog tests will make
sure that it kicks in after the refactoring. Likewise, we'll
add a test that "rev-list -g" without specifying any reflogs
continues to fail (so that we know our check does not kick
in too aggressively).

Note that the implementation needs to go into its own
sub-function, as the walk code does not expose its innards
outside of reflog-walk.c.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-09 10:00:48 -07:00
Junio C Hamano
f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Junio C Hamano
50ad8561de Merge branch 'jk/consistent-h'
"git $cmd -h" for builtin commands calls the implementation of the
command (i.e. cmd_$cmd() function) without doing any repository
set-up, and the commands that expect RUN_SETUP is done by the Git
potty needs to be prepared to show the help text without barfing.

* jk/consistent-h:
  t0012: test "-h" with builtins
  git: add hidden --list-builtins option
  version: convert to parse-options
  diff- and log- family: handle "git cmd -h" early
  submodule--helper: show usage for "-h"
  remote-{ext,fd}: print usage message on invalid arguments
  upload-archive: handle "-h" option early
  credential: handle invalid arguments earlier
2017-06-19 12:38:45 -07:00
Brandon Williams
b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano
5a88f97cff diff- and log- family: handle "git cmd -h" early
"git $builtin -h" bypasses the usual repository setup and calls the
cmd_$builtin() function, expecting it to show the help text.

Unfortunately the commands in the log- and the diff- family want to
call into the revisions machinery, which by definition needs to have
a repository already discovered.  Strictly speaking, they may not
need a repository only for parsing "-h", but it is a good discipline
to future-proof codepath to ensure that setup_revisions() is called
after we know that a repository is there.

Handle the "git $builtin -h" special case very early in these
commands to work around potential issues.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-05 11:39:59 +09:00
Junio C Hamano
6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
brian m. carlson
c251c83df2 object: convert parse_object* to take struct object_id
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:

@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)

@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)

@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)

@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
Johannes Schindelin
cb71f8bdb5 PRItime: introduce a new "printf format" for timestamps
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.

There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.

So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 20:19:15 -07:00
brian m. carlson
dc01505f7f Convert GIT_SHA1_HEXSZ used for allocation to GIT_MAX_HEXSZ
Since we will likely be introducing a new hash function at some point,
and that hash function might be longer than 40 hex characters, use the
constant GIT_MAX_HEXSZ, which is designed to be suitable for
allocations, instead of GIT_SHA1_HEXSZ.  This will ease the transition
down the line by distinguishing between places where we need to allocate
memory suitable for the largest hash from those where we need to handle
the current hash.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-26 22:08:21 -07:00
René Scharfe
2490574d15 use oid_to_hex_r() for converting struct object_id hashes to hex strings
Patch generated by Coccinelle and contrib/coccinelle/object_id.cocci.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-30 14:23:40 -08:00
Jacob Keller
98985c6911 rev-list: use hdr_termination instead of a always using a newline
When adding support for prefixing output of log and other commands using
--line-prefix, commit 660e113ce1 ("graph: add support for
--line-prefix on all graph-aware output", 2016-08-31) accidentally
broke rev-list --header output.

In order to make the output appear with a line-prefix, the flow was
changed to always use the graph subsystem for display. Unfortunately
the graph flow in rev-list did not use info->hdr_termination as it was
assumed that graph output would never need to putput NULs.

Since we now always use the graph code in order to handle the case of
line-prefix, simply replace putchar('\n') with
putchar(info->hdr_termination) which will correct this issue.

Add a test for the --header case to make sure we don't break it in the
future.

Reported-by: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-20 14:44:37 -07:00
Jacob Keller
660e113ce1 graph: add support for --line-prefix on all graph-aware output
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:09 -07:00
Jeff King
434ea3cdad rev-list: add optional progress reporting
It's easy to ask rev-list to do a traversal that may takes
many seconds (e.g., by calling "--objects --all"). In theory
you can monitor its progress by the output you get to
stdout, but this isn't always easy.

Some operations, like "--count", don't make any output until
the end.

And some callers, like check_everything_connected(), are
using it just for the error-checking of the traversal, and
throw away stdout entirely.

This patch adds a "--progress" option which can be used to
give some eye-candy for a user waiting for a long traversal.
This is just a rev-list option and not a regular traversal
option, because it needs cooperation from the callbacks in
builtin/rev-list.c to do the actual count.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:10:44 -07:00
Junio C Hamano
6d8c5454b6 Merge branch 'jk/rev-list-count-with-bitmap'
"git rev-list --count" whose walk-length is limited with "-n"
option did not work well with the counting optimized to look at the
bitmap index.

* jk/rev-list-count-with-bitmap:
  rev-list: disable bitmaps when "-n" is used with listing objects
  rev-list: "adjust" results of "--count --use-bitmap-index -n"
2016-06-20 11:01:03 -07:00
Jeff King
fb85db84dc rev-list: disable bitmaps when "-n" is used with listing objects
You can ask rev-list to use bitmaps to speed up an --objects
traversal, which should generally give you your answers much
faster.

Likewise, you can ask rev-list to limit such a traversal
with `-n`, in which case we'll show only a limited set of
commits (and only the tree and commit objects directly
reachable from those commits).

But if you do both together, the results are nonsensical. We
end up limiting any fallback traversal we do to _find_ the
bitmaps, but the actual set of objects we output will be
picked arbitrarily from the union of any bitmaps we do find,
and will involve the objects of many more commits.

It's possible that somebody might want this as a "show me
what you can, but limit the amount of work you do" flag.
But as with the prior commit clamping "--count", the results
are basically non-deterministic; you'll get the values from
some commits between `n` and the total number, and you can't
tell which.

And unlike the `--count` case, we can't easily generate the
"real" value from the bitmap values (you can't just walk
back `-n` commits and subtract out the reachable objects
from the boundary commits; the bitmaps for `X` record its
total reachability, so you don't know which objects are
directly from `X` itself, which from `X^`, and so on).

So let's just fallback to the non-bitmap code path in this
case, so we always give a sane answer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-03 09:01:02 -07:00
Jeff King
5c9f9bf313 rev-list: "adjust" results of "--count --use-bitmap-index -n"
If you ask rev-list for:

    git rev-list --count --use-bitmap-index HEAD

we optimize out the actual traversal and just give you the
number of bits set in the commit bitmap. This is faster,
which is good.

But if you ask to limit the size of the traversal, like:

    git rev-list --count --use-bitmap-index -n 100 HEAD

we'll still output the full bitmapped number we found. On
the surface, that might even seem OK. You explicitly asked
to use the bitmap index, and it was cheap to compute the
real answer, so we gave it to you.

But there's something much more complicated going on under
the hood. If we don't have a bitmap directly for HEAD, then
we have to actually traverse backwards, looking for a
bitmapped commit. And _that_ traversal is bounded by our
`-n` count.

This is a good thing, because it bounds the work we have to
do, which is probably what the user wanted by asking for
`-n`. But now it makes the output quite confusing. You might
get many values:

  - your `-n` value, if we walked back and never found a
    bitmap (or fewer if there weren't that many commits)

  - the actual full count, if we found a bitmap root for
    every path of our traversal with in the `-n` limit

  - any number in between! We might have walked back and
    found _some_ bitmaps, but then cut off the traversal
    early with some commits not accounted for in the result.

So you cannot even see a value higher than your `-n` and say
"OK, bitmaps kicked in, this must be the real full count".
The only sane thing is for git to just clamp the value to a
maximum of the `-n` value, which means we should output the
exact same results whether bitmaps are in use or not.

The test in t5310 demonstrates this by using `-n 1`.
Without this patch we fail in the full-bitmap case (where we
do not have to traverse at all) but _not_ in the
partial-bitmap case (where we have to walk down to find an
actual bitmap). With this patch, both cases just work.

I didn't implement the crazy in-between case, just because
it's complicated to set up, and is really a subset of the
full-count case, which we do cover.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-03 09:00:59 -07:00
Jeff King
2824e1841b list-objects: pass full pathname to callbacks
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16 10:41:04 -07:00
Jeff King
dc06dc8800 list-objects: drop name_path entirely
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-16 10:41:03 -07:00
Jeff King
de1e67d070 list-objects: pass full pathname to callbacks
When we find a blob at "a/b/c", we currently pass this to
our show_object_fn callbacks as two components: "a/b/" and
"c". Callbacks which want the full value then call
path_name(), which concatenates the two. But this is an
inefficient interface; the path is a strbuf, and we could
simply append "c" to it temporarily, then roll back the
length, without creating a new copy.

So we could improve this by teaching the callsites of
path_name() this trick (and there are only 3). But we can
also notice that no callback actually cares about the
broken-down representation, and simply pass each callback
the full path "a/b/c" as a string. The callback code becomes
even simpler, then, as we do not have to worry about freeing
an allocated buffer, nor rolling back our modification to
the strbuf.

This is theoretically less efficient, as some callbacks
would not bother to format the final path component. But in
practice this is not measurable. Since we use the same
strbuf over and over, our work to grow it is amortized, and
we really only pay to memcpy a few bytes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:17 -08:00
Jeff King
bd64516aca list-objects: drop name_path entirely
In the previous commit, we left name_path as a thin wrapper
around a strbuf. This patch drops it entirely. As a result,
every show_object_fn callback needs to be adjusted. However,
none of their code needs to be changed at all, because the
only use was to pass it to path_name(), which now handles
the bare strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-12 12:51:15 -08:00
brian m. carlson
ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson
7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Jeff King
d59f765ac9 use sha1_to_hex_r() instead of strcpy
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:

  strcpy(buf, sha1_to_hex(sha1));

This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.

Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King
2aea7a51a1 rev-list: make it obvious that we do not support notes
The rev-list command does not have the internal
infrastructure to display notes. Running:

  git rev-list --notes HEAD

will silently ignore the "--notes" option. Running:

  git rev-list --notes --grep=. HEAD

will crash on an assert. Running:

  git rev-list --format=%N HEAD

will place a literal "%N" in the output (it does not even
expand to an empty string).

Let's have rev-list tell the user that it cannot fill the
user's request, rather than silently producing wrong data.
Likewise, let's remove mention of the notes options from the
rev-list documentation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-24 10:33:15 -07:00
Junio C Hamano
3afcec9057 Merge branch 'ls/hint-rev-list-count' into maint
* ls/hint-rev-list-count:
  rev-list: add --count to usage guide
2015-07-27 12:21:47 -07:00
Junio C Hamano
b3a30f6e0c Merge branch 'ls/hint-rev-list-count'
* ls/hint-rev-list-count:
  rev-list: add --count to usage guide
2015-07-10 14:26:13 -07:00
Jeff King
c8a70d3509 rev-list: disable --use-bitmap-index when pruning commits
The reachability bitmaps do not have enough information to
tell us which commits might have changed path "foo", so the
current code produces wrong answers for:

  git rev-list --use-bitmap-index --count HEAD -- foo

(it silently ignores the "foo" limiter). Instead, we should
fall back to doing a normal traversal (it is OK to fall
back rather than complain, because --use-bitmap-index is a
pure optimization, and might not kick in for other reasons,
such as there being no bitmaps in the repository).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-01 12:00:50 -07:00
Lawrence Siebert
75d2e5a7b0 rev-list: add --count to usage guide
--count should be mentioned in the usage guide, this updates code and
documentation.

Signed-off-by: Lawrence Siebert <lawrencesiebert@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-01 09:29:11 -07:00
Jeff King
8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King
a97934d820 use get_cached_commit_buffer where appropriate
Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Jeff King
0fb370da9c provide a helper to free commit buffer
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.

Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it.  But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual).  In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.

Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:07:47 -07:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Vicent Marti
aa32939fea rev-list: add bitmap mode to speed up object lists
The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).

Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.

These are some example timings for `torvalds/linux` (warm cache,
best-of-five):

    $ time git rev-list --objects master > /dev/null

    real    0m34.191s
    user    0m33.904s
    sys     0m0.268s

    $ time git rev-list --objects --use-bitmap-index master > /dev/null

    real    0m1.041s
    user    0m0.976s
    sys     0m0.064s

Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.

Here are some sample timings of different ways to count commits in
`torvalds/linux`:

    $ time git rev-list master | wc -l
        399882

        real    0m6.524s
        user    0m6.060s
        sys     0m3.284s

    $ time git rev-list --count master
        399882

        real    0m4.318s
        user    0m4.236s
        sys     0m0.076s

    $ time git rev-list --use-bitmap-index --count master
        399882

        real    0m0.217s
        user    0m0.176s
        sys     0m0.040s

This also respects negative refs, so you can use it to count
a slice of history:

        $ time git rev-list --count v3.0..master
        144843

        real    0m1.971s
        user    0m1.932s
        sys     0m0.036s

        $ time git rev-list --use-bitmap-index --count v3.0..master
        real    0m0.280s
        user    0m0.220s
        sys     0m0.056s

Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.

Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., "--count --left-right master...topic"). The slow part
of that walk actually happens during the merge-base determination when
we parse "master...topic". Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.

Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-30 12:19:22 -08:00
Junio C Hamano
c01499ef69 C: have space around && and || operators
Correct all hits from

    git grep -e '\(&&\|||\)[^ ]' -e '[^	 ]\(&&\|||\)' -- '*.c'

i.e. && or || operators that are followed by anything but a SP,
or that follow something other than a SP or a HT, so that these
operators have a SP around it when necessary.

We usually refrain from making this kind of a tree-wide change in
order to avoid unnecessary conflicts with other "real work" patches,
but in this case, the end result does not have a potentially
cumbersome tree-wide impact, while this is a tree-wide cleanup.

Fixes to compat/regex/regcomp.c and xdiff/xemit.c are to replace a
HT immediately after && with a SP.

This is based on Felipe's patch to bultin/symbolic-ref.c; I did all
the finding out what other files in the whole tree need to be fixed
and did the fix and also the log message while reviewing that single
liner, so any screw-ups in this version are mine.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-16 10:26:39 -07:00