Git with broken hash generation to generate collisions between object IDs. Don't use this! https://undefinedbehavior.de/posts/commit-vandalism/
Go to file
SZEDER Gábor c525ce95b4 commit-graph: check all leading directories in changed path Bloom filters
The file 'dir/subdir/file' can only be modified if its leading
directories 'dir' and 'dir/subdir' are modified as well.

So when checking modified path Bloom filters looking for commits
modifying a path with multiple path components, then check not only
the full path in the Bloom filters, but all its leading directories as
well.  Take care to check these paths in "deepest first" order,
because it's the full path that is least likely to be modified, and
the Bloom filter queries can short circuit sooner.

This can significantly reduce the average false positive rate, by
about an order of magnitude or three(!), and can further speed up
pathspec-limited revision walks.  The table below compares the average
false positive rate and runtime of

  git rev-list HEAD -- "$path"

before and after this change for 5000+ randomly* selected paths from
each repository:

                    Average false           Average        Average
                    positive rate           runtime        runtime
                  before     after     before     after   difference
  ------------------------------------------------------------------
  git             3.220%   0.7853%     0.0558s   0.0387s   -30.6%
  linux           2.453%   0.0296%     0.1046s   0.0766s   -26.8%
  tensorflow      2.536%   0.6977%     0.0594s   0.0420s   -29.2%

*Path selection was done with the following pipeline:

	git ls-tree -r --name-only HEAD | sort -R | head -n 5000

The improvements in runtime are much smaller than the improvements in
average false positive rate, as we are clearly reaching diminishing
returns here.  However, all these timings depend on that accessing
tree objects is reasonably fast (warm caches).  If we had a partial
clone and the tree objects had to be fetched from a promisor remote,
e.g.:

  $ git clone --filter=tree:0 --bare file://.../webkit.git webkit.notrees.git
  $ git -C webkit.git -c core.modifiedPathBloomFilters=1 \
        commit-graph write --reachable
  $ cp webkit.git/objects/info/commit-graph webkit.notrees.git/objects/info/
  $ git -C webkit.notrees.git -c core.modifiedPathBloomFilters=1 \
        rev-list HEAD -- "$path"

then checking all leading path component can reduce the runtime from
over an hour to a few seconds (and this is with the clone and the
promisor on the same machine).

This adjusts the tracing values in t4216-log-bloom.sh, which provides a
concrete way to notice the improvement.

Helped-by: Taylor Blau <me@ttaylorr.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-01 14:17:43 -07:00
.github
block-sha1
builtin commit-graph: persist existence of changed-paths 2020-07-01 14:17:43 -07:00
ci commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag 2020-04-06 11:08:37 -07:00
compat Merge branch 'js/mingw-open-in-gdb' into maint 2020-03-17 15:02:25 -07:00
contrib completion: offer '--(no-)patch' among 'git log' options 2020-05-11 09:33:56 -07:00
Documentation commit-graph: persist existence of changed-paths 2020-07-01 14:17:43 -07:00
ewah Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
git-gui Merge https://github.com/prati0100/git-gui 2020-03-19 16:06:51 -07:00
gitk-git
gitweb
mergetools mergetools: add support for smerge (Sublime Merge) 2019-04-04 18:21:25 +09:00
negotiator
perl
po l10n: tr.po: change file mode to 644 2020-03-21 18:26:56 +08:00
ppc
refs
sha1collisiondetection@855827c583
sha1dc
sha256 hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
t commit-graph: check all leading directories in changed path Bloom filters 2020-07-01 14:17:43 -07:00
templates
trace2
vcs-svn
xdiff
.cirrus.yml
.clang-format
.editorconfig
.gitattributes
.gitignore stash: remove the stash.useBuiltin setting 2020-03-05 12:50:28 -08:00
.gitmodules
.mailmap Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
.travis.yml
.tsan-suppressions
abspath.c real_path_if_valid(): remove unsafe API 2020-03-10 11:41:40 -07:00
aclocal.m4
add-interactive.c Merge branch 'js/builtin-add-i-cmds' into maint 2020-03-17 15:02:20 -07:00
add-interactive.h
add-patch.c
advice.c Merge branch 'hw/advise-ng' 2020-03-25 13:57:41 -07:00
advice.h Merge branch 'hw/advise-ng' 2020-03-25 13:57:41 -07:00
alias.c
alias.h
alloc.c
alloc.h
apply.c convert: permit passing additional metadata to filter processes 2020-03-16 11:37:02 -07:00
apply.h
archive-tar.c
archive-zip.c
archive.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
archive.h convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
argv-array.c
argv-array.h
attr.c
attr.h
azure-pipelines.yml Azure Pipeline: switch to the latest agent pools 2020-02-27 09:58:43 -08:00
banned.h
base85.c
bisect.c bisect: libify bisect_next_all 2020-02-19 09:37:15 -08:00
bisect.h bisect: libify bisect_next_all 2020-02-19 09:37:15 -08:00
blame.c blame: drop unused parameter from maybe_changed_path 2020-04-23 14:37:03 -07:00
blame.h blame: use changed-path Bloom filters 2020-04-16 15:38:06 -07:00
blob.c
blob.h
bloom.c bloom: fix logic in get_bloom_filter() 2020-07-01 14:17:43 -07:00
bloom.h line-log: integrate with changed-path Bloom filters 2020-05-11 09:33:56 -07:00
branch.c
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c
cache-tree.h cache-tree: share code between functions writing an index as a tree 2019-08-19 10:08:03 -07:00
cache.h Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
chdir-notify.c
chdir-notify.h
check_bindir
check-builtins.sh
checkout.c
checkout.h
CODE_OF_CONDUCT.md
color.c
color.h
column.c
column.h
combine-diff.c
command-list.txt
commit-graph.c commit-graph: check chunk sizes after writing 2020-07-01 14:17:43 -07:00
commit-graph.h commit-graph: persist existence of changed-paths 2020-07-01 14:17:43 -07:00
commit-reach.c
commit-reach.h
commit-slab-decl.h commit-slab: add a function to deep free entries on the slab 2020-06-08 12:28:49 -07:00
commit-slab-impl.h commit-slab: add a function to deep free entries on the slab 2020-06-08 12:28:49 -07:00
commit-slab.h commit-slab: add a function to deep free entries on the slab 2020-06-08 12:28:49 -07:00
commit.c Merge branch 'at/rebase-fork-point-regression-fix' 2020-03-26 17:11:21 -07:00
commit.h
common-main.c
config.c Merge branch 'bw/remote-rename-update-config' 2020-02-25 11:18:32 -08:00
config.h
config.mak.dev Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
config.mak.in
config.mak.uname
configure.ac
connect.c
connect.h
connected.c connected.c: reprepare packs for corner cases 2020-03-15 15:39:00 -07:00
connected.h
convert.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
convert.h convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
copy.c
COPYING
credential-cache--daemon.c
credential-cache.c
credential-store.c
credential.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
credential.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
csum-file.c hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
csum-file.h
ctype.c
daemon.c
date.c
decorate.c
decorate.h
delta-islands.c
delta-islands.h
delta.h
detect-compiler
diff-delta.c
diff-lib.c
diff-no-index.c
diff.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
diff.h diff.h: drop diff_tree_oid() & friends' return value 2020-06-08 12:28:49 -07:00
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c Merge branch 'ds/sparse-add' 2020-03-05 10:43:02 -08:00
dir.h
editor.c real_path: remove unsafe API 2020-03-10 11:41:40 -07:00
entry.c convert: provide additional metadata to filters 2020-03-16 11:37:02 -07:00
environment.c real_path: remove unsafe API 2020-03-10 11:41:40 -07:00
exec-cmd.c
exec-cmd.h
fast-import.c fast-import: add options for rewriting submodules 2020-02-28 09:53:41 -08:00
fetch-negotiator.c repo-settings: create feature.experimental setting 2019-08-13 13:33:55 -07:00
fetch-negotiator.h
fetch-pack.c
fetch-pack.h
fmt-merge-msg.h
fsck.c
fsck.h
fsmonitor.c
fsmonitor.h
fuzz-commit-graph.c
fuzz-pack-headers.c
fuzz-pack-idx.c
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl
git-archimport.perl
git-bisect.sh
git-compat-util.h
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py Merge branch 'yz/p4-py3' 2020-03-25 13:57:43 -07:00
git-parse-remote.sh
git-quiltimport.sh
git-rebase--preserve-merges.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh Merge branch 'es/recursive-single-branch-clone' 2020-03-05 10:43:03 -08:00
git-svn.perl
GIT-VERSION-GEN The first batch post 2.26 cycle 2020-03-25 13:57:44 -07:00
git-web--browse.sh
git.c stash: remove the stash.useBuiltin setting 2020-03-05 12:50:28 -08:00
git.rc
gpg-interface.c gpg-interface: prefer check_signature() for GPG verification 2020-03-15 09:46:28 -07:00
gpg-interface.h gpg-interface: prefer check_signature() for GPG verification 2020-03-15 09:46:28 -07:00
graph.c
graph.h
grep.c
grep.h
hash.h hash: implement and use a context cloning function 2020-02-24 09:33:21 -08:00
hashmap.c
hashmap.h
help.c
help.h
hex.c hex: add functions to parse hex object IDs in any algorithm 2020-02-24 09:33:21 -08:00
http-backend.c
http-fetch.c
http-push.c
http-walker.c
http.c Merge branch 'js/https-proxy-config' 2020-03-25 13:57:42 -07:00
http.h
ident.c
imap-send.c
INSTALL Merge branch 'ar/install-doc-update-cmds-needing-the-shell' 2019-12-01 09:04:41 -08:00
interdiff.c
interdiff.h
iterator.h
json-writer.c
json-writer.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
LGPL-2.1
line-log.c line-log: integrate with changed-path Bloom filters 2020-05-11 09:33:56 -07:00
line-log.h line-log: more responsive, incremental 'git log -L' 2020-05-11 09:33:56 -07:00
line-range.c
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c
list-objects-filter-options.h
list-objects-filter.c
list-objects-filter.h
list-objects.c
list-objects.h
list.h tempfile: use list.h for linked list 2017-09-06 17:19:54 +09:00
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c Merge branch 'hi/gpg-prefer-check-signature' 2020-03-26 17:11:20 -07:00
log-tree.h
ls-refs.c
ls-refs.h
mailinfo.c Merge branch 'rs/micro-cleanups' 2020-03-02 15:07:20 -08:00
mailinfo.h
mailmap.c
mailmap.h
Makefile bloom.c: add the murmur3 hash implementation 2020-03-30 09:59:53 -07:00
match-trees.c
mem-pool.c
mem-pool.h
merge-blobs.c
merge-blobs.h
merge-recursive.c Merge branch 'en/t3433-rebase-stat-dirty-failure' into maint 2020-03-17 15:02:23 -07:00
merge-recursive.h
merge.c builtin/checkout: compute checkout metadata for checkouts 2020-03-16 11:37:02 -07:00
mergesort.c mergesort: rename it to llist_mergesort() 2012-04-17 11:07:01 -07:00
mergesort.h
midx.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
midx.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c
notes-utils.h
notes.c Merge branch 'rs/strbuf-insertstr' 2020-02-17 13:22:17 -08:00
notes.h
object-store.h packed_object_info(): use object_id for returning delta base 2020-02-24 12:55:53 -08:00
object.c Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
object.h
oidmap.c
oidmap.h
oidset.c
oidset.h Merge branch 'en/oidset-uninclude-hashmap' 2020-03-25 13:57:44 -07:00
pack-bitmap-write.c
pack-bitmap.c Merge branch 'jk/nth-packed-object-id' 2020-03-05 10:43:03 -08:00
pack-bitmap.h Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
pack-check.c pack-check: push oid lookup into loop 2020-02-24 12:55:53 -08:00
pack-objects.c pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-objects.h pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
packfile.c packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
packfile.h packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
pager.c
parse-options-cb.c
parse-options.c Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
parse-options.h Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
patch-delta.c
patch-ids.c
patch-ids.h
path.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
path.h
pathspec.c prefix_path: show gitdir if worktree unavailable 2020-03-15 09:35:46 -07:00
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c Merge branch 'rs/strbuf-insertstr' 2020-02-17 13:22:17 -08:00
pretty.h
prio-queue.c
prio-queue.h
progress.c
progress.h
promisor-remote.c
promisor-remote.h
prompt.c
prompt.h
protocol.c
protocol.h
quote.c quote: use isalnum() to check for alphanumeric characters 2020-02-24 09:30:29 -08:00
quote.h
range-diff.c
range-diff.h
reachable.c
reachable.h
read-cache.c
README.md
rebase-interactive.c Merge branch 'rt/format-zero-length-fix' 2020-03-09 11:21:21 -07:00
rebase-interactive.h Merge branch 'en/rebase-backend' 2020-03-02 15:07:19 -08:00
rebase.c
rebase.h
ref-filter.c Merge branch 'dr/push-remote-ref-update' 2020-03-11 10:58:16 -07:00
ref-filter.h
reflog-walk.c
reflog-walk.h
refs.c
refs.h
refspec.c
refspec.h
RelNotes The first batch post 2.26 cycle 2020-03-25 13:57:44 -07:00
remote-curl.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
remote-testsvn.c
remote.c remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
remote.h remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
replace-object.c
replace-object.h
repo-settings.c config: set pack.useSparse=true by default 2020-03-20 14:22:31 -07:00
repository.c repository: require a build flag to use SHA-256 2020-02-24 09:33:21 -08:00
repository.h
rerere.c
rerere.h
resolve-undo.c
resolve-undo.h
revision.c commit-graph: check all leading directories in changed path Bloom filters 2020-07-01 14:17:43 -07:00
revision.h commit-graph: check all leading directories in changed path Bloom filters 2020-07-01 14:17:43 -07:00
run-command.c
run-command.h run-command.h: fix mis-indented struct member 2020-02-22 09:05:34 -08:00
send-pack.c
send-pack.h
sequencer.c Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
sequencer.h Merge branch 'pw/advise-rebase-skip' 2020-03-25 13:57:43 -07:00
serve.c
serve.h
server-info.c
setup.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
sh-i18n--envsubst.c
sha1-array.c
sha1-array.h
sha1-file.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
sha1-lookup.c
sha1-lookup.h
sha1-name.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
sha1dc_git.c
sha1dc_git.h
shallow.c commit-slab: add a function to deep free entries on the slab 2020-06-08 12:28:49 -07:00
shell.c
shortlog.h
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
stable-qsort.c
strbuf.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
strbuf.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
streaming.c
streaming.h
string-list.c
string-list.h Merge branch 'en/string-list-can-be-custom-sorted' 2020-01-22 15:07:31 -08:00
sub-process.c
sub-process.h
submodule-config.c Merge branch 'mr/show-config-scope' 2020-02-17 13:22:17 -08:00
submodule-config.h
submodule.c Merge branch 'dt/submodule-rm-with-stale-cache' into maint 2020-03-17 15:02:21 -07:00
submodule.h get_superproject_working_tree(): return strbuf 2020-03-10 11:41:40 -07:00
symlinks.c
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace2.c
trace2.h
trace.c
trace.h
trailer.c
trailer.h
transport-helper.c
transport-internal.h
transport.c
transport.h
tree-diff.c diff.h: drop diff_tree_oid() & friends' return value 2020-06-08 12:28:49 -07:00
tree-walk.c tree-walk.c: don't match submodule entries for 'submod/anything' 2020-06-08 12:28:48 -07:00
tree-walk.h
tree.c
tree.h
unicode-width.h unicode: update the width tables to Unicode 13.0 2020-03-17 15:06:37 -07:00
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Merge branch 'bc/filter-process' 2020-03-26 17:11:20 -07:00
unpack-trees.h builtin/checkout: compute checkout metadata for checkouts 2020-03-16 11:37:02 -07:00
upload-pack.c
upload-pack.h
url.c
url.h
urlmatch.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
urlmatch.h credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
walker.h remote-curl: show progress for fetches over dumb HTTP 2020-03-03 13:15:40 -08:00
wildmatch.c wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode 2018-10-29 13:19:22 +09:00
wildmatch.h
worktree.c Merge branch 'bc/sha-256-part-1-of-4' 2020-03-26 17:11:20 -07:00
worktree.h worktree: add utility to find worktree by pathname 2020-02-24 13:04:30 -08:00
wrap-for-bin.sh
wrapper.c
write-or-die.c
ws.c
wt-status.c
wt-status.h
xdiff-interface.c
xdiff-interface.h
zlib.c

Build Status

Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks