Git with broken hash generation to generate collisions between object IDs. Don't use this! https://undefinedbehavior.de/posts/commit-vandalism/
Jeff King d8410a816b fast-import: replace custom hash with hashmap.c
We use a custom hash in fast-import to store the set of objects we've
imported so far. It has a fixed set of 2^16 buckets and chains any
collisions with a linked list. As the number of objects grows larger
than that, the load factor increases and we degrade to O(n) lookups and
O(n^2) insertions.

We can scale better by using our hashmap.c implementation, which will
resize the bucket count as we grow. This does incur an extra memory cost
of 8 bytes per object, as hashmap stores the integer hash value for each
entry in its hashmap_entry struct (which we really don't care about
here, because we're just reusing the embedded object hash). But I think
the numbers below justify this (and our per-object memory cost is
already much higher).
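
The scaling argument above can be illustrated with a toy model. This is a minimal sketch, not Git's hashmap.c and not the old fast-import table. The class name `ChainedTable`, the 0.8 load-factor threshold, the doubling policy, and the use of small integers as stand-ins for object IDs are all assumptions made for illustration.

```python
class ChainedTable:
    """Toy separate-chaining hash table; optionally grows its bucket array."""

    def __init__(self, nr_buckets=16, resize=False, max_load=0.8):
        self.buckets = [[] for _ in range(nr_buckets)]
        self.resize = resize
        self.max_load = max_load  # hypothetical threshold, not Git's
        self.count = 0
        self.probes = 0  # chain entries examined across all lookups

    @staticmethod
    def _index(key, nr_buckets):
        # Integer keys stand in for object IDs; the key is its own hash.
        return key % nr_buckets

    def find(self, key):
        chain = self.buckets[self._index(key, len(self.buckets))]
        for entry in chain:
            self.probes += 1
            if entry == key:
                return entry
        return None

    def insert(self, key):
        # Like an import, every insertion first checks for an existing entry.
        if self.find(key) is not None:
            return False
        if self.resize and self.count + 1 > self.max_load * len(self.buckets):
            self._grow()
        self.buckets[self._index(key, len(self.buckets))].append(key)
        self.count += 1
        return True

    def _grow(self):
        # Double the bucket array and redistribute every entry.
        new_size = len(self.buckets) * 2
        new_buckets = [[] for _ in range(new_size)]
        for chain in self.buckets:
            for entry in chain:
                new_buckets[self._index(entry, new_size)].append(entry)
        self.buckets = new_buckets

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


def compare(nr_objects=4096, nr_buckets=16):
    """Insert the same keys into a fixed-size and a growing table."""
    fixed = ChainedTable(nr_buckets, resize=False)
    growing = ChainedTable(nr_buckets, resize=True)
    for key in range(nr_objects):
        fixed.insert(key)
        growing.insert(key)
    return fixed, growing
```

Calling `compare()` and reading `probes` and `longest_chain()` on the two tables shows the effect: with a fixed bucket count the chains grow in proportion to the number of objects, so the total lookup work grows quadratically, while the growing table keeps its chains short at the cost of occasional rehashing.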

I also looked at using khash, but it seemed to perform slightly worse
than hashmap at all sizes, and worse even than the existing code for
small sizes. It's also awkward to use here, because we want to look up a
"struct object_entry" from a "struct object_id", and it doesn't handle
mismatched keys as well. Making a mapping of object_id to object_entry
would be more natural, but that would require pulling the embedded oid
out of the object_entry or incurring an extra 32 bytes per object.

In a synthetic test creating as many cheap, tiny objects as possible:

  perl -e '
      my $bits = shift;
      my $nr = 2**$bits;

      for (my $i = 0; $i < $nr; $i++) {
              print "blob\n";
              print "data 4\n";
              print pack("N", $i);
      }
  ' $bits | git fast-import

I got these results:

  nr_objects   master       khash      hashmap
  2^20         0m4.317s     0m5.109s   0m3.890s
  2^21         0m10.204s    0m9.702s   0m7.933s
  2^22         0m27.159s    0m17.911s  0m16.751s
  2^23         1m19.038s    0m35.080s  0m31.963s
  2^24         4m18.766s    1m10.233s  1m6.793s

which points to hashmap as the winner. We didn't have any perf tests for
fast-export or fast-import, so I added one as a more real-world case.
It uses an export without blobs since that's significantly cheaper than
a full one, but still is an interesting case people might use (e.g., for
rewriting history). It will emphasize this change in some ways (as a
percentage we spend more time making objects and less shuffling blob
bytes around) and less in others (the total object count is lower).

Here are the results for linux.git:

  Test                        HEAD^                 HEAD
  ----------------------------------------------------------------------------
  9300.1: export (no-blobs)   67.64(66.96+0.67)     67.81(67.06+0.75) +0.3%
  9300.2: import (no-blobs)   284.04(283.34+0.69)   198.09(196.01+0.92) -30.3%

It only has ~5.2M commits and trees, so this is a larger effect than I
expected (the 2^23 case above only improved by 50s or so, but here we
gained almost 90s). This is probably due to actually performing more
object lookups in a real import with trees and commits, as opposed to
just dumping a bunch of blobs into a pack.
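
The comparison in the paragraph above can be rechecked from the numbers in the two tables. This is only an arithmetic sketch; the helper names `to_seconds` and `percent_change` are made up here, and the inputs are copied from the tables.

```python
def to_seconds(stamp):
    """Convert a shell `time` stamp such as '1m19.038s' to seconds."""
    minutes, rest = stamp.split("m")
    return int(minutes) * 60 + float(rest.rstrip("s"))


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# 2^23 row of the synthetic test: master versus hashmap.
synthetic_saving = to_seconds("1m19.038s") - to_seconds("0m31.963s")

# linux.git perf test 9300.2 (import, no-blobs): HEAD^ versus HEAD.
import_saving = 284.04 - 198.09
import_change = percent_change(284.04, 198.09)

# linux.git perf test 9300.1 (export, no-blobs): HEAD^ versus HEAD.
export_change = percent_change(67.64, 67.81)

print(f"synthetic 2^23 saving: {synthetic_saving:.1f}s")
print(f"linux.git import saving: {import_saving:.1f}s ({import_change:+.1f}%)")
print(f"linux.git export change: {export_change:+.1f}%")
```

The synthetic case saves about 47 seconds and the linux.git import about 86 seconds, which matches the "50s or so" and "almost 90s" figures quoted in the text.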

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-06 13:41:24 -07:00
.github
block-sha1
builtin Merge branch 'hi/gpg-use-check-signature' into maint 2020-03-17 15:02:23 -07:00
ci Merge branch 'js/ci-windows-update' 2020-03-05 10:43:04 -08:00
compat Merge branch 'js/mingw-open-in-gdb' into maint 2020-03-17 15:02:25 -07:00
contrib Merge branch 'kk/complete-diff-color-moved' 2020-03-09 11:21:20 -07:00
Documentation RelNotes/2.26.0: fix various typos 2020-03-18 15:42:37 -07:00
ewah Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
git-gui Merge https://github.com/prati0100/git-gui 2020-03-19 16:06:51 -07:00
gitk-git
gitweb
mergetools
negotiator
perl
po l10n: tr.po: change file mode to 644 2020-03-21 18:26:56 +08:00
ppc
refs C: use skip_prefix() to avoid hardcoded string length 2020-01-31 13:03:45 -08:00
sha1collisiondetection@855827c583
sha1dc
sha256
t fast-import: replace custom hash with hashmap.c 2020-04-06 13:41:24 -07:00
templates Merge branch 'kw/fsmonitor-watchman-racefix' 2020-02-14 12:54:20 -08:00
trace2
vcs-svn
xdiff
.cirrus.yml
.clang-format
.editorconfig
.gitattributes
.gitignore
.gitmodules
.mailmap Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
.travis.yml
.tsan-suppressions replace-object: make replace operations thread-safe 2020-01-17 13:52:14 -08:00
abspath.c
aclocal.m4
add-interactive.c Merge branch 'js/builtin-add-i-cmds' into maint 2020-03-17 15:02:20 -07:00
add-interactive.h
add-patch.c
advice.c add: change advice config variables used by the add API 2020-02-06 11:08:00 -08:00
advice.h add: change advice config variables used by the add API 2020-02-06 11:08:00 -08:00
alias.c
alias.h
alloc.c
alloc.h
apply.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
apply.h
archive-tar.c streaming: allow open_istream() to handle any repo 2020-01-31 10:45:39 -08:00
archive-zip.c streaming: allow open_istream() to handle any repo 2020-01-31 10:45:39 -08:00
archive.c
archive.h
argv-array.c
argv-array.h
attr.c
attr.h
azure-pipelines.yml Azure Pipeline: switch to the latest agent pools 2020-02-27 09:58:43 -08:00
banned.h
base85.c
bisect.c bisect: libify bisect_next_all 2020-02-19 09:37:15 -08:00
bisect.h bisect: libify bisect_next_all 2020-02-19 09:37:15 -08:00
blame.c
blame.h blame: provide type of fingerprints pointer 2020-02-24 12:08:48 -08:00
blob.c
blob.h
branch.c
branch.h
builtin.h
bulk-checkin.c
bulk-checkin.h
bundle.c
bundle.h
cache-tree.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
cache-tree.h
cache.h Merge branch 'mt/use-passed-repo-more-in-funcs' 2020-02-14 12:54:22 -08:00
chdir-notify.c
chdir-notify.h
check_bindir
check-builtins.sh
checkout.c
checkout.h
CODE_OF_CONDUCT.md
color.c color.c: alias RGB colors 8-15 to aixterm colors 2020-02-11 11:19:00 -08:00
color.h
column.c
column.h
combine-diff.c
command-list.txt
commit-graph.c Merge branch 'rs/commit-graph-code-simplification' 2020-03-05 10:43:04 -08:00
commit-graph.h commit-graph.h: use odb in 'load_commit_graph_one_fd_st' 2020-02-04 11:36:51 -08:00
commit-reach.c
commit-reach.h
commit-slab-decl.h
commit-slab-impl.h
commit-slab.h
commit.c Merge branch 'rs/strbuf-insertstr' 2020-02-17 13:22:17 -08:00
commit.h
common-main.c
config.c Merge branch 'bw/remote-rename-update-config' 2020-02-25 11:18:32 -08:00
config.h config: provide access to the current line number 2020-02-10 10:52:10 -08:00
config.mak.dev config.mak.dev: re-enable -Wformat-zero-length 2020-02-28 08:39:45 -08:00
config.mak.in
config.mak.uname
configure.ac
connect.c
connect.h
connected.c connected: verify promisor-ness of partial clone 2020-01-30 10:55:31 -08:00
connected.h connected: verify promisor-ness of partial clone 2020-01-30 10:55:31 -08:00
convert.c Merge branch 'mt/use-passed-repo-more-in-funcs' 2020-02-14 12:54:22 -08:00
convert.h
copy.c
COPYING
credential-cache--daemon.c
credential-cache.c
credential-store.c
credential.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
credential.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
csum-file.c
csum-file.h csum-file: introduce hashfile_total() 2020-01-23 10:51:50 -08:00
ctype.c
daemon.c
date.c
decorate.c
decorate.h
delta-islands.c
delta-islands.h
delta.h
detect-compiler
diff-delta.c
diff-lib.c
diff-no-index.c
diff.c Merge branch 'mt/use-passed-repo-more-in-funcs' 2020-02-14 12:54:22 -08:00
diff.h
diffcore-break.c
diffcore-delta.c
diffcore-order.c
diffcore-pickaxe.c
diffcore-rename.c sha1-file: pass git_hash_algo to hash_object_file() 2020-01-31 10:45:39 -08:00
diffcore.h
dir-iterator.c
dir-iterator.h
dir.c Merge branch 'ds/sparse-add' 2020-03-05 10:43:02 -08:00
dir.h
editor.c
entry.c
environment.c
exec-cmd.c
exec-cmd.h
fast-import.c fast-import: replace custom hash with hashmap.c 2020-04-06 13:41:24 -07:00
fetch-negotiator.c
fetch-negotiator.h
fetch-pack.c
fetch-pack.h
fmt-merge-msg.h
fsck.c
fsck.h
fsmonitor.c
fsmonitor.h
fuzz-commit-graph.c
fuzz-pack-headers.c
fuzz-pack-idx.c
generate-cmdlist.sh
gettext.c
gettext.h
git-add--interactive.perl
git-archimport.perl
git-bisect.sh
git-compat-util.h
git-cvsexportcommit.perl
git-cvsimport.perl
git-cvsserver.perl
git-difftool--helper.sh
git-filter-branch.sh
git-instaweb.sh
git-legacy-stash.sh Merge branch 'js/patch-mode-in-others-in-c' 2020-02-05 14:34:58 -08:00
git-merge-octopus.sh
git-merge-one-file.sh
git-merge-resolve.sh
git-mergetool--lib.sh
git-mergetool.sh
git-p4.py git-p4: avoid leak of file handle when cloning 2020-01-30 12:21:13 -08:00
git-parse-remote.sh
git-quiltimport.sh
git-rebase--preserve-merges.sh
git-request-pull.sh
git-send-email.perl
git-sh-i18n.sh
git-sh-setup.sh
git-submodule.sh Merge branch 'es/recursive-single-branch-clone' 2020-03-05 10:43:03 -08:00
git-svn.perl
GIT-VERSION-GEN Git 2.26 2020-03-22 16:50:46 -07:00
git-web--browse.sh
git.c
git.rc
gpg-interface.c Merge branch 'hi/gpg-use-check-signature' 2020-03-05 10:43:05 -08:00
gpg-interface.h Merge branch 'hi/gpg-use-check-signature' 2020-03-05 10:43:05 -08:00
graph.c
graph.h
grep.c grep: replace grep_read_mutex by internal obj read lock 2020-01-17 13:52:14 -08:00
grep.h grep: replace grep_read_mutex by internal obj read lock 2020-01-17 13:52:14 -08:00
hash.h
hashmap.c
hashmap.h
help.c
help.h
hex.c
http-backend.c
http-fetch.c
http-push.c
http-walker.c
http.c strbuf: add and use strbuf_insertstr() 2020-02-10 09:04:45 -08:00
http.h
ident.c
imap-send.c
INSTALL
interdiff.c
interdiff.h
iterator.h
json-writer.c
json-writer.h
khash.h
kwset.c
kwset.h
levenshtein.c
levenshtein.h
LGPL-2.1
line-log.c
line-log.h
line-range.c
line-range.h
linear-assignment.c
linear-assignment.h
list-objects-filter-options.c
list-objects-filter-options.h
list-objects-filter.c
list-objects-filter.h
list-objects.c
list-objects.h
list.h
ll-merge.c
ll-merge.h
lockfile.c
lockfile.h
log-tree.c Merge branch 'hd/show-one-mergetag-fix' into maint 2020-03-17 15:02:24 -07:00
log-tree.h
ls-refs.c
ls-refs.h
mailinfo.c Merge branch 'rs/micro-cleanups' 2020-03-02 15:07:20 -08:00
mailinfo.h
mailmap.c
mailmap.h
Makefile Merge branch 'bw/remote-rename-update-config' 2020-02-25 11:18:32 -08:00
match-trees.c
mem-pool.c
mem-pool.h
merge-blobs.c
merge-blobs.h
merge-recursive.c Merge branch 'en/t3433-rebase-stat-dirty-failure' into maint 2020-03-17 15:02:23 -07:00
merge-recursive.h
merge.c
mergesort.c
mergesort.h
midx.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
midx.h
name-hash.c
notes-cache.c
notes-cache.h
notes-merge.c
notes-merge.h
notes-utils.c strbuf: add and use strbuf_insertstr() 2020-02-10 09:04:45 -08:00
notes-utils.h
notes.c Merge branch 'jh/notes-fanout-fix' into maint 2020-03-17 15:02:22 -07:00
notes.h
object-store.h packed_object_info(): use object_id for returning delta base 2020-02-24 12:55:53 -08:00
object.c Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
object.h pack-bitmap: fix leak of haves/wants object lists 2020-02-13 09:08:58 -08:00
oidmap.c
oidmap.h
oidset.c
oidset.h
pack-bitmap-write.c
pack-bitmap.c Merge branch 'jk/nth-packed-object-id' 2020-03-05 10:43:03 -08:00
pack-bitmap.h Merge branch 'jk/object-filter-with-bitmap' 2020-03-02 15:07:18 -08:00
pack-check.c pack-check: push oid lookup into loop 2020-02-24 12:55:53 -08:00
pack-objects.c pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-objects.h pack-objects: convert oe_set_delta_ext() to use object_id 2020-02-24 12:55:52 -08:00
pack-revindex.c
pack-revindex.h
pack-write.c
pack.h
packfile.c packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
packfile.h packfile: drop nth_packed_object_sha1() 2020-02-24 12:55:53 -08:00
pager.c
parse-options-cb.c parse-options: simplify parse_options_dup() 2020-02-10 09:45:49 -08:00
parse-options.c Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
parse-options.h Merge branch 'pb/am-show-current-patch' 2020-03-09 11:21:19 -07:00
patch-delta.c
patch-ids.c
patch-ids.h
path.c normalize_path_copy(): document "dst" size expectations 2020-01-30 13:45:58 -08:00
path.h
pathspec.c prefix_path: show gitdir if worktree unavailable 2020-03-15 09:35:46 -07:00
pathspec.h
pkt-line.c
pkt-line.h
preload-index.c
pretty.c Merge branch 'rs/strbuf-insertstr' 2020-02-17 13:22:17 -08:00
pretty.h
prio-queue.c
prio-queue.h
progress.c
progress.h
promisor-remote.c
promisor-remote.h
prompt.c
prompt.h
protocol.c
protocol.h
quote.c quote: use isalnum() to check for alphanumeric characters 2020-02-24 09:30:29 -08:00
quote.h
range-diff.c
range-diff.h
reachable.c pack-bitmap: basic noop bitmap filter infrastructure 2020-02-14 10:46:22 -08:00
reachable.h
read-cache.c
README.md
rebase-interactive.c Merge branch 'rt/format-zero-length-fix' 2020-03-09 11:21:21 -07:00
rebase-interactive.h Merge branch 'en/rebase-backend' 2020-03-02 15:07:19 -08:00
rebase.c pull --rebase/remote rename: document and honor single-letter abbreviations rebase types 2020-02-10 10:52:10 -08:00
rebase.h pull --rebase/remote rename: document and honor single-letter abbreviations rebase types 2020-02-10 10:52:10 -08:00
ref-filter.c Merge branch 'dr/push-remote-ref-update' 2020-03-11 10:58:16 -07:00
ref-filter.h
reflog-walk.c
reflog-walk.h
refs.c
refs.h
refspec.c
refspec.h
RelNotes Git 2.25.2 2020-03-17 15:06:37 -07:00
remote-curl.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
remote-testsvn.c
remote.c remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
remote.h remote: drop "explicit" parameter from remote_ref_for_branch() 2020-03-03 14:56:05 -08:00
replace-object.c replace-object: make replace operations thread-safe 2020-01-17 13:52:14 -08:00
replace-object.h replace-object: make replace operations thread-safe 2020-01-17 13:52:14 -08:00
repo-settings.c
repository.c
repository.h
rerere.c
rerere.h
resolve-undo.c
resolve-undo.h
revision.c
revision.h
run-command.c Merge branch 'bc/run-command-nullness-after-free-fix' into maint 2020-02-14 12:42:27 -08:00
run-command.h run-command.h: fix mis-indented struct member 2020-02-22 09:05:34 -08:00
send-pack.c
send-pack.h
sequencer.c Merge branch 'js/rebase-i-with-colliding-hash' into maint 2020-03-17 15:02:21 -07:00
sequencer.h Merge branch 'en/rebase-backend' 2020-03-02 15:07:19 -08:00
serve.c
serve.h
server-info.c
setup.c Merge branch 'es/outside-repo-errmsg-hints' 2020-03-16 12:43:29 -07:00
sh-i18n--envsubst.c
sha1-array.c
sha1-array.h
sha1-file.c packed_object_info(): use object_id for returning delta base 2020-02-24 12:55:53 -08:00
sha1-lookup.c
sha1-lookup.h
sha1-name.c nth_packed_object_oid(): use customary integer return 2020-02-24 12:55:42 -08:00
sha1dc_git.c
sha1dc_git.h
shallow.c
shell.c
shortlog.h
sideband.c
sideband.h
sigchain.c
sigchain.h
split-index.c
split-index.h
stable-qsort.c
strbuf.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
strbuf.h Merge branch 'bc/wildcard-credential' 2020-03-05 10:43:02 -08:00
streaming.c streaming: allow open_istream() to handle any repo 2020-01-31 10:45:39 -08:00
streaming.h streaming: allow open_istream() to handle any repo 2020-01-31 10:45:39 -08:00
string-list.c style: the opening '{' of a function is in a separate line 2018-12-10 15:41:09 +09:00
string-list.h Merge branch 'en/string-list-can-be-custom-sorted' into maint 2020-02-14 12:42:27 -08:00
sub-process.c
sub-process.h
submodule-config.c Merge branch 'mr/show-config-scope' 2020-02-17 13:22:17 -08:00
submodule-config.h submodule-config: add skip_if_read option to repo_read_gitmodules() 2020-01-17 13:52:14 -08:00
submodule.c Merge branch 'dt/submodule-rm-with-stale-cache' into maint 2020-03-17 15:02:21 -07:00
submodule.h
symlinks.c
tag.c
tag.h
tar.h
tempfile.c
tempfile.h
thread-utils.c
thread-utils.h
tmp-objdir.c
tmp-objdir.h
trace2.c
trace2.h
trace.c
trace.h
trailer.c
trailer.h
transport-helper.c C: use skip_prefix() to avoid hardcoded string length 2020-01-31 13:03:45 -08:00
transport-internal.h
transport.c Merge branch 'jk/no-flush-upon-disconnecting-slrpc-transport' into maint 2020-02-14 12:42:28 -08:00
transport.h
tree-diff.c
tree-walk.c tree-walk.c: break circular dependency with unpack-trees 2020-02-04 10:32:15 -08:00
tree-walk.h tree-walk.c: break circular dependency with unpack-trees 2020-02-04 10:32:15 -08:00
tree.c
tree.h
unicode-width.h unicode: update the width tables to Unicode 13.0 2020-03-17 15:06:37 -07:00
unimplemented.sh
unix-socket.c
unix-socket.h
unpack-trees.c Merge branch 'en/simplify-check-updates-in-unpack-trees' into maint 2020-03-17 15:02:25 -07:00
unpack-trees.h tree-walk.c: break circular dependency with unpack-trees 2020-02-04 10:32:15 -08:00
upload-pack.c config: split repo scope to local and worktree 2020-02-10 10:32:20 -08:00
upload-pack.h
url.c
url.h
urlmatch.c credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
urlmatch.h credential: allow wildcard patterns when matching config 2020-02-20 13:05:43 -08:00
usage.c
userdiff.c
userdiff.h
utf8.c
utf8.h
varint.c
varint.h
version.c
version.h
versioncmp.c
walker.c Merge branch 'rs/show-progress-in-dumb-http-fetch' 2020-03-09 11:21:21 -07:00
walker.h remote-curl: show progress for fetches over dumb HTTP 2020-03-03 13:15:40 -08:00
wildmatch.c
wildmatch.h
worktree.c Merge branch 'hv/receive-denycurrent-everywhere' 2020-03-05 10:43:03 -08:00
worktree.h worktree: add utility to find worktree by pathname 2020-02-24 13:04:30 -08:00
wrap-for-bin.sh
wrapper.c
write-or-die.c
ws.c
wt-status.c
wt-status.h
xdiff-interface.c xdiff: avoid computing non-zero offset from NULL pointer 2020-01-28 23:13:25 -08:00
xdiff-interface.h
zlib.c


Git - fast, scalable, distributed revision control system

Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.

Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with the help of a group of hackers around the net.

Please read the file INSTALL for installation instructions.

Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.

See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-<commandname>.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial or git help tutorial, and the documentation of each command with man git-<commandname> or git help <commandname>.

CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration or git help cvs-migration if git is installed).

The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.

Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.

The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them gives a good reference for project status, development direction and remaining tasks.

The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):

  • random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
  • stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
  • "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
  • "goddamn idiotic truckload of sh*t": when it breaks