d8410a816b
We use a custom hash in fast-import to store the set of objects we've imported so far. It has a fixed set of 2^16 buckets and chains any collisions with a linked list. As the number of objects grows larger than that, the load factor increases and we degrade to O(n) lookups and O(n^2) insertions. We can scale better by using our hashmap.c implementation, which will resize the bucket count as we grow. This does incur an extra memory cost of 8 bytes per object, as hashmap stores the integer hash value for each entry in its hashmap_entry struct (which we really don't care about here, because we're just reusing the embedded object hash). But I think the numbers below justify this (and our per-object memory cost is already much higher). I also looked at using khash, but it seemed to perform slightly worse than hashmap at all sizes, and worse even than the existing code for small sizes. It's also awkward to use here, because we want to look up a "struct object_entry" from a "struct object_id", and it doesn't handle mismatched keys as well. Making a mapping of object_id to object_entry would be more natural, but that would require pulling the embedded oid out of the object_entry or incurring an extra 32 bytes per object. In a synthetic test creating as many cheap, tiny objects as possible perl -e ' my $bits = shift; my $nr = 2**$bits; for (my $i = 0; $i < $nr; $i++) { print "blob\n"; print "data 4\n"; print pack("N", $i); } ' $bits | git fast-import I got these results: nr_objects master khash hashmap 2^20 0m4.317s 0m5.109s 0m3.890s 2^21 0m10.204s 0m9.702s 0m7.933s 2^22 0m27.159s 0m17.911s 0m16.751s 2^23 1m19.038s 0m35.080s 0m31.963s 2^24 4m18.766s 1m10.233s 1m6.793s which points to hashmap as the winner. We didn't have any perf tests for fast-export or fast-import, so I added one as a more real-world case. It uses an export without blobs since that's significantly cheaper than a full one, but still is an interesting case people might use (e.g., for rewriting history). It will emphasize this change in some ways (as a percentage we spend more time making objects and less shuffling blob bytes around) and less in others (the total object count is lower). Here are the results for linux.git: Test HEAD^ HEAD ---------------------------------------------------------------------------- 9300.1: export (no-blobs) 67.64(66.96+0.67) 67.81(67.06+0.75) +0.3% 9300.2: import (no-blobs) 284.04(283.34+0.69) 198.09(196.01+0.92) -30.3% It only has ~5.2M commits and trees, so this is a larger effect than I expected (the 2^23 case above only improved by 50s or so, but here we gained almost 90s). This is probably due to actually performing more object lookups in a real import with trees and commits, as opposed to just dumping a bunch of blobs into a pack. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> |
||
---|---|---|
.github | ||
block-sha1 | ||
builtin | ||
ci | ||
compat | ||
contrib | ||
Documentation | ||
ewah | ||
git-gui | ||
gitk-git | ||
gitweb | ||
mergetools | ||
negotiator | ||
perl | ||
po | ||
ppc | ||
refs | ||
sha1collisiondetection@855827c583 | ||
sha1dc | ||
sha256 | ||
t | ||
templates | ||
trace2 | ||
vcs-svn | ||
xdiff | ||
.cirrus.yml | ||
.clang-format | ||
.editorconfig | ||
.gitattributes | ||
.gitignore | ||
.gitmodules | ||
.mailmap | ||
.travis.yml | ||
.tsan-suppressions | ||
abspath.c | ||
aclocal.m4 | ||
add-interactive.c | ||
add-interactive.h | ||
add-patch.c | ||
advice.c | ||
advice.h | ||
alias.c | ||
alias.h | ||
alloc.c | ||
alloc.h | ||
apply.c | ||
apply.h | ||
archive-tar.c | ||
archive-zip.c | ||
archive.c | ||
archive.h | ||
argv-array.c | ||
argv-array.h | ||
attr.c | ||
attr.h | ||
azure-pipelines.yml | ||
banned.h | ||
base85.c | ||
bisect.c | ||
bisect.h | ||
blame.c | ||
blame.h | ||
blob.c | ||
blob.h | ||
branch.c | ||
branch.h | ||
builtin.h | ||
bulk-checkin.c | ||
bulk-checkin.h | ||
bundle.c | ||
bundle.h | ||
cache-tree.c | ||
cache-tree.h | ||
cache.h | ||
chdir-notify.c | ||
chdir-notify.h | ||
check_bindir | ||
check-builtins.sh | ||
checkout.c | ||
checkout.h | ||
CODE_OF_CONDUCT.md | ||
color.c | ||
color.h | ||
column.c | ||
column.h | ||
combine-diff.c | ||
command-list.txt | ||
commit-graph.c | ||
commit-graph.h | ||
commit-reach.c | ||
commit-reach.h | ||
commit-slab-decl.h | ||
commit-slab-impl.h | ||
commit-slab.h | ||
commit.c | ||
commit.h | ||
common-main.c | ||
config.c | ||
config.h | ||
config.mak.dev | ||
config.mak.in | ||
config.mak.uname | ||
configure.ac | ||
connect.c | ||
connect.h | ||
connected.c | ||
connected.h | ||
convert.c | ||
convert.h | ||
copy.c | ||
COPYING | ||
credential-cache--daemon.c | ||
credential-cache.c | ||
credential-store.c | ||
credential.c | ||
credential.h | ||
csum-file.c | ||
csum-file.h | ||
ctype.c | ||
daemon.c | ||
date.c | ||
decorate.c | ||
decorate.h | ||
delta-islands.c | ||
delta-islands.h | ||
delta.h | ||
detect-compiler | ||
diff-delta.c | ||
diff-lib.c | ||
diff-no-index.c | ||
diff.c | ||
diff.h | ||
diffcore-break.c | ||
diffcore-delta.c | ||
diffcore-order.c | ||
diffcore-pickaxe.c | ||
diffcore-rename.c | ||
diffcore.h | ||
dir-iterator.c | ||
dir-iterator.h | ||
dir.c | ||
dir.h | ||
editor.c | ||
entry.c | ||
environment.c | ||
exec-cmd.c | ||
exec-cmd.h | ||
fast-import.c | ||
fetch-negotiator.c | ||
fetch-negotiator.h | ||
fetch-pack.c | ||
fetch-pack.h | ||
fmt-merge-msg.h | ||
fsck.c | ||
fsck.h | ||
fsmonitor.c | ||
fsmonitor.h | ||
fuzz-commit-graph.c | ||
fuzz-pack-headers.c | ||
fuzz-pack-idx.c | ||
generate-cmdlist.sh | ||
gettext.c | ||
gettext.h | ||
git-add--interactive.perl | ||
git-archimport.perl | ||
git-bisect.sh | ||
git-compat-util.h | ||
git-cvsexportcommit.perl | ||
git-cvsimport.perl | ||
git-cvsserver.perl | ||
git-difftool--helper.sh | ||
git-filter-branch.sh | ||
git-instaweb.sh | ||
git-legacy-stash.sh | ||
git-merge-octopus.sh | ||
git-merge-one-file.sh | ||
git-merge-resolve.sh | ||
git-mergetool--lib.sh | ||
git-mergetool.sh | ||
git-p4.py | ||
git-parse-remote.sh | ||
git-quiltimport.sh | ||
git-rebase--preserve-merges.sh | ||
git-request-pull.sh | ||
git-send-email.perl | ||
git-sh-i18n.sh | ||
git-sh-setup.sh | ||
git-submodule.sh | ||
git-svn.perl | ||
GIT-VERSION-GEN | ||
git-web--browse.sh | ||
git.c | ||
git.rc | ||
gpg-interface.c | ||
gpg-interface.h | ||
graph.c | ||
graph.h | ||
grep.c | ||
grep.h | ||
hash.h | ||
hashmap.c | ||
hashmap.h | ||
help.c | ||
help.h | ||
hex.c | ||
http-backend.c | ||
http-fetch.c | ||
http-push.c | ||
http-walker.c | ||
http.c | ||
http.h | ||
ident.c | ||
imap-send.c | ||
INSTALL | ||
interdiff.c | ||
interdiff.h | ||
iterator.h | ||
json-writer.c | ||
json-writer.h | ||
khash.h | ||
kwset.c | ||
kwset.h | ||
levenshtein.c | ||
levenshtein.h | ||
LGPL-2.1 | ||
line-log.c | ||
line-log.h | ||
line-range.c | ||
line-range.h | ||
linear-assignment.c | ||
linear-assignment.h | ||
list-objects-filter-options.c | ||
list-objects-filter-options.h | ||
list-objects-filter.c | ||
list-objects-filter.h | ||
list-objects.c | ||
list-objects.h | ||
list.h | ||
ll-merge.c | ||
ll-merge.h | ||
lockfile.c | ||
lockfile.h | ||
log-tree.c | ||
log-tree.h | ||
ls-refs.c | ||
ls-refs.h | ||
mailinfo.c | ||
mailinfo.h | ||
mailmap.c | ||
mailmap.h | ||
Makefile | ||
match-trees.c | ||
mem-pool.c | ||
mem-pool.h | ||
merge-blobs.c | ||
merge-blobs.h | ||
merge-recursive.c | ||
merge-recursive.h | ||
merge.c | ||
mergesort.c | ||
mergesort.h | ||
midx.c | ||
midx.h | ||
name-hash.c | ||
notes-cache.c | ||
notes-cache.h | ||
notes-merge.c | ||
notes-merge.h | ||
notes-utils.c | ||
notes-utils.h | ||
notes.c | ||
notes.h | ||
object-store.h | ||
object.c | ||
object.h | ||
oidmap.c | ||
oidmap.h | ||
oidset.c | ||
oidset.h | ||
pack-bitmap-write.c | ||
pack-bitmap.c | ||
pack-bitmap.h | ||
pack-check.c | ||
pack-objects.c | ||
pack-objects.h | ||
pack-revindex.c | ||
pack-revindex.h | ||
pack-write.c | ||
pack.h | ||
packfile.c | ||
packfile.h | ||
pager.c | ||
parse-options-cb.c | ||
parse-options.c | ||
parse-options.h | ||
patch-delta.c | ||
patch-ids.c | ||
patch-ids.h | ||
path.c | ||
path.h | ||
pathspec.c | ||
pathspec.h | ||
pkt-line.c | ||
pkt-line.h | ||
preload-index.c | ||
pretty.c | ||
pretty.h | ||
prio-queue.c | ||
prio-queue.h | ||
progress.c | ||
progress.h | ||
promisor-remote.c | ||
promisor-remote.h | ||
prompt.c | ||
prompt.h | ||
protocol.c | ||
protocol.h | ||
quote.c | ||
quote.h | ||
range-diff.c | ||
range-diff.h | ||
reachable.c | ||
reachable.h | ||
read-cache.c | ||
README.md | ||
rebase-interactive.c | ||
rebase-interactive.h | ||
rebase.c | ||
rebase.h | ||
ref-filter.c | ||
ref-filter.h | ||
reflog-walk.c | ||
reflog-walk.h | ||
refs.c | ||
refs.h | ||
refspec.c | ||
refspec.h | ||
RelNotes | ||
remote-curl.c | ||
remote-testsvn.c | ||
remote.c | ||
remote.h | ||
replace-object.c | ||
replace-object.h | ||
repo-settings.c | ||
repository.c | ||
repository.h | ||
rerere.c | ||
rerere.h | ||
resolve-undo.c | ||
resolve-undo.h | ||
revision.c | ||
revision.h | ||
run-command.c | ||
run-command.h | ||
send-pack.c | ||
send-pack.h | ||
sequencer.c | ||
sequencer.h | ||
serve.c | ||
serve.h | ||
server-info.c | ||
setup.c | ||
sh-i18n--envsubst.c | ||
sha1-array.c | ||
sha1-array.h | ||
sha1-file.c | ||
sha1-lookup.c | ||
sha1-lookup.h | ||
sha1-name.c | ||
sha1dc_git.c | ||
sha1dc_git.h | ||
shallow.c | ||
shell.c | ||
shortlog.h | ||
sideband.c | ||
sideband.h | ||
sigchain.c | ||
sigchain.h | ||
split-index.c | ||
split-index.h | ||
stable-qsort.c | ||
strbuf.c | ||
strbuf.h | ||
streaming.c | ||
streaming.h | ||
string-list.c | ||
string-list.h | ||
sub-process.c | ||
sub-process.h | ||
submodule-config.c | ||
submodule-config.h | ||
submodule.c | ||
submodule.h | ||
symlinks.c | ||
tag.c | ||
tag.h | ||
tar.h | ||
tempfile.c | ||
tempfile.h | ||
thread-utils.c | ||
thread-utils.h | ||
tmp-objdir.c | ||
tmp-objdir.h | ||
trace2.c | ||
trace2.h | ||
trace.c | ||
trace.h | ||
trailer.c | ||
trailer.h | ||
transport-helper.c | ||
transport-internal.h | ||
transport.c | ||
transport.h | ||
tree-diff.c | ||
tree-walk.c | ||
tree-walk.h | ||
tree.c | ||
tree.h | ||
unicode-width.h | ||
unimplemented.sh | ||
unix-socket.c | ||
unix-socket.h | ||
unpack-trees.c | ||
unpack-trees.h | ||
upload-pack.c | ||
upload-pack.h | ||
url.c | ||
url.h | ||
urlmatch.c | ||
urlmatch.h | ||
usage.c | ||
userdiff.c | ||
userdiff.h | ||
utf8.c | ||
utf8.h | ||
varint.c | ||
varint.h | ||
version.c | ||
version.h | ||
versioncmp.c | ||
walker.c | ||
walker.h | ||
wildmatch.c | ||
wildmatch.h | ||
worktree.c | ||
worktree.h | ||
wrap-for-bin.sh | ||
wrapper.c | ||
write-or-die.c | ||
ws.c | ||
wt-status.c | ||
wt-status.h | ||
xdiff-interface.c | ||
xdiff-interface.h | ||
zlib.c |
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-<commandname>.txt
for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial
or git help tutorial
, and the
documentation of each command with man git-<commandname>
or git help <commandname>
.
CVS users may also want to read Documentation/gitcvs-migration.txt
(man gitcvs-migration
or git help cvs-migration
if git is
installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://lore.kernel.org/git/, http://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks