a10a17877b
The current method for getting the refs from an alternate is to run upload-pack in the alternate and parse its output using the normal transport code. This works and is reasonably short, but it has a very bad memory footprint when there are a lot of refs in the alternate. There are two problems: 1. It reads in all of the refs before passing any back to us. Which means that our peak memory usage has to store every ref (including duplicates for peeled variants), even if our callback could determine that some are not interesting (e.g., because they point to the same sha1 as another ref). 2. It allocates a "struct ref" for each one. Among other things, this contains 3 separate 20-byte oids, along with the name and various pointers. That can add up, especially if the callback is only interested in the sha1 (which it can store in a sha1_array as just 20 bytes). On a particularly pathological case, where the alternate had over 80 million refs pointing to only around 60,000 unique objects, the peak heap usage of "git clone --reference" grew to over 25GB. This patch instead calls git-for-each-ref in the alternate repository, and passes each line to the callback as we read it. That drops the peak heap of the same command to 50MB. I considered and rejected a few alternatives. We could read all of the refs in the alternate using our own ref code, just as we do with submodules. However, as memory footprint is one of the concerns here, we want to avoid loading those refs into our own memory as a whole. It's possible that this will be a better technique in the future when the ref code can more easily iterate without loading all of packed-refs into memory. Another option is to keep calling upload-pack, and just parse its output ourselves in a streaming fashion. Besides for-each-ref being simpler (we get to define the format ourselves, and don't have to deal with speaking the git protocol), it's more flexible for possible future changes. For instance, it might be useful for the caller to be able to limit the set of "interesting" alternate refs. The motivating example is one where many "forks" of a particular repository share object storage, and the shared storage has refs for each fork (which is why so many of the refs are duplicates; each fork has the same tags). A plausible future optimization would be to ask for the alternate refs for just _one_ fork (if you had some out-of-band way of knowing which was the most interesting or important for the current operation). Similarly, no callbacks actually care about the symref value of alternate refs, and as before, this patch ignores them entirely. However, if we wanted to add them, for-each-ref's "%(symref)" is going to be more flexible than upload-pack, because the latter only handles the HEAD symref due to historical constraints. There is one potential downside, though: unlike upload-pack, our for-each-ref command doesn't report the peeled value of refs. The existing code calls the alternate_ref_fn callback twice for tags: once for the tag, and once for the peeled value with the refname set to "ref^{}". For the callers in fetch-pack, this doesn't matter at all. We immediately peel each tag down to a commit either way (so there's a slight improvement, as do not bother passing the redundant data over the pipe). For the caller in receive-pack, it means we will not advertise the peeled values of tags in our alternate. However, we also don't advertise peeled values for our _own_ tags, so this is actually making things more consistent. It's unclear whether receive-pack advertising peeled values is a win or not. On one hand, giving more information to the other side may let it omit some objects from the push. On the other hand, for tags which both sides have, they simply bloat the advertisement. The upload-pack advertisement of git.git is about 30% larger than the receive-pack advertisement due to its peeled information. This patch omits the peeled information from for_each_alternate_ref entirely, and leaves it up to the caller whether they want to dig up the information. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> |
||
---|---|---|
block-sha1 | ||
builtin | ||
ci | ||
compat | ||
contrib | ||
Documentation | ||
ewah | ||
git-gui | ||
gitk-git | ||
gitweb | ||
mergetools | ||
perl | ||
po | ||
ppc | ||
refs | ||
t | ||
templates | ||
vcs-svn | ||
xdiff | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
.travis.yml | ||
abspath.c | ||
aclocal.m4 | ||
advice.c | ||
advice.h | ||
alias.c | ||
alloc.c | ||
apply.c | ||
apply.h | ||
archive-tar.c | ||
archive-zip.c | ||
archive.c | ||
archive.h | ||
argv-array.c | ||
argv-array.h | ||
attr.c | ||
attr.h | ||
base85.c | ||
bisect.c | ||
bisect.h | ||
blob.c | ||
blob.h | ||
branch.c | ||
branch.h | ||
builtin.h | ||
bulk-checkin.c | ||
bulk-checkin.h | ||
bundle.c | ||
bundle.h | ||
cache-tree.c | ||
cache-tree.h | ||
cache.h | ||
check_bindir | ||
check-builtins.sh | ||
check-racy.c | ||
color.c | ||
color.h | ||
column.c | ||
column.h | ||
combine-diff.c | ||
command-list.txt | ||
commit-slab.h | ||
commit.c | ||
commit.h | ||
common-main.c | ||
config.c | ||
config.mak.in | ||
config.mak.uname | ||
configure.ac | ||
connect.c | ||
connect.h | ||
connected.c | ||
connected.h | ||
convert.c | ||
convert.h | ||
copy.c | ||
COPYING | ||
credential-cache--daemon.c | ||
credential-cache.c | ||
credential-store.c | ||
credential.c | ||
credential.h | ||
csum-file.c | ||
csum-file.h | ||
ctype.c | ||
daemon.c | ||
date.c | ||
decorate.c | ||
decorate.h | ||
delta.h | ||
diff-delta.c | ||
diff-lib.c | ||
diff-no-index.c | ||
diff.c | ||
diff.h | ||
diffcore-break.c | ||
diffcore-delta.c | ||
diffcore-order.c | ||
diffcore-pickaxe.c | ||
diffcore-rename.c | ||
diffcore.h | ||
dir-iterator.c | ||
dir-iterator.h | ||
dir.c | ||
dir.h | ||
editor.c | ||
entry.c | ||
environment.c | ||
exec_cmd.c | ||
exec_cmd.h | ||
fast-import.c | ||
fetch-pack.c | ||
fetch-pack.h | ||
fmt-merge-msg.h | ||
fsck.c | ||
fsck.h | ||
generate-cmdlist.sh | ||
gettext.c | ||
gettext.h | ||
git-add--interactive.perl | ||
git-archimport.perl | ||
git-bisect.sh | ||
git-compat-util.h | ||
git-cvsexportcommit.perl | ||
git-cvsimport.perl | ||
git-cvsserver.perl | ||
git-difftool--helper.sh | ||
git-filter-branch.sh | ||
git-instaweb.sh | ||
git-merge-octopus.sh | ||
git-merge-one-file.sh | ||
git-merge-resolve.sh | ||
git-mergetool--lib.sh | ||
git-mergetool.sh | ||
git-p4.py | ||
git-parse-remote.sh | ||
git-quiltimport.sh | ||
git-rebase--am.sh | ||
git-rebase--interactive.sh | ||
git-rebase--merge.sh | ||
git-rebase.sh | ||
git-remote-testgit.sh | ||
git-request-pull.sh | ||
git-send-email.perl | ||
git-sh-i18n.sh | ||
git-sh-setup.sh | ||
git-stash.sh | ||
git-submodule.sh | ||
git-svn.perl | ||
GIT-VERSION-GEN | ||
git-web--browse.sh | ||
git.c | ||
git.rc | ||
gpg-interface.c | ||
gpg-interface.h | ||
graph.c | ||
graph.h | ||
grep.c | ||
grep.h | ||
hashmap.c | ||
hashmap.h | ||
help.c | ||
help.h | ||
hex.c | ||
http-backend.c | ||
http-fetch.c | ||
http-push.c | ||
http-walker.c | ||
http.c | ||
http.h | ||
ident.c | ||
imap-send.c | ||
INSTALL | ||
iterator.h | ||
khash.h | ||
kwset.c | ||
kwset.h | ||
levenshtein.c | ||
levenshtein.h | ||
LGPL-2.1 | ||
line-log.c | ||
line-log.h | ||
line-range.c | ||
line-range.h | ||
list-objects.c | ||
list-objects.h | ||
list.h | ||
ll-merge.c | ||
ll-merge.h | ||
lockfile.c | ||
lockfile.h | ||
log-tree.c | ||
log-tree.h | ||
mailinfo.c | ||
mailinfo.h | ||
mailmap.c | ||
mailmap.h | ||
Makefile | ||
match-trees.c | ||
merge-blobs.c | ||
merge-blobs.h | ||
merge-recursive.c | ||
merge-recursive.h | ||
merge.c | ||
mergesort.c | ||
mergesort.h | ||
mru.c | ||
mru.h | ||
name-hash.c | ||
notes-cache.c | ||
notes-cache.h | ||
notes-merge.c | ||
notes-merge.h | ||
notes-utils.c | ||
notes-utils.h | ||
notes.c | ||
notes.h | ||
object.c | ||
object.h | ||
pack-bitmap-write.c | ||
pack-bitmap.c | ||
pack-bitmap.h | ||
pack-check.c | ||
pack-objects.c | ||
pack-objects.h | ||
pack-revindex.c | ||
pack-revindex.h | ||
pack-write.c | ||
pack.h | ||
pager.c | ||
parse-options-cb.c | ||
parse-options.c | ||
parse-options.h | ||
patch-delta.c | ||
patch-ids.c | ||
patch-ids.h | ||
path.c | ||
pathspec.c | ||
pathspec.h | ||
pkt-line.c | ||
pkt-line.h | ||
preload-index.c | ||
pretty.c | ||
prio-queue.c | ||
prio-queue.h | ||
progress.c | ||
progress.h | ||
prompt.c | ||
prompt.h | ||
quote.c | ||
quote.h | ||
reachable.c | ||
reachable.h | ||
read-cache.c | ||
README.md | ||
ref-filter.c | ||
ref-filter.h | ||
reflog-walk.c | ||
reflog-walk.h | ||
refs.c | ||
refs.h | ||
RelNotes | ||
remote-curl.c | ||
remote-testsvn.c | ||
remote.c | ||
remote.h | ||
replace_object.c | ||
rerere.c | ||
rerere.h | ||
resolve-undo.c | ||
resolve-undo.h | ||
revision.c | ||
revision.h | ||
run-command.c | ||
run-command.h | ||
send-pack.c | ||
send-pack.h | ||
sequencer.c | ||
sequencer.h | ||
server-info.c | ||
setup.c | ||
sh-i18n--envsubst.c | ||
sha1_file.c | ||
sha1_name.c | ||
sha1-array.c | ||
sha1-array.h | ||
sha1-lookup.c | ||
sha1-lookup.h | ||
shallow.c | ||
shell.c | ||
shortlog.h | ||
show-index.c | ||
sideband.c | ||
sideband.h | ||
sigchain.c | ||
sigchain.h | ||
split-index.c | ||
split-index.h | ||
strbuf.c | ||
strbuf.h | ||
streaming.c | ||
streaming.h | ||
string-list.c | ||
string-list.h | ||
submodule-config.c | ||
submodule-config.h | ||
submodule.c | ||
submodule.h | ||
symlinks.c | ||
tag.c | ||
tag.h | ||
tar.h | ||
tempfile.c | ||
tempfile.h | ||
thread-utils.c | ||
thread-utils.h | ||
tmp-objdir.c | ||
tmp-objdir.h | ||
trace.c | ||
trace.h | ||
trailer.c | ||
trailer.h | ||
transport-helper.c | ||
transport.c | ||
transport.h | ||
tree-diff.c | ||
tree-walk.c | ||
tree-walk.h | ||
tree.c | ||
tree.h | ||
unicode_width.h | ||
unimplemented.sh | ||
unix-socket.c | ||
unix-socket.h | ||
unpack-trees.c | ||
unpack-trees.h | ||
upload-pack.c | ||
url.c | ||
url.h | ||
urlmatch.c | ||
urlmatch.h | ||
usage.c | ||
userdiff.c | ||
userdiff.h | ||
utf8.c | ||
utf8.h | ||
varint.c | ||
varint.h | ||
version.c | ||
version.h | ||
versioncmp.c | ||
walker.c | ||
walker.h | ||
wildmatch.c | ||
wildmatch.h | ||
worktree.c | ||
worktree.h | ||
wrap-for-bin.sh | ||
wrapper.c | ||
write_or_die.c | ||
ws.c | ||
wt-status.c | ||
wt-status.h | ||
xdiff-interface.c | ||
xdiff-interface.h | ||
zlib.c |
Git - fast, scalable, distributed revision control system
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from http://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see
Documentation/giteveryday.txt for a useful minimum set of commands, and
Documentation/git-.txt for documentation of each command.
If git has been correctly installed, then the tutorial can also be
read with man gittutorial
or git help tutorial
, and the
documentation of each command with man git-<commandname>
or git help <commandname>
.
CVS users may also want to read Documentation/gitcvs-migration.txt
(man gitcvs-migration
or git help cvs-migration
if git is
installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git, http://marc.info/?l=git and other archival sites.
The maintainer frequently sends the "What's cooking" reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name "git" was given by Linus Torvalds when he wrote the very first version. He described the tool as "the stupid content tracker" and the name as (depending on your mood):
- random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant.
- stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang.
- "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room.
- "goddamn idiotic truckload of sh*t": when it breaks