Add a 'done' command that causes fast-import to stop reading from the
stream and exit.
If the new --done flag was passed on the command line (or a
"feature done" declaration was included at the start of the stream),
make the 'done' command mandatory. The input format of
"git fast-import --done" is then prefix-free, so errors that show up
as early termination somewhere upstream of a pipe writing to
fast-import become easier to detect.
Another possible application of the 'done' command would be to allow a
fast-import stream that is only a small part of a larger encapsulating
stream to be easily parsed, leaving the file offset after the "done\n"
so the other application can pick up from there. This patch does not
teach fast-import to do that --- fast-import still uses buffered input
(stdio).
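As a rough illustration of why a mandatory terminator makes truncation
detectable, here is a minimal Python sketch. It is not fast-import's
parser: the name stream_is_complete is hypothetical, it looks only at
whole lines (real streams carry data blocks that would need to be
skipped), and it accepts "feature done" anywhere rather than only at the
start of the stream.

```python
def stream_is_complete(lines, require_done=False):
    """Return True if the command stream ended properly.

    `lines` is an iterable of stream lines.  With require_done (the
    analogue of --done or "feature done"), reaching end of input
    without having seen "done" is treated as truncation.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line == "feature done":
            require_done = True
        elif line == "done":
            return True  # stop reading; whatever follows is not ours
    # End of input without "done": only acceptable if it was optional.
    return not require_done
```

Without the requirement, a stream cut short at a command boundary is
indistinguishable from a complete one; with it, the missing "done" gives
the truncation away.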
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If fast-export did not complete successfully the error handling code
itself would error out.
This was broken in commit 23b093ee0 (Brandon Casey, Wed Jun 9 2010,
Remove python 2.5'isms). Revert that commit and introduce our own copy
of check_call in util.py instead.
Tested by changing 'if retcode' to 'if not retcode' temporarily.
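For readers unfamiliar with check_call, a local copy could look roughly
like the sketch below. This is modeled on the standard library's
subprocess.check_call and is not the code added to util.py; on Python
2.4 the CalledProcessError class would presumably also need a local
definition, which this sketch sidesteps by using the stdlib one.

```python
import subprocess
import sys


def check_call(*popenargs, **kwargs):
    """Run a command; raise CalledProcessError on a non-zero exit."""
    retcode = subprocess.call(*popenargs, **kwargs)
    if retcode:
        # Recover the command for the exception, as the stdlib does.
        cmd = kwargs.get("args")
        if cmd is None:
            cmd = popenargs[0]
        raise subprocess.CalledProcessError(retcode, cmd)
    return 0
```

Flipping 'if retcode' to 'if not retcode', as in the test described
above, makes every successful command take the error path.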
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Trying to push for local repositories will fail since there is no
local checkout in .git/info/... to push from as that is only used for
non-local repositories (local repositories are pushed to directly).
This went unnoticed because the transport helper infrastructure does
not check the return value of the helper.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This went unnoticed because the transport helper infrastructure did
not check the return value of the helper, nor did the helper print
anything before exiting.
While at it also make sure that the stream doesn't end unexpectedly.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The gitdir capability is recognized by git and can be used to tell
the helper where the .git directory is. But it is not mentioned in
the documentation and is considered worse than passing the git
directory via the GIT_DIR environment variable.
Remove support for the gitdir capability and export GIT_DIR instead.
Teach testgit to use env instead of the now-removed gitdir command.
[sr: fixed up documentation]
Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a remote helper exports to a non-local git repo, the
steps are roughly:
1. fast-export into a local staging area; the set of
interesting refs is defined by what is in the fast-export
stream
2. git push from the staging area to the non-local repo
In the second step, we should explicitly push all refs, not
just matching ones. This will let us push refs that do not
yet exist in the remote repo.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we want to push to a remote helper that has the
"export" capability, we collect all of the refs we want to
push and then feed them to fast-export.
However, the list of refs is actually a list of remote refs,
not local refs. The mapped local refs are included via the
peer_ref pointer. So when we add an argument to our
fast-export command line, we must be sure to use the local
peer_ref name (and if there is no local name, it is because
we are not actually sending that ref, or we may not even
have the ref at all).
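The mapping can be pictured with a small Python model. The real code is
C inside git; the Ref class and fast_export_args below are hypothetical
stand-ins that only mirror the name/peer_ref relationship described
above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Ref:
    name: str                         # name on the remote side
    peer_ref: Optional["Ref"] = None  # mapped local ref, if any


def fast_export_args(remote_refs: List[Ref]) -> List[str]:
    """Collect the local names to hand to fast-export."""
    args = ["fast-export"]
    for ref in remote_refs:
        if ref.peer_ref is None:
            # No local counterpart: we are not sending this ref.
            continue
        args.append(ref.peer_ref.name)
    return args
```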
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon receiving an "import" command, the testgit remote
helper would ignore the ref asked for by git and generate a
fast-export stream based on HEAD. Instead, we should
actually give git the ref it asked for.
This requires adding a new parameter to the export_repo
method in the remote-helpers python library, which may be
used by code outside of git.git. We use a default parameter
so that callers without the new parameter will get the same
behavior as before.
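The compatibility argument can be sketched as follows. The function
name, its signature and the returned command line are illustrative
assumptions, not the actual export_repo method.

```python
def export_command(refs=None):
    """Build a fast-export command line for the requested refs."""
    if refs is None:
        # Old behaviour: export whatever HEAD points at.
        refs = ["HEAD"]
    return ["git", "fast-export"] + list(refs)
```

A caller that predates the parameter keeps calling export_command() and
still gets HEAD, while a newer caller can pass the ref git asked for.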
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are all things one might expect to work in a helper
that is capable of handling multiple branches (which our
testgit helper in theory should be able to do, as it is
backed by git). All of these bugs are specific to the
import/export codepaths, so they don't affect helpers like
git-remote-curl that use fetch/push commands.
The first and fourth tests are about fetching and pushing
new refs, and demonstrate bugs in the git_remote_helpers
library (so they would be most likely to impact helpers for
other VCSs which import/export git).
The second test is about importing multiple refs; it
demonstrates a bug in git-remote-testgit, which is mostly
for exercising the test code. Therefore it probably doesn't
affect anyone in practice.
The third test demonstrates a bug in git's side of the
helper code when the upstream has added refs that we do not
have locally. This could impact git users who use remote
helpers to access foreign VCSs.
All of those bugs have fixes later in this series.
The fifth test is the most complex, and does not have a fix
in this series. It tests pushing a ref via the export
mechanism to a new name on the remote side (i.e.,
"git push $remote old:new").
The problem is that we push all of the work of generating
the export stream onto fast-export, but we have no way of
communicating to fast-export that this name mapping is
happening. So we tell fast-export to generate a stream with
the commits for "old", but we can't tell it to label them
all as "new".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All tests require python 2.4 or higher.
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These are a little hard to read, and I'm about to add more
just like them. Plus the failure output is nicer if we use
test_cmp than a comparison with "test".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Sverre Rabbelier <srabbelier@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/index-pack:
verify-pack: use index-pack --verify
index-pack: show histogram when emulating "verify-pack -v"
index-pack: start learning to emulate "verify-pack -v"
index-pack: a miniscule refactor
index-pack --verify: read anomalous offsets from v2 idx file
write_idx_file: need_large_offset() helper function
index-pack: --verify
write_idx_file: introduce a struct to hold idx customization options
index-pack: group the delta-base array entries also by type
Conflicts:
builtin/verify-pack.c
cache.h
sha1_file.c
* jn/mime-type-with-params:
gitweb: Serve */*+xml 'blob_plain' as text/plain with $prevent_xss
gitweb: Serve text/* 'blob_plain' as text/plain with $prevent_xss
* jk/clone-cmdline-config:
clone: accept config options on the command line
config: make git_config_parse_parameter a public function
remote: use new OPT_STRING_LIST
parse-options: add OPT_STRING_LIST helper
* jk/maint-config-param:
config: use strbuf_split_str instead of a temporary strbuf
strbuf: allow strbuf_split to work on non-strbufs
config: avoid segfault when parsing command-line config
config: die on error in command-line config
fix "git -c" parsing of values with equals signs
strbuf_split: add a max parameter
* jc/zlib-wrap:
zlib: allow feeding more than 4GB in one go
zlib: zlib can only process 4GB at a time
zlib: wrap deflateBound() too
zlib: wrap deflate side of the API
zlib: wrap inflateInit2 used to accept only for gzip format
zlib: wrap remaining calls to direct inflate/inflateEnd
zlib wrapper: refactor error message formatter
Conflicts:
sha1_file.c
* ak/gcc46-profile-feedback:
Add explanation of the profile feedback build to the README
Add profile feedback build to git
Add option to disable NORETURN
IPv6 hosts are often unreachable on the primarily IPv4 Internet and
therefore we shouldn't print an error if there are still other hosts we
can try to connect() to. This helps "git fetch --quiet" stay quiet.
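The intended behaviour, sketched in Python rather than the C of the
actual change: failures are remembered while other addresses remain and
are reported only once every address has failed. The names connect_any,
connect and report are hypothetical.

```python
def connect_any(addresses, connect, report=print):
    """Try each address in turn; stay quiet unless all of them fail."""
    errors = []
    for addr in addresses:
        try:
            return connect(addr)
        except OSError as err:
            # Remember the failure, but do not print it yet.
            errors.append((addr, err))
    for addr, err in errors:
        report("%s: %s" % (addr, err))
    raise OSError("unable to connect to any address")
```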
Signed-off-by: Dave Zarzycki <zarzycki@apple.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description for 'git rebase --abort' currently says:
Restore the original branch and abort the rebase operation.
The "restore" can be misinterpreted to imply that the original branch
was somehow in a broken state during the rebase operation. It is also
not completely clear what "the original branch" is --- is it the
branch that was checked out before the rebase operation was called or
is it the branch that is being rebased (it is the latter)? Although
both issues are made clear in the DESCRIPTION section, let us also
make the entry in the OPTIONS section more clear.
Also remove the term "rebasing process" from the usage text, since the
user already knows that the text is about "git rebase".
Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The latter is meant to be an API for internal callers that want to inspect
the resulting diff-queue, while the former is an implementation of "git
diff-index" command. Extract the common logic into a single helper
function and make them thin wrappers around it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 34110cd (Make 'unpack_trees()' have a separate source and
destination index, 2008-03-06), we can run unpack_trees() without munging
the index at all, but do_diff_cache() tried ever so carefully to work
around the old behaviour of the function.
We can just tell unpack_trees() not to touch the original index and there
is no need to clean-up whatever the previous round has done.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because "diff --cached HEAD" showed an incorrect blob object name on the
LHS of the diff, we ended up updating the index entry with bogus value,
not what we read from the tree.
Noticed by John Nowak.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/submodule-foreach-stdin-fix-1.7.4:
git-submodule.sh: preserve stdin for the command spawned by foreach
t/t7407: demonstrate that the command called by 'submodule foreach' loses stdin
Conflicts:
git-submodule.sh
* nk/ref-doc:
glossary: clarify description of HEAD
glossary: update description of head and ref
glossary: update description of "tag"
git.txt: de-emphasize the implementation detail of a ref
check-ref-format doc: de-emphasize the implementation detail of a ref
git-remote.txt: avoid sounding as if loose refs are the only ones in the world
git-remote.txt: fix wrong remote refspec
* rj/config-cygwin:
config.c: Make git_config() work correctly when called recursively
t1301-*.sh: Fix the 'forced modes' test on cygwin
help.c: Fix detection of custom merge strategy on cygwin
* fg/submodule-keep-updating:
git-submodule.sh: clarify the "should we die now" logic
submodule update: continue when a checkout fails
git-sh-setup: add die_with_status
Conflicts:
git-submodule.sh
* an/shallow-doc:
Document the underlying protocol used by shallow repositories and --depth commands.
Fix documentation of fetch-pack that implies that the client can disconnect after sending wants.
The documentation for logging updates in git-update-ref doesn't make it
clear that only a specific subset of refs are honored by this variable.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
10c4c88 (Allow add_path() to add non-existent directories to the path,
2008-07-21) introduced get_pwd_cwd() function in order to favor $PWD when
getenv("PWD") and getcwd() refer to the same directory but are different
strings (e.g. the former gives a nicer looking name via a symbolic link to
an uglier looking automounted path). The function tried to determine if
two directories are the same by running stat(2) on both and comparing
ino/dev fields.
Unfortunately, stat() does not fill any ino or dev fields in msysgit. But
there is a telltale: both ino and dev are 0 when they are not filled
correctly, so let's be extra cautious.
This happens to fix a bug in "git-receive-pack working_directory/" when
the GIT_DIR would not be set correctly due to absolute_path(".")
returning the wrong value.
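The extra caution amounts to the check below, shown as a Python sketch
of the idea rather than the C in get_pwd_cwd(). The helper names
same_file_id and prefer_pwd are made up for illustration.

```python
import os


def same_file_id(a, b):
    """Compare two (st_dev, st_ino) pairs, distrusting all-zero ones."""
    if a == (0, 0) or b == (0, 0):
        return False  # fields were not filled in; we cannot tell
    return a == b


def prefer_pwd(pwd, cwd):
    """Return pwd only if it provably names the same directory as cwd."""
    if not pwd or pwd == cwd:
        return cwd
    try:
        st_pwd, st_cwd = os.stat(pwd), os.stat(cwd)
    except OSError:
        return cwd
    if same_file_id((st_pwd.st_dev, st_pwd.st_ino),
                    (st_cwd.st_dev, st_cwd.st_ino)):
        return pwd
    return cwd
```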
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This optimizes the "recency order" (see pack-heuristics.txt in
Documentation/technical/ directory) used to order objects within a
packfile in three ways:
- Commits at the tip of tags are written together, in the hope that
revision traversal done in incremental fetch (which starts by
putting them in a revision queue marked as UNINTERESTING) will see a
better locality of these objects;
- In the original recency order, trees and blobs are intermixed. Write
trees together before blobs, in the hope that this will improve
locality when running pathspec-limited revision traversal, i.e.
"git log paths...";
- When writing blob objects out, write the whole family of blobs that use
the same delta base object together, by starting from the root of the
delta chain, and writing its immediate children in a width-first
manner, in the hope that this will again improve locality when reading
blobs that belong to the same path, which are likely to be deltified
against each other.
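The three rules above can be pictured with the following Python sketch.
It is a simplification under assumed inputs (pre-sorted lists and a
children map from a delta base to the objects deltified against it) and
is not the pack-objects implementation.

```python
from collections import deque


def delta_family_order(root, children):
    """List a delta family breadth-first, starting from its root."""
    order = []
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        order.append(obj)
        queue.extend(children.get(obj, ()))
    return order


def write_order(tagged_tips, commits, trees, blob_roots, children):
    """Tagged tips, then other commits, then trees, then blob families."""
    order = list(tagged_tips) + list(commits) + list(trees)
    for root in blob_roots:
        order.extend(delta_family_order(root, children))
    return order
```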
I tried various workloads in the Linux kernel repositories (HEAD at
v3.0-rc6-71-g4dd1b49) packed with v1.7.6 and with this patch, counting how
large seeks are needed between adjacent accesses to objects in the pack,
and the result looks promising. The history has 2072052 objects, weighing
some 490MiB.
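For anyone wanting to reproduce this kind of measurement, the numbers
below could be computed along these lines. This is a sketch under
assumptions: the input is a plain list of pack offsets in access order,
a seek is the absolute difference between adjacent offsets, and the
percentile uses the nearest-rank method. The tool used for the tables
that follow may differ in all three respects.

```python
import math


def seek_distances(offsets):
    """Absolute distance between each pair of adjacent accesses."""
    return [abs(b - a) for a, b in zip(offsets, offsets[1:])]


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending, non-empty list."""
    if not sorted_values:
        raise ValueError("no values")
    rank = max(1, math.ceil(pct / 100.0 * len(sorted_values)))
    return sorted_values[rank - 1]


def fraction_below(values, limit):
    """Share of values strictly smaller than limit."""
    return sum(1 for v in values if v < limit) / len(values)
```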
* Simple commit-only log.
$ git log >/dev/null
There are 254656 commits in total.
                                v1.7.6    with patch
Total number of access :       258,031       258,032
0.0% percentile :                   12            12
10.0% percentile :                 259           259
20.0% percentile :                 294           294
30.0% percentile :                 326           326
40.0% percentile :                 363           363
50.0% percentile :                 415           415
60.0% percentile :                 513           513
70.0% percentile :                 857           858
80.0% percentile :              10,434        10,441
90.0% percentile :              91,985        91,996
95.0% percentile :             260,852       260,885
99.0% percentile :           1,150,680     1,152,811
99.9% percentile :           3,148,435     3,148,435
Less than 2MiB seek:            99.70%        99.69%
95% of the pack accesses look at data that is no further than 260kB
from the previous location we accessed. The patch does not change the
order of commit objects very much, and the result is very similar.
* Pathspec-limited log.
$ git log drivers/net >/dev/null
The path is touched by 26551 commits and merges (among 254656 total).
                                v1.7.6    with patch
Total number of access :       559,511       558,663
0.0% percentile :                    0             0
10.0% percentile :                 182           167
20.0% percentile :                 259           233
30.0% percentile :                 357           304
40.0% percentile :                 714           485
50.0% percentile :               5,046         3,976
60.0% percentile :             688,671       443,578
70.0% percentile :         319,574,732   110,370,100
80.0% percentile :         361,647,599   123,707,229
90.0% percentile :         393,195,669   128,947,636
95.0% percentile :         405,496,875   131,609,321
99.0% percentile :         412,942,470   133,078,115
99.5% percentile :         413,172,266   133,163,349
99.9% percentile :         413,354,356   133,240,445
Less than 2MiB seek:            61.71%        62.87%
With the current pack heuristics, more than 30% of accesses have to
seek further than 300MB; the updated pack heuristics ensures that less
than 0.1% of accesses have to seek further than 135MB. This is largely
due to the fact that the updated heuristics does not mix blobs and
trees together.
* Blame.
$ git blame drivers/net/ne.c >/dev/null
The path is touched by 34 commits and merges.
                                v1.7.6    with patch
Total number of access :       178,147       178,166
0.0% percentile :                    0             0
10.0% percentile :                 142           139
20.0% percentile :                 222           194
30.0% percentile :                 373           300
40.0% percentile :               1,168           837
50.0% percentile :              11,248         7,334
60.0% percentile :         305,121,284   106,850,130
70.0% percentile :         361,427,854   123,709,715
80.0% percentile :         388,127,343   128,171,047
90.0% percentile :         399,987,762   130,200,707
95.0% percentile :         408,230,673   132,174,308
99.0% percentile :         412,947,017   133,181,160
99.5% percentile :         413,312,798   133,220,425
99.9% percentile :         413,352,366   133,269,051
Less than 2MiB seek:            56.47%        56.83%
The result is very similar to the pathspec-limited log above, which
only looks at the tree objects.
* Packing recent history.
$ (git for-each-ref --format='^%(refname)' refs/tags; echo HEAD) |
git pack-objects --revs --stdout >/dev/null
This should pack data worth 71 commits.
                                v1.7.6    with patch
Total number of access :        11,511        11,514
0.0% percentile :                    0             0
10.0% percentile :                  48            47
20.0% percentile :                 134            98
30.0% percentile :                 332           178
40.0% percentile :               1,386           293
50.0% percentile :               8,030           478
60.0% percentile :              33,676         1,195
70.0% percentile :             147,268        26,216
80.0% percentile :           9,178,662       464,598
90.0% percentile :          67,922,665       965,782
95.0% percentile :          87,773,251     1,226,102
99.0% percentile :          98,011,763     1,932,377
99.5% percentile :         100,074,427    33,642,128
99.9% percentile :         105,336,398   275,772,650
Less than 2MiB seek:            77.09%        99.04%
The long-tail part of the result looks worse with the patch, but
the change helps the majority of accesses. 99.04% of the accesses
need less than 2MiB of seeking, compared to 77.09% with the current
packing heuristics.
* Index pack.
$ git index-pack -v .git/objects/pack/pack*.pack
                                v1.7.6    with patch
Total number of access :     2,791,228     2,788,802
0.0% percentile :                    9             9
10.0% percentile :                 140            89
20.0% percentile :                 233           167
30.0% percentile :                 322           235
40.0% percentile :                 464           310
50.0% percentile :                 862           423
60.0% percentile :               2,566           686
70.0% percentile :              25,827         1,498
80.0% percentile :           1,317,862         4,971
90.0% percentile :          11,926,385       119,398
95.0% percentile :          41,304,149       952,519
99.0% percentile :         227,613,070     6,709,650
99.5% percentile :         321,265,121    11,734,871
99.9% percentile :         382,919,785    33,155,191
Less than 2MiB seek:            81.73%        96.92%
As the index-pack command already walks objects in the delta chain
order, writing the blobs out in the delta chain order seems to
drastically improve the locality of access.
Note that a half-a-gigabyte packfile comfortably fits in the buffer cache,
and you would be unlikely to see much performance difference on a modern
and reasonably beefy machine with enough memory and local disks.
Benchmarking with cold cache (or over NFS) would be interesting.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When executing an external shell script like `git foo` with a bad
shebang, e.g. "#!/usr/bin/not/existing", execvp returns 127 (ENOENT).
Since help_unknown_cmd proposes the use of all external commands similar
to the name of the "unknown" command, it suggests the just failed command
again. Stop it and give some advice to the user.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Schubert <mschub@elegosoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a workload other than "git log" (without pathspec nor any option that
causes us to inspect trees and blobs), the recency pack order is said to
cause the access to jump around quite a bit. Add a hook to allow us to
observe how bad it is.
"git config core.logpackaccess /var/tmp/pal.txt" will give you the log
in the specified file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>