We don't want the data being deflated and stored into loose objects
to be different from what we expect. While the deflated data is
protected by a CRC which is good enough for safe data retrieval
operations, we still want to be doubly sure that the source data used
at object creation time is still what we expected once that data has
been deflated and its CRC32 computed.
The most plausible data corruption may occur if the source file is
modified while Git is deflating and writing it out in a loose object.
Or Git itself could have a bug causing memory corruption. Or even bad
RAM could cause trouble. So it is best to make sure everything is
coherent and checksum protected from beginning to end.
To do so we compute the SHA1 of the data being deflated _after_ the
deflate operation has consumed that data, and make sure it matches
with the expected SHA1. This way we can rely on the CRC32 checked by
the inflate operation to provide a good indication that the data is still
coherent with its SHA1 hash. One pathological case we ignore is when
the data is modified before (or during) deflate call, but changed back
before it is hashed.
There is some overhead of course. Using 'git add' on a set of large files:
Before:
real 0m25.210s
user 0m23.783s
sys 0m1.408s
After:
real 0m26.537s
user 0m25.175s
sys 0m1.358s
The overhead is around 5% for full data coherency guarantee.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch adds two test cases for:
6977c25 git diff --quiet -w: check and report the status
Signed-off-by: Larry D'Anna <larry@elder-gods.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/push-sideband:
receive-pack: Send internal errors over side-band #2
t5401: Use a bare repository for the remote peer
receive-pack: Send hook output over side band #2
receive-pack: Wrap status reports inside side-band-64k
receive-pack: Refactor how capabilities are shown to the client
send-pack: demultiplex a sideband stream with status data
run-command: support custom fd-set in async
run-command: Allow stderr to be a caller supplied pipe
Using read() instead of mmap() can be 39% speed up for 1Kb files and is
1% speed up 1Mb files. For larger files, it is better to use mmap(),
because the difference between is not significant, and when there is not
enough memory, mmap() performs much better, because it avoids swapping.
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no real advantage to malloc the whole output buffer and
deflate the data in a single pass when writing loose objects. That is
like only 1% faster while using more memory, especially with large
files where memory usage is far more. It is best to deflate and write
the data out in small chunks reusing the same memory instead.
For example, using 'git add' on a few large files averaging 40 MB ...
Before:
21.45user 1.10system 0:22.57elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+142640minor)pagefaults 0swaps
After:
21.50user 1.25system 0:22.76elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+828040outputs (0major+104408minor)pagefaults 0swaps
While the runtime stayed relatively the same, the number of minor page
faults went down significantly.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Testing pagination requires (fake or real) access to a terminal so we
can see whether the pagination automatically kicks in, which makes it
hard to get good coverage when running tests without --verbose. There
are a number of ways to work around that:
- Replace all isatty calls with calls to a custom xisatty wrapper
that usually checks for a terminal but can be overridden for tests.
This would be workable, but it would require implementing xisatty
separately in three languages (C, shell, and perl) and making sure
that any code that is to be tested always uses the wrapper.
- Redirect stdout to /dev/tty. This would be problematic because
there might be no terminal available, and even if a terminal is
available, it might not be appropriate to spew output to it.
- Create a new pseudo-terminal on the fly and capture its output.
This patch implements the third approach.
The new test-terminal.perl helper uses IO::Pty from Expect.pm to create
a terminal and executes the program specified by its arguments with
that terminal as stdout. If the IO::Pty module is missing or not
working on a system, the test script will maintain its old behavior
(skipping most of its tests unless GIT_TEST_OPTS includes --verbose).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
git-p4: fix bug in symlink handling
t1450: fix testcases that were wrongly expecting failure
Documentation: Fix indentation problem in git-commit(1)
The --cherry-pick logic starts by counting the commits on each side,
so that it can filter away commits on the bigger one. However, so
far it missed an opportunity for optimization: it doesn't need to do
any work if either side is empty.
This in particular helps the common use-case 'git rebase -i HEAD~$n':
it internally uses --cherry-pick, but since HEAD~$n is a direct
ancestor the left side is always empty.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git’s automatic pagination support has some subtleties. Add some
tests to make sure we don’t break:
- when git will use a pager by default;
- the effect of the --paginate and --no-pager options;
- the effect of pagination on use of color;
- how the choice of pager is configured.
This does not yet test:
- use of pager by scripted commands (git svn and git am);
- effect of the pager.* configuration variables;
- setting of the LESS variable.
Some features involve checking whether stdout is a terminal, so many
of these tests are skipped unless output is passed through to the
terminal (i.e., unless $GIT_TEST_OPTS includes --verbose).
The immediate purpose for these tests was to avoid making things worse
after the breakage from my jn/editor-pager series (see commit 376f39,
2009-11-20). Thanks to Sebastian Celis <sebastian@sebastiancelis.com>
for the report.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
is_utf8() works by calling utf8_width() for each character at the
supplied location. In strbuf_add_wrapped_text(), we do that anyway
while wrapping the lines. So instead of checking the encoding
beforehand, optimistically assume that it's utf-8 and wrap along
until an invalid character is hit, and when that happens start over.
This pays off if the text consists only of valid utf-8 characters.
The following command was run against the Linux kernel repo with
git 1.7.0:
$ time git log --format='%b' v2.6.32 >/dev/null
real 0m2.679s
user 0m2.580s
sys 0m0.100s
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m4.342s
user 0m4.230s
sys 0m0.110s
And with this patch series:
$ time git log --format='%w(60,4,8)%b' >/dev/null
real 0m3.741s
user 0m3.630s
sys 0m0.110s
So the cost of wrapping is reduced to 70% in this case.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch before the previous one made sure that all callers of
strbuf_add_wrapped_text() supply a strbuf. Replace all calls of
strbuf_write() with regular strbuf functions and remove it.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous patch made sure that strbuf_add_wrapped_text() (and thus
strbuf_add_indented_text(), too) always get a strbuf. Make use of
this fact by adding strbuf_addchars(), a small helper that adds a
char the specified number of times to a strbuf, and use it to replace
print_spaces().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
strbuf_add_wrapped_text() is called only from print_wrapped_text()
without a strbuf (in which case it writes its results to stdout).
At its only callsite, supply a strbuf, call strbuf_add_wrapped_text()
directly and remove the wrapper function.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix inadvertent breakage from b932705 (git-p4: stream from perforce to
speed up clones, 2009-07-30) in the code that strips the trailing '\n'
from p4 print on a symlink. (In practice, contents is of the form
['target\n', ''].)
Signed-off-by: Evan Powers <evan.powers@gmail.com>
Acked-by: Pete Wyckoff <pw@padd.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Almost exactly a year ago in 02a6552 (Test fsck a bit harder), I
introduced two testcases that were expecting failure.
However, the only bug was that the testcases wrote *blobs* because I
forgot to pass -t tag to hash-object. Fix this, and then adjust the
rest of the test to properly check the result.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ever since the "See linkgit:git-config[1]..." paragraph was added to the
description for --untracked-files (d6293d1), the paragraphs for the
following options were indented at the same level as the "See
linkgit:git-config[1]" paragraph. This problem showed up in the
manpages, but not in the HTML documentation.
While this does fix the alignment of the options following
--untracked-files in the manpage, the "See linkgit..." portion of the
description does not retain its previous indentation level in the
manpages, or HTML documentation.
Signed-off-by: Jacob Helwig <jacob.helwig@gmail.com>
Acked-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we remove a path in a/deep/subdirectory, we should try to
remove as many trailing components as possible (i.e.,
subdirectory, then deep, then a). However, the test for the
return value of rmdir was reversed, so we only ever deleted
at most one level.
The fix is in remove_path, so "apply" and "merge-recursive"
also are fixed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make git-branch, git-show-branch, git-grep, and all the diff-based
programs accept an optional argument <when> for --color. The argument
is a colorbool: "always", "never", or "auto". If no argument is given,
"always" is used; --no-color is an alias for --color=never. This makes
the command-line interface consistent with other GNU tools, such as `ls'
and `grep', and with the git-config color options. Note that, without
an argument, --color and --no-color work exactly as before.
To implement this, two internal changes were made:
1. Allow the first argument of git_config_colorbool() to be NULL,
in which case it returns -1 if the argument isn't "always", "never",
or "auto".
2. Add OPT_COLOR_FLAG(), OPT__COLOR(), and parse_opt_color_flag_cb()
to the option parsing library. The callback uses
git_config_colorbool(), so color.h is now a dependency
of parse-options.c.
Signed-off-by: Mark Lodato <lodatom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The description for --thin was misleading and downright wrong. Correct
it with some inspiration from the description of index-pack's --fix-thin
and some background information from Nicolas Pitre <nico@fluxnic.net>.
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is misleading to say that we pull refs from $GIT_DIR/refs/*, because we
may also consult the packed refs mechanism. These days we tend to treat
the "refs hierarchy" as more of an abstract namespace that happens to be
represented as $GIT_DIR/refs. At best, this is a minor inaccuracy, but at
worst it can confuse users who then look in $GIT_DIR/refs and find that it
is missing some of the refs they expected to see.
This patch drops most uses of "$GIT_DIR/refs/*", changing them into just
"refs/*", under the assumption that users can handle the concept of an
abstract refs namespace. There are a few things to note:
- most cases just dropped the $GIT_DIR/ portion. But for cases where
that left _just_ the word "refs", I changed it to "refs/" to help
indicate that it was a hierarchy. I didn't do the same for longer
paths (e.g., "refs/heads" remained, instead of becoming
"refs/heads/").
- in some cases, no change was made, as the text was explicitly about
unpacked refs (e.g., the discussion in git-pack-refs).
- In some cases it made sense instead to note the existence of packed
refs (e.g., in check-ref-format and rev-parse).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following function is duplicated:
encode_header
Move this function to sha1_file.c and rename it 'encode_in_pack_object_header',
as suggested by Junio C Hamano
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/fast-import-idx-v2:
fast-import: use the diff_delta() max_delta_size argument
fast-import: honor pack.indexversion and pack.packsizelimit config vars
fast-import: make default pack size unlimited
fast-import: use write_idx_file() instead of custom code
fast-import: use sha1write() for pack data
fast-import: start using struct pack_idx_entry
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following function is duplicated:
fill_mm
Move it to xdiff-interface.c and rename it 'read_mmblob', as suggested
by Junio C Hamano.
Also, change parameters order for consistency with read_mmfile().
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following functions are (almost) identical:
verify_remote_names
update_tracking_ref
refs_pushed
print_push_status
Move common versions of these functions to transport.c and rename
them, as suggested by Jeff King and Junio C Hamano.
These functions have been removed entirely from builtin-send-pack.c,
since they are only used internally by print_push_status():
print_ref_status
status_abbrev
print_ok_ref_status
print_one_push_status
Also, move #define SUMMARY_WIDTH to transport.h and rename it
TRANSPORT_SUMMARY_WIDTH as it is used in builtin-fetch.c and
transport.c
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Acked-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following functions:
git_tcp_connect_sock (IPV6 version)
git_tcp_connect_sock (no IPV6 version),
git_proxy_connect
have common block of code. Move it to a new function 'get_host_and_port'
Signed-off-by: Michael Lukashov <michael.lukashov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/cherry-pick-reword:
cherry-pick: prettify the advice message
cherry-pick: show commit name instead of sha1
cherry-pick: format help message as strbuf
cherry-pick: refactor commit parsing code
cherry-pick: rewrap advice message
This is a bit of future-proofing esc_html and friends: when called
with undefined value they would now would return undef... which would
probably mean that error would still occur, but closer to the source
of problem.
This means that we can safely use
esc_html(shift) || "Internal Server Error"
in die_error() instead of
esc_html(shift || "Internal Server Error")
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message (second argument to die_error) is meant to be short,
one-line text description of given error. A few callers call
die_error with error message containing unescaped user supplied data
($hash, $file_name). Instead of forcing callers to escape data,
simply call esc_html on the parameter.
Note that optional third parameter, which contains detailed error
description, is meant to be HTML formatted, and therefore should be
not escaped.
While at it update esc_html synopsis/usage, and bring default error
description to read 'Internal Server Error' (titlecased).
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When invoking "git submodule summary" in an empty repo (which can be
indirectly done by setting status.submodulesummary = true), it currently
emits an error message (via "git diff-index") since HEAD points to an
unborn branch.
This patch adds handling of the HEAD-points-to-unborn-branch special case,
so that "git submodule summary" no longer emits this error message.
The patch also adds a test case that verifies the fix.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This let diff_delta() abort early if it is going to bust the given
size limit. Also, only objects larger than 20 bytes are considered
as objects smaller than that are most certainly going to produce
larger deltas than the original object due to the additional headers.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Now that fast-import is creating packs with index version 2, there is
no point limiting the pack size by default. A pack split will still
happen if off_t is not sufficiently large to hold large offsets.
While updating the doc, let's remove the "packfiles fit on CDs"
suggestion. Pack files created by fast-import are still suboptimal and
a 'git repack -a -f -d' or even 'git gc --aggressive' would be a pretty
good idea before considering storage on CDs.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows for the creation of pack index version 2 with its object
CRC and the possibility for a pack to be larger than 4 GB.
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation for using write_idx_file(). Also, by using
sha1write() we get some buffering to reduces the number of write
syscalls, and the written data is SHA1 summed which allows for the extra
data integrity validation check performed in fixup_pack_header_footer()
(details on this in commit abeb40e5aa).
Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>