"git pack-objects" in a repository with many packfiles used to
spend a lot of time looking for/at objects in them; the accesses to
the packfiles are now optimized by checking the most-recently-used
packfile first.
* jk/pack-objects-optim-mru:
pack-objects: use mru list when iterating over packs
pack-objects: break delta cycles before delta-search phase
sha1_file: make packed_object_info public
provide an initializer for "struct object_info"
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant. A new QSORT() macro lets us
omit it.
* rs/qsort:
show-branch: use QSORT
use QSORT, part 2
coccicheck: use --all-includes by default
remove unnecessary check before QSORT
use QSORT
add QSORT
When a client pushes objects to us, index-pack checks the
objects themselves and then installs them into place. If we
then reject the push due to a pre-receive hook, we cannot
just delete the packfile; other processes may be depending
on it. We have to do a normal reachability check at this
point via `git gc`.
But such objects may hang around for weeks due to the
gc.pruneExpire grace period. And worse, during that time
they may be exploded from the pack into inefficient loose
objects.
Instead, this patch teaches receive-pack to put the new
objects into a "quarantine" temporary directory. We make
these objects available to the connectivity check and to the
pre-receive hook, and then install them into place only if
it is successful (and otherwise remove them as tempfiles).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On a case-insensitive filesystem, we should realize that
"a/objects" and "A/objects" are the same path. We already
use fspathcmp() to check against the main object directory,
but until recently we couldn't use it for comparing against
other alternates (because their paths were not
NUL-terminated strings). But now we can, so let's do so.
Note that we also need to adjust count-objects to load the
config, so that it can see the setting of core.ignorecase
(this is required by the test, but is also a general bugfix
for users of count-objects).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We recursively expand alternates repositories, so that if A
borrows from B which borrows from C, A can see all objects.
For the root object database, we allow relative paths, so A
can point to B as "../B/objects". However, we currently do
not allow relative paths when recursing, so B must use an
absolute path to reach C.
That is an ancient protection from c2f493a (Transitively
read alternatives, 2006-05-07) that tries to avoid adding
the same alternate through two different paths. Since
5bdf0a8 (sha1_file: normalize alt_odb path before comparing
and storing, 2011-09-07), we use a normalized absolute path
for each alt_odb entry.
This means that in most cases the protection is no longer
necessary; we will detect the duplicate no matter how we got
there (but see below). And it's a good idea to get rid of
it, as it creates an unnecessary complication when setting
up recursive alternates (B has to know that A is going to
borrow from it and make sure to use an absolute path).
Note that our normalization doesn't actually look at the
filesystem, so it can still be fooled by crossing symbolic
links. But that's also true of absolute paths, so it's not a
good reason to disallow only relative paths (it's
potentially a reason to switch to real_path(), but that's a
separate and non-trivial change).
We adjust the test script here to demonstrate that this now
works, and add new tests to show that the normalization does
indeed suppress duplicates.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no way to get the list of alternates that git
computes internally; our tests only infer it based on which
objects are available. In addition to testing, knowing this
list may be helpful for somebody debugging their alternates
setup.
Let's add it to the "count-objects -v" output. We could give
it a separate flag, but there's not really any need.
"count-objects -v" is already a debugging catch-all for the
object database, its output is easily extensible to new data
items, and printing the alternates is not expensive (we
already had to find them to count the objects).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These tests are just trying to show that we allow recursion
up to a certain depth, but not past it. But the counting is
a bit non-intuitive, and rather than test at the edge of the
breakage, we test "OK" cases in the middle of the chain.
Let's explain what's going on, and explicitly test the
switch between "OK" and "too deep".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to the previous patch, though no user reported a bug and
I could not find a regressive behavior.
However it is a good thing to be strict on the output and for that we
always omit a trailing slash.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before 63e95beb0 (2016-04-15, submodule: port resolve_relative_url from
shell to C), it did not matter if the superprojects URL had a trailing
slash or not. It was just chopped off as one of the first steps
(The "remoteurl=${remoteurl%/}" near the beginning of
resolve_relative_url(), which was removed in said commit).
When porting this to the C version, an off-by-one error was introduced
and we did not check the actual last character to be a slash, but the
NULL delimiter.
Reintroduce the behavior from before 63e95beb0, to ignore the trailing
slash.
Reported-by: <venv21@gmail.com>
Helped-by: Dennis Kaarsemaker <dennis@kaarsemaker.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pathspecs can be a bit tricky when trying to apply them to submodules.
The main challenge is that the pathspecs will be with respect to the
superproject and not with respect to paths in the submodule. The
approach this patch takes is to pass in the identical pathspec from the
superproject to the submodule in addition to the submodule-prefix, which
is the path from the root of the superproject to the submodule, and then
we can compare an entry in the submodule prepended with the
submodule-prefix to the pathspec in order to determine if there is a
match.
This patch also permits the pathspec logic to perform a prefix match against
submodules since a pathspec could refer to a file inside of a submodule.
Due to limitations in the wildmatch logic, a prefix match is only done
literally. If any wildcard character is encountered we'll simply punt
and produce a false positive match. More accurate matching will be done
once inside the submodule. This is due to the superproject not knowing
what files could exist in the submodule.
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pass through some known-safe options when recursing into submodules.
(--cached, -v, -t, -z, --debug, --eol)
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow ls-files to recognize submodules in order to retrieve a list of
files from a repository's submodules. This is done by forking off a
process to recursively call ls-files on all submodules. Use top-level
--super-prefix option to pass a path to the submodule which it can
use to prepend to output or pathspec matching logic.
Signed-off-by: Brandon Williams <bmwill@google.com>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our ref resolution first runs lstat() on any path we try to
look up, because we want to treat symlinks specially (by
resolving them manually and considering them symrefs). But
if the results of `readlink` do _not_ look like a ref, we
fall through to treating it like a normal file, and just
read the contents of the linked path.
Since fcb7c76 (resolve_ref_unsafe(): close race condition
reading loose refs, 2013-06-19), that "normal file" code
path will stat() the file and if we see ENOENT, will jump
back to the lstat(), thinking we've seen inconsistent
results between the two calls. But for a symbolic ref, this
isn't a race: the lstat() found the symlink, and the stat()
is looking at the path it points to. We end up in an
infinite loop calling lstat() and stat().
We can fix this by avoiding the retry-on-inconsistent jump
when we know that we found a symlink. While we're at it,
let's add a comment explaining why the symlink case gets to
this code in the first place; without that, it is not
obvious that the correct solution isn't to avoid the stat()
code path entirely.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "%C(auto)" appears at the very beginning of the pretty format
string, it did not need to issue the reset sequence, but it did.
* rs/c-auto-resets-attributes:
pretty: avoid adding reset for %C(auto) if output is empty
"git log rev^..rev" is an often-used revision range specification
to show what was done on a side branch merged at rev. This has
gained a short-hand "rev^-1". In general "rev^-$n" is the same as
"^rev^$n rev", i.e. what has happened on other branches while the
history leading to nth parent was looking the other way.
* vn/revision-shorthand-for-side-branch-log:
revision: new rev^-n shorthand for rev^n..rev
When given an abbreviated object name that is not (or more
realistically, "no longer") unique, we gave a fatal error
"ambiguous argument". This error is now accompanied by hints that
lists the objects that begins with the given prefix. During the
course of development of this new feature, numerous minor bugs were
uncovered and corrected, the most notable one of which is that we
gave "short SHA1 xxxx is ambiguous." twice without good reason.
* jk/ambiguous-short-object-names:
get_short_sha1: make default disambiguation configurable
get_short_sha1: list ambiguous objects on error
for_each_abbrev: drop duplicate objects
sha1_array: let callbacks interrupt iteration
get_short_sha1: mark ambiguity error for translation
get_short_sha1: NUL-terminate hex prefix
get_short_sha1: refactor init of disambiguation code
get_short_sha1: parse tags when looking for treeish
get_sha1: propagate flags to child functions
get_sha1: avoid repeating ourselves via ONLY_TO_DIE
get_sha1: detect buggy calls with multiple disambiguators
With the preparatory steps, it has become trivial to teach the
system a new diff.wsErrorHighlight configuration that gives the
default value for --ws-error-highlight command line option.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We'd want to run this same set of test twice, once with the option
and another time with an equivalent configuration setting. Split
out the step that prepares the test data and expected output and
move the test for the command line option into a separate test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our usual style when working with subdirectories is to chdir
inside a subshell or to use "git -C", which means we do not
have to constantly return to the main test directory. Let's
convert this old test, which does not follow that style.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our normal test style these days puts the opening quote of
the body on the description line, and indents the body with
a single tab. This ancient test did not follow this.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Besides being our normal style, this correctly checks for an
error exit() versus signal death.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function makes sure that "git fsck" does not report any
errors. But "--full" has been the default since f29cd39
(fsck: default to "git fsck --full", 2009-10-20), and we can
use the exit code (instead of counting the lines) since
e2b4f63 (fsck: exit with non-zero status upon errors,
2007-03-05).
So we can just use "git fsck", which is shorter and more
flexible (e.g., we can use "git -C").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function was never used since its inception in dd05ea1
(test case for transitive info/alternates, 2006-05-07).
Which is just as well, since it mutates the repo state in a
way that would invalidate further tests, without cleaning up
after itself. Let's get rid of it so that nobody is tempted
to use it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The codepath in "git fsck" to detect malformed tree objects has
been updated not to die but keep going after detecting them.
* dt/tree-fsck:
fsck: handle bad trees like other errors
tree-walk: be more specific about corrupt tree errors
An author name, that spelled a backslash-quoted double quote in the
human readable part "My \"double quoted\" name", was not unquoted
correctly while applying a patch from a piece of e-mail.
* kd/mailinfo-quoted-string:
mailinfo: unescape quoted-pair in header fields
t5100-mailinfo: replace common path prefix with variable
"git init" tried to record core.worktree in the repository's
'config' file when GIT_WORK_TREE environment variable was set and
it was different from where GIT_DIR appears as ".git" at its top,
but the logic was faulty when .git is a "gitdir:" file that points
at the real place, causing trouble in working trees that are
managed by "git worktree". This has been corrected.
* nd/init-core-worktree-in-multi-worktree-world:
init: kill git_link variable
init: do not set unnecessary core.worktree
init: kill set_git_dir_init()
init: call set_git_dir_init() from within init_db()
init: correct re-initialization from a linked worktree
"gitweb" can spawn "highlight" to show blob contents with
(programming) language-specific syntax highlighting, but only
when the language is known. "highlight" can however be told
to make the guess itself by giving it "--force" option, which
has been enabled.
* ik/gitweb-force-highlight:
gitweb: use highlight's shebang detection
gitweb: remove unused guess_file_syntax() parameter
"git pack-objects --include-tag" was taught that when we know that
we are sending an object C, we want a tag B that directly points at
C but also a tag A that points at the tag B. We used to miss the
intermediate tag B in some cases.
* jk/pack-tag-of-tag:
pack-objects: walk tag chains for --include-tag
t5305: simplify packname handling
t5305: use "git -C"
t5305: drop "dry-run" of unpack-objects
t5305: move cleanup into test block
We emit an escape sequence for resetting color and attribute for
%C(auto) to make sure automatic coloring is displayed as intended.
Stop doing that if the output strbuf is empty, i.e. when %C(auto)
appears at the start of the format string, because then there is no
need for a reset and we save a few bytes in the output.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git format-patch --stdout" output is placed as an in-body
header and it uses the RFC2822 header folding, "git am" failed to
put the header line back into a single logical line. The
underlying "git mailinfo" was taught to handle this properly.
* jt/mailinfo-fold-in-body-headers:
mailinfo: handle in-body header continuations
mailinfo: make is_scissors_line take plain char *
mailinfo: separate in-body header processing
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.
* tg/add-chmod+x-fix:
t3700-add: do not check working tree file mode without POSIXPERM
t3700-add: create subdirectory gently
add: modify already added files when --chmod is given
read-cache: introduce chmod_index_entry
update-index: add test for chmod flags
When "git rebase -i" is given a broken instruction, it told the
user to fix it with "--edit-todo", but didn't say what the step
after that was (i.e. "--continue").
* rt/rebase-i-broken-insn-advise:
rebase -i: improve advice on bad instruction lines
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
"git checkout <word>" does not follow the usual disambiguation
rules when the <word> can be both a rev and a path, to allow
checking out a branch 'foo' in a project that happens to have a
file 'foo' in the working tree without having to disambiguate.
This was poorly documented and the check was incorrect when the
command was run from a subdirectory.
* nd/checkout-disambiguation:
checkout: fix ambiguity check in subdir
checkout.txt: document a common case that ignores ambiguation rules
checkout: add some spaces between code and comment
Even when "git pull --rebase=preserve" (and the underlying "git
rebase --preserve") can complete without creating any new commit
(i.e. fast-forwards), it still insisted on having a usable ident
information (read: user.email is set correctly), which was less
than nice. As the underlying commands used inside "git rebase"
would fail with a more meaningful error message and advice text
when the bogus ident matters, this extra check was removed.
* jk/rebase-i-drop-ident-check:
rebase-interactive: drop early check for valid ident
"git format-patch --base=..." feature that was recently added
showed the base commit information after "-- " e-mail signature
line, which turned out to be inconvenient. The base information
has been moved above the signature line.
* jt/format-patch-base-info-above-sig:
format-patch: show base info before email signature
Performance tests done via "t/perf" did not use the same set of
build configuration if the user relied on autoconf generated
configuration.
* ks/perf-build-with-autoconf:
t/perf/run: copy config.mak.autogen & friends to build area
"git diff -W" output needs to extend the context backward to
include the header line of the current function and also forward to
include the body of the entire current function up to the header
line of the next one. This process may have to merge to adjacent
hunks, but the code forgot to do so in some cases.
* rs/xdiff-merge-overlapping-hunks-for-W-context:
xdiff: fix merging of hunks with -W context and -u context
"git fetch http::/site/path" did not die correctly and segfaulted
instead.
* jk/fix-remote-curl-url-wo-proto:
remote-curl: handle URLs without protocol
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.
The only thing git currently does is removing exterior quotes, but
quotes within are left alone.
Remove exterior quotes and remove escape characters so that they don't
show up in the author field.
Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many tests need to store data in a file, and repeat the same pattern to
refer to that path:
"$TEST_DIRECTORY"/t5100/
Create a variable that contains this path, and use that instead.
While we're making this change, make sure the quotes are not just around
the variable, but around the entire string to not give the impression
we want shell splitting to affect the other variables.
Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue. Now fsck can tell the user which tree is
bad, too.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.
Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.
Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log rev^..rev" is commonly used to show all work done on and merged
from a side branch. This patch introduces a shorthand "rev^-" for this
and additionally allows "rev^-$n" to mean "reachable from rev, excluding
what is reachable from the nth parent of rev". For example, for a
two-parent merge, you can use rev^-2 to get the set of commits which were
made to the main branch while the topic branch was prepared.
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we find ambiguous short sha1s, we may get a
disambiguation rule from our caller's context. But if we
don't, we fall back to treating all sha1s the same, even
though most projects will tend to refer only to commits by
their short sha1s.
This patch introduces a configuration option that lets the
user pick a different fallback (e.g., only commits). It's
possible that we may want to make this the default, but it's
a good idea to start as a config option for two reasons:
1. It lets people experiment with this and see if it's a
good idea (i.e., the "tend to" above is an assumption;
we don't really know if this will break some obscure
cases).
2. Even if we do flip the default, it gives people an
escape hatch if it causes problems (you can sometimes
override it by asking for "1234^{tree}", but not all
combinations are possible).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rebase -i" is given a broken instruction, it told the
user to fix it with "--edit-todo", but didn't say what the step
after that was (i.e. "--continue").
* rt/rebase-i-broken-insn-advise:
rebase -i: improve advice on bad instruction lines
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.
* tg/add-chmod+x-fix:
t3700-add: do not check working tree file mode without POSIXPERM
t3700-add: create subdirectory gently
add: modify already added files when --chmod is given
read-cache: introduce chmod_index_entry
update-index: add test for chmod flags
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region. This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.
* js/regexec-buf:
regex: use regexec_buf()
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
"git checkout <word>" does not follow the usual disambiguation
rules when the <word> can be both a rev and a path, to allow
checking out a branch 'foo' in a project that happens to have a
file 'foo' in the working tree without having to disambiguate.
This was poorly documented and the check was incorrect when the
command was run from a subdirectory.
* nd/checkout-disambiguation:
checkout: fix ambiguity check in subdir
checkout.txt: document a common case that ignores ambiguation rules
checkout: add some spaces between code and comment
Even more i18n.
* va/i18n-more:
i18n: stash: mark messages for translation
i18n: notes-merge: mark die messages for translation
i18n: ident: mark hint for translation
i18n: i18n: diff: mark die messages for translation
i18n: connect: mark die messages for translation
i18n: commit: mark message for translation
In some projects, it is common to use "[RFC PATCH]" as the subject
prefix for a patch meant for discussion rather than application. A
new option "--rfc" was a short-hand for "--subject-prefix=RFC PATCH"
to help the participants of such projects.
* jt/format-patch-rfc:
format-patch: add "--rfc" for the common case of [RFC PATCH]
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same. A command line option is added to help with the
experiment to find a good heuristics.
* mh/diff-indent-heuristic:
blame: honor the diff heuristic options and config
parse-options: add parse_opt_unknown_cb()
diff: improve positioning of add/delete blocks in diffs
xdl_change_compact(): introduce the concept of a change group
recs_match(): take two xrecord_t pointers as arguments
is_blank_line(): take a single xrecord_t as argument
xdl_change_compact(): only use heuristic if group can't be matched
xdl_change_compact(): fix compaction heuristic to adjust ixo
The pretty-format specifier "%C(auto)" used by the "log" family of
commands to enable coloring of the output is taught to also issue a
color-reset sequence to the output.
* rs/c-auto-resets-attributes:
pretty: let %C(auto) reset all attributes
When the user gives us an ambiguous short sha1, we print an
error and refuse to resolve it. In some cases, the next step
is for them to feed us more characters (e.g., if they were
retyping or cut-and-pasting from a full sha1). But in other
cases, that might be all they have. For example, an old
commit message may have used a 7-character hex that was
unique at the time, but is now ambiguous. Git doesn't
provide any information about the ambiguous objects it
found, so it's hard for the user to find out which one they
probably meant.
This patch teaches get_short_sha1() to list the sha1s of the
objects it found, along with a few bits of information that
may help the user decide which one they meant. Here's what
it looks like on git.git:
$ git rev-parse b2e1
error: short SHA1 b2e1 is ambiguous
hint: The candidates are:
hint: b2e1196 tag v2.8.0-rc1
hint: b2e11d1 tree
hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
hint: b2e1759 blob
hint: b2e18954 blob
hint: b2e1895c blob
fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
We show the tagname for tags, and the date and subject for
commits. For trees and blobs, in theory we could dig in the
history to find the paths at which they were present. But
that's very expensive (on the order of 30s for the kernel),
and it's not likely to be all that helpful. Most short
references are to commits, so the useful information is
typically going to be that the object in question _isn't_ a
commit. So it's silly to spend a lot of CPU preemptively
digging up the path; the user can do it themselves if they
really need to.
And of course it's somewhat ironic that we abbreviate the
sha1s in the disambiguation hint. But full sha1s would cause
annoying line wrapping for the commit lines, and presumably
the user is going to just re-issue their command immediately
with the corrected sha1.
We also restrict the list to those that match any
disambiguation hint. E.g.:
$ git rev-parse b2e1:foo
error: short SHA1 b2e1 is ambiguous
hint: The candidates are:
hint: b2e1196 tag v2.8.0-rc1
hint: b2e11d1 tree
hint: b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
fatal: Invalid object name 'b2e1'.
does not bother reporting the blobs, because they cannot
work as a treeish.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If an object appears multiple times in the object database
(e.g., in both loose and packed form, or in two separate
packs), the disambiguation machinery may see it more than
once. The get_short_sha1() function handles this already,
but for_each_abbrev() blindly fires the callback for each
instance it finds.
We can fix this by collecting the output in a sha1 array and
de-duplicating it. As a bonus, the sort done for the
de-duplication means that our output will be stable,
regardless of the order in which the objects are found.
Note that the old code normalized the callback's output to
0/1 to store in the 1-bit ds->ambiguous flag (which both
halted the iteration and was returned from the
for_each_abbrev function). Now that we are using sha1_array,
we can return the real value. In practice, it doesn't matter
as the sole caller only ever returns 0.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callbacks for iterating a sha1_array must have a void
return. This is unlike our usual for_each semantics, where
a callback may interrupt iteration and have its value
propagated. Let's switch it to the usual form, which will
enable its use in more places (e.g., where we are replacing
an existing iteration with a different data structure).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The treeish disambiguation function tries to peel tags, but
it does so by calling:
deref_tag(lookup_object(sha1), ...);
This will only work if we have previously looked at the tag
and created a "struct tag" for it. Since parsing revision
arguments typically happens before anything else, this is
usually not the case, and we would fail to peel the tag (we
are lucky that deref_tag() gracefully handles the NULL and
does not segfault).
Instead, we can use parse_object(). Note that this is the
same fix done by 94d75d1 (get_short_sha1(): correctly
disambiguate type-limited abbreviation, 2013-07-01), but
that commit fixed only the committish disambiguator, and
left the bug in the treeish one.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The get_sha1() function is actually implementation by many
sub-functions, but we do not always pass our flags around to
all of those functions. As a result, we may forget that our
caller asked us to resolve with GET_SHA1_QUIETLY and output
messages. The two triggerable cases are:
1. Resolving treeish:path will resolve the "treeish"
portion using GET_SHA1_TREEISH, dropping all other
flags.
2. The peel_onion() function did not take flags at all
but recurses to get_sha1_1(), which does.
The solution for both is to bitwise-OR their new flags with
the existing ones (after dropping any mutually exclusive
disambiguation flags).
This bug can trigger with "git rev-parse --quiet", which
asks for quiet resolution. But it can also happen in a more
vanilla code path when we do a follow-up ONLY_TO_DIE
invocation of get_sha1(), and that's what the tests check.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the revision code cannot parse an argument like
"HEAD:foo", it will call maybe_die_on_misspelt_object_name(),
which re-runs get_sha1() with an extra ONLY_TO_DIE flag. We
then spend more effort to generate a better error message.
Unfortunately, a side effect is that our second call may
repeat the same error messages from the original get_sha1()
call. You can see this with:
$ git show 0017
error: short SHA1 0017 is ambiguous.
error: short SHA1 0017 is ambiguous.
fatal: ambiguous argument '0017': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
where the second "error:" line comes from the ONLY_TO_DIE
call.
To fix this, we can make ONLY_TO_DIE imply QUIETLY. This is
a little odd, because the whole point of ONLY_TO_DIE is to
output error messages. But what we want to do is tell the
rest of the get_sha1() code (particularly get_sha1_1()) that
the _regular_ messages should be quiet, but the only-to-die
ones should not.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages. Make use of this autodetection for files
which syntax is not known by gitweb. In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).
Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML. In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()). Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types. Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.
Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.
Signed-off-by: Ian Kelling <ian@iankelling.org>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function needs_work_tree_config() that is called from
create_default_files() is supposed to be fed the path to ".git" that
looks as if it is at the top of the working tree, and decide if that
location matches the actual worktree being used. This comparison allows
"git init" to decide if core.worktree needs to be recorded in the
working tree.
In the current code, however, we feed the return value from
get_git_dir(), which can be totally different from what the function
expects when "gitdir" file is involved. Instead of giving the path to
the ".git" at the top of the working tree, we end up feeding the actual
path that the file points at.
This original location of ".git" however is only known to init_db().
Make init_db() save it and have it passed to create_default_files() as a
new parameter, which passes the correct location down to
needs_work_tree_config() to fix this.
Noticed-by: Max Nordlund <max.nordlund@sqore.com>
Helped-by: Michael J Gruber <git@drmicha.warpmail.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 'git init' is called from a linked worktree, we treat '.git'
dir (which is $GIT_COMMON_DIR/worktrees/something) as the main
'.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton
in there. It does not harm anything (*) but it is still wrong.
Since 'git init' calls set_git_dir() at preparation time, which
indirectly calls get_common_dir() and correctly detects multiple
worktree setup, all git_path_buf() calls in create_default_files() will
return correct paths in both single and multiple worktree setups. The
only thing left is copy_templates(), which targets $GIT_DIR, not
$GIT_COMMON_DIR.
Fix that with get_git_common_dir(). This function will return $GIT_DIR
in single-worktree setup, so we don't have to make a special case for
multiple-worktree here.
(*) It does in fact, thanks to another bug. More on that later.
Noticed-by: Max Nordlund <max.nordlund@sqore.com>
Helped-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even when "git pull --rebase=preserve" (and the underlying "git
rebase --preserve") can complete without creating any new commit
(i.e. fast-forwards), it still insisted on having a usable ident
information (read: user.email is set correctly), which was less
than nice. As the underlying commands used inside "git rebase"
would fail with a more meaningful error message and advice text
when the bogus ident matters, this extra check was removed.
* jk/rebase-i-drop-ident-check:
rebase-interactive: drop early check for valid ident
More i18n.
* va/i18n:
i18n: update-index: mark warnings for translation
i18n: show-branch: mark plural strings for translation
i18n: show-branch: mark error messages for translation
i18n: receive-pack: mark messages for translation
notes: spell first word of error messages in lowercase
i18n: notes: mark error messages for translation
i18n: merge-recursive: mark verbose message for translation
i18n: merge-recursive: mark error messages for translation
i18n: config: mark error message for translation
i18n: branch: mark option description for translation
i18n: blame: mark error messages for translation
"git format-patch --base=..." feature that was recently added
showed the base commit information after "-- " e-mail signature
line, which turned out to be inconvenient. The base information
has been moved above the signature line.
* jt/format-patch-base-info-above-sig:
format-patch: show base info before email signature
Performance tests done via "t/perf" did not use the same set of
build configuration if the user relied on autoconf generated
configuration.
* ks/perf-build-with-autoconf:
t/perf/run: copy config.mak.autogen & friends to build area
"git diff -W" output needs to extend the context backward to
include the header line of the current function and also forward to
include the body of the entire current function up to the header
line of the next one. This process may have to merge to adjacent
hunks, but the code forgot to do so in some cases.
* rs/xdiff-merge-overlapping-hunks-for-W-context:
xdiff: fix merging of hunks with -W context and -u context
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour. The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.
* jk/setup-sequence-update:
t1007: factor out repeated setup
init: reset cached config when entering new repo
init: expand comments explaining config trickery
config: only read .git/config from configured repos
test-config: setup git directory
t1302: use "git -C"
pager: handle early config
pager: use callbacks instead of configset
pager: make pager_program a file-local static
pager: stop loading git_default_config()
pager: remove obsolete comment
diff: always try to set up the repository
diff: handle --no-index prefixes consistently
diff: skip implicit no-index check when given --no-index
patch-id: use RUN_SETUP_GENTLY
hash-object: always try to set up the git repository
Some codepaths in "git pack-objects" were not ready to use an
existing pack bitmap; now they are and as the result they have
become faster.
* ks/pack-objects-bitmap:
pack-objects: use reachability bitmap index when generating non-stdout pack
pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
Even though "git hash-objects", which is a tool to take an
on-filesystem data stream and put it into the Git object store,
allowed to perform the "outside-world-to-Git" conversions (e.g.
end-of-line conversions and application of the clean-filter), and
it had the feature on by default from very early days, its reverse
operation "git cat-file", which takes an object from the Git object
store and externalize for the consumption by the outside world,
lacked an equivalent mechanism to run the "Git-to-outside-world"
conversion. The command learned the "--filters" option to do so.
* js/cat-file-filters:
cat-file: support --textconv/--filters in batch mode
cat-file --textconv/--filters: allow specifying the path separately
cat-file: introduce the --filters option
cat-file: fix a grammo in the man page
JGit can show a fake ref "capabilities^{}" to "git fetch" when it
does not advertise any refs, but "git fetch" was not prepared to
see such an advertisement. When the other side disconnects without
giving any ref advertisement, we used to say "there may not be a
repository at that URL", but we may have seen other advertisement
like "shallow" and ".have" in which case we definitely know that a
repository is there. The code to detect this case has also been
updated.
* jt/accept-capability-advertisement-when-fetching-from-void:
connect: advertized capability is not a ref
connect: tighten check for unexpected early hang up
tests: move test_lazy_prereq JGIT to test-lib.sh
A recently introduced test checks the result of 'git status' after
setting the executable bit on a file. This check does not yield the
expected result when the filesystem does not support the executable
bit.
What we care about is that a file added with "--chmod=+x" has
executable bit in the index and that "--chmod=+x" (or any other
options for that matter) does not muck with working tree files.
The former is tested by other existing tests, so let's check the
latter more explicitly and only under POSIXPERM prerequisite.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.
We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).
Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3). By converting them to use regexec_buf(), the code has
become much cleaner.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When our pickaxe code feeds file contents to regexec(), it implicitly
assumes that the file contents are read into implicitly NUL-terminated
buffers (i.e. that we overallocate by 1, appending a single '\0').
This is not so.
In particular when the file contents are simply mmap()ed, we can be
virtually certain that the buffer is preceding uninitialized bytes, or
invalid pages.
Note that the test we add here is known to be flakey: we simply cannot
know whether the byte following the mmap()ed ones is a NUL or not.
Typically, on Linux the test passes. On Windows, it fails virtually
every time due to an access violation (that's a segmentation fault for
you Unix-y people out there). And Windows would be correct: the
regexec() call wants to operate on a regular, NUL-terminated string,
there is no NUL in the mmap()ed memory range, and it is undefined
whether the next byte is even legal to access.
When run with --valgrind it demonstrates quite clearly the breakage, of
course.
Being marked with `test_expect_failure`, this test will sometimes be
declare "TODO fixed", even if it only passes by mistake.
This test case represents a Minimal, Complete and Verifiable Example of
a breakage reported by Chris Sidi.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subdirectory 'sub' is created early in the test file. Later, a test
case removes it during its clean-up actions. However, this test case is
protected by POSIXPERM. Consequently, 'sub' remains when the POSIXPERM
prerequisite is not satisfied. Later, a recently introduced test case
creates 'sub' again. Use -p with mkdir so that it does not fail if 'sub'
already exists.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mailinfo currently handles multi-line headers, but it does not handle
multi-line in-body headers. Teach it to handle such headers, for
example, for this input:
From: author <author@example.com>
Date: Fri, 9 Jun 2006 00:44:16 -0700
Subject: a very long
broken line
Subject: another very long
broken line
interpret the in-body subject to be "another very long broken line"
instead of "another very long".
An existing test (t/t5100/msg0015) has an indented line immediately
after an in-body header - it has been modified to reflect the new
functionality.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an alias for --subject-prefix='RFC PATCH', which is used
commonly in some development communities to deserve such a
short-hand.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The two functions in parse_branchname_arg(), verify_non_filename and
check_filename, need correct prefix in order to reconstruct the paths
and check for their existence. With NULL prefix, they just check paths
at top dir instead.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update a few tests that used to use GIT_CURL_VERBOSE to use the
newer GIT_TRACE_CURL.
* ep/use-git-trace-curl-in-tests:
t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
test-lib.sh: preserve GIT_TRACE_CURL from the environment
t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var
A test spawned a short-lived background process, which sometimes
prevented the test directory from getting removed at the end of the
script on some platforms.
* js/t6026-clean-up:
t6026-merge-attr: clean up background process at end of test case
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one. Teach the command
to forbid removal of HEAD.
* jc/forbid-symbolic-ref-d-HEAD:
symbolic-ref -d: do not allow removal of HEAD
Having a submodule whose ".git" repository is somehow corrupt
caused a few commands that recurse into submodules loop forever.
* jc/submodule-anchor-git-dir:
submodule: avoid auto-discovery in prepare_submodule_repo_env()
The test framework left the number of tests and success/failure
count in the t/test-results directory, keyed by the name of the
test script plus the process ID. The latter however turned out not
to serve any useful purpose. The process ID part of the filename
has been removed.
* jk/test-lib-drop-pid-from-results:
test-lib: drop PID from test-results/*.count
The "unsigned char sha1[20]" to "struct object_id" conversion
continues. Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.
It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2"). Extra sets of eyes double-checking for
mismerges are highly appreciated.
* bc/object-id:
builtin/reset: convert to use struct object_id
builtin/commit-tree: convert to struct object_id
builtin/am: convert to struct object_id
refs: add an update_ref_oid function.
sha1_name: convert get_sha1_mb to struct object_id
builtin/update-index: convert file to struct object_id
notes: convert init_notes to use struct object_id
builtin/rm: convert to use struct object_id
builtin/blame: convert file to use struct object_id
Convert read_mmblob to take struct object_id.
notes-merge: convert struct notes_merge_pair to struct object_id
builtin/checkout: convert some static functions to struct object_id
streaming: make stream_blob_to_fd take struct object_id
builtin: convert textconv_object to use struct object_id
builtin/cat-file: convert some static functions to struct object_id
builtin/cat-file: convert struct expand_data to use struct object_id
builtin/log: convert some static functions to use struct object_id
builtin/blame: convert struct origin to use struct object_id
builtin/apply: convert static functions to struct object_id
cache: convert struct cache_entry to use struct object_id
"git am" has been taught to make an internal call to "git apply"'s
innards without spawning the latter as a separate process.
* cc/apply-am: (41 commits)
builtin/am: use apply API in run_apply()
apply: learn to use a different index file
apply: pass apply state to build_fake_ancestor()
apply: refactor `git apply` option parsing
apply: change error_routine when silent
usage: add get_error_routine() and get_warn_routine()
usage: add set_warn_routine()
apply: don't print on stdout in verbosity_silent mode
apply: make it possible to silently apply
apply: use error_errno() where possible
apply: make some parsing functions static again
apply: move libified code from builtin/apply.c to apply.{c,h}
apply: rename and move opt constants to apply.h
builtin/apply: rename option parsing functions
builtin/apply: make create_one_file() return -1 on error
builtin/apply: make try_create_file() return -1 on error
builtin/apply: make write_out_results() return -1 on error
builtin/apply: make write_out_one_result() return -1 on error
builtin/apply: make create_file() return -1 on error
builtin/apply: make add_index_file() return -1 on error
...
Mark messages passed to die() in die_initial_contact().
Update test to reflect changes.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark message commit_utf8_warn for translation.
Update tests to reflect changes.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reset colors and attributes upon %C(auto) to enable full automatic
control over them; otherwise attributes like bold or reverse could
still be in effect from previous %C placeholders.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".
Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.
It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
}
if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;
The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:
--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
}
+if (!$smtp_server) {
+ $smtp_server = $repo->config('sendemail.smtpserver');
+}
if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.
The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.
Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.
This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.
Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:
1. It measures the following characteristics of a proposed splitting
position in a `struct split_measurement`:
* the number of blank lines above the proposed split
* whether the line directly after the split is blank
* the number of blank lines following that line
* the indentation of the nearest non-blank line above the split
* the indentation of the line directly below the split
* the indentation of the nearest non-blank line after that line
2. It combines the measured attributes using a bunch of
empirically-optimized weighting factors to derive a `struct
split_score` that measures the "badness" of splitting the text at
that position.
3. It combines the `split_score` for the top and the bottom of the
slider at each of its possible positions, and selects the position
that has the best `split_score`.
I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:
| repository | count | Git 2.9.0 | compaction | compaction-fixed | indent-1 | indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking | 109 | 89 (81.7%) | 37 (33.9%) | 37 (33.9%) | 2 (1.8%) | 2 (1.8%) |
| alamofire | 30 | 18 (60.0%) | 14 (46.7%) | 15 (50.0%) | 0 (0.0%) | 0 (0.0%) |
| angular | 184 | 127 (69.0%) | 39 (21.2%) | 23 (12.5%) | 5 (2.7%) | 5 (2.7%) |
| animate | 313 | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) | 2 (0.6%) |
| ant | 380 | 356 (93.7%) | 152 (40.0%) | 148 (38.9%) | 15 (3.9%) | 15 (3.9%) | *
| bugzilla | 306 | 263 (85.9%) | 109 (35.6%) | 99 (32.4%) | 14 (4.6%) | 15 (4.9%) | *
| corefx | 126 | 91 (72.2%) | 22 (17.5%) | 21 (16.7%) | 6 (4.8%) | 6 (4.8%) |
| couchdb | 78 | 44 (56.4%) | 26 (33.3%) | 28 (35.9%) | 6 (7.7%) | 6 (7.7%) | *
| cpython | 937 | 158 (16.9%) | 50 (5.3%) | 49 (5.2%) | 5 (0.5%) | 5 (0.5%) | *
| discourse | 160 | 95 (59.4%) | 42 (26.2%) | 36 (22.5%) | 18 (11.2%) | 13 (8.1%) |
| docker | 307 | 194 (63.2%) | 198 (64.5%) | 253 (82.4%) | 8 (2.6%) | 8 (2.6%) | *
| electron | 163 | 132 (81.0%) | 38 (23.3%) | 39 (23.9%) | 6 (3.7%) | 6 (3.7%) |
| git | 536 | 470 (87.7%) | 73 (13.6%) | 78 (14.6%) | 16 (3.0%) | 16 (3.0%) | *
| gitflow | 127 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| ionic | 133 | 89 (66.9%) | 29 (21.8%) | 38 (28.6%) | 1 (0.8%) | 1 (0.8%) |
| ipython | 482 | 362 (75.1%) | 167 (34.6%) | 169 (35.1%) | 11 (2.3%) | 11 (2.3%) | *
| junit | 161 | 147 (91.3%) | 67 (41.6%) | 66 (41.0%) | 1 (0.6%) | 1 (0.6%) | *
| lighttable | 15 | 5 (33.3%) | 0 (0.0%) | 2 (13.3%) | 0 (0.0%) | 0 (0.0%) |
| magit | 88 | 75 (85.2%) | 11 (12.5%) | 9 (10.2%) | 1 (1.1%) | 0 (0.0%) |
| neural-style | 28 | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) |
| nodejs | 781 | 649 (83.1%) | 118 (15.1%) | 111 (14.2%) | 4 (0.5%) | 5 (0.6%) | *
| phpmyadmin | 491 | 481 (98.0%) | 75 (15.3%) | 48 (9.8%) | 2 (0.4%) | 2 (0.4%) | *
| react-native | 168 | 130 (77.4%) | 79 (47.0%) | 81 (48.2%) | 0 (0.0%) | 0 (0.0%) |
| rust | 171 | 128 (74.9%) | 30 (17.5%) | 27 (15.8%) | 16 (9.4%) | 14 (8.2%) |
| spark | 186 | 149 (80.1%) | 52 (28.0%) | 52 (28.0%) | 2 (1.1%) | 2 (1.1%) |
| tensorflow | 115 | 66 (57.4%) | 48 (41.7%) | 48 (41.7%) | 5 (4.3%) | 5 (4.3%) |
| test-more | 19 | 15 (78.9%) | 2 (10.5%) | 2 (10.5%) | 1 (5.3%) | 1 (5.3%) | *
| test-unit | 51 | 34 (66.7%) | 14 (27.5%) | 8 (15.7%) | 2 (3.9%) | 2 (3.9%) | *
| xmonad | 23 | 22 (95.7%) | 2 (8.7%) | 2 (8.7%) | 1 (4.3%) | 1 (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals | 6668 | 4391 (65.9%) | 1496 (22.4%) | 1491 (22.4%) | 150 (2.2%) | 144 (2.2%) |
| totals (training set) | 4552 | 3195 (70.2%) | 1053 (23.1%) | 1061 (23.3%) | 86 (1.9%) | 88 (1.9%) |
| totals (test set) | 2116 | 1196 (56.5%) | 443 (20.9%) | 430 (20.3%) | 64 (3.0%) | 56 (2.6%) |
In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are
* "repository" - the name of the repository used. I used the diffs
between successive non-merge commits on the HEAD branch of the
corresponding repository.
* "count" - the number of sliders that were human-rated. I chose most,
but not all, sliders to rate from those among which the various
algorithms gave different answers.
* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.
* "compaction" - the heuristic used by `git diff --compaction-heuristic`
in Git 2.9.0.
* "compaction-fixed" - the heuristic used by `git diff
--compaction-heuristic` after the fixes from earlier in this patch
series. Note that the results are not dramatically different than
those for "compaction". Both produce non-ideal diffs only about 1/3 as
often as the default `git diff`.
* "indent-1" - the new `--indent-heuristic` algorithm, using the first
set of weighting factors, determined as described above.
* "indent-2" - the new `--indent-heuristic` algorithm, using the final
set of weighting factors, determined as described below.
* `*` - indicates that repo was part of training set used to determine
the first set of weighting factors.
The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.
The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)
The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].
This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.
[1] https://github.com/mhagger/diff-slider-tools
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fetch http::/site/path" did not die correctly and segfaulted
instead.
* jk/fix-remote-curl-url-wo-proto:
remote-curl: handle URLs without protocol
"git pack-objects --include-tag" was taught that when we know that
we are sending an object C, we want a tag B that directly points at
C but also a tag A that points at the tag B. We used to miss the
intermediate tag B in some cases.
* jk/pack-tag-of-tag:
pack-objects: walk tag chains for --include-tag
t5305: simplify packname handling
t5305: use "git -C"
t5305: drop "dry-run" of unpack-objects
t5305: move cleanup into test block
Otherwise for people who use autotools-based configure in main worktree,
the performance testing results will be inconsistent as work and build
trees could be using e.g. different optimization levels.
See e.g.
http://public-inbox.org/git/20160818175222.bmm3ivjheokf2qzl@sigill.intra.peff.net/
for example.
NOTE config.status has to be copied because otherwise without it the build
would want to run reconfigure this way loosing just copied config.mak.autogen.
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That's the usual style.
Update one test to reflect these changes.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark error messages for translation passed to die() function.
Change "Cannot" to lowercase following the usual style.
Reflect changes to test by using test_i18ngrep.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the chmod option was added to git add, it was hooked up to the diff
machinery, meaning that it only works when the version in the index
differs from the version on disk.
As the option was supposed to mirror the chmod option in update-index,
which always changes the mode in the index, regardless of the status of
the file, make sure the option behaves the same way in git add.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Any text below the "-- " for the email signature gets treated as part of
the signature, and many mail clients will trim it from the quoted text
for a reply. Move it above the signature, so people can reply to it
more easily.
Similarly, when producing the patch as a MIME attachment, the
original code placed the base info after the attached part, which
would be discarded. Move the base info to the end of the part,
still inside the part boundary.
Add tests for the exact format of the email signature, and add tests
to ensure that the base info appears before the email signature when
producing a plain-text output, and that it appears before the part
boundary when producing a MIME attachment.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the function context for a hunk (with -W) reaches the beginning of
the next hunk then we need to merge these two -- otherwise we'd show
some lines twice, which looks strange and even confuses git apply. We
already do this checking and merging in xdl_emit_diff(), but forget to
consider regular context (with -u or -U).
Fix that by merging hunks already if function context of the first one
touches or overlaps regular context of the second one.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently there is no test checking the expected behaviour when multiple
chmod flags with different arguments are passed. As argument handling
is not in line with other git commands it's easy to miss and
accidentally change the current behaviour.
While there, fix the argument type of chmod_path, which takes an int,
but had a char passed in.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We have a series of 3 CRLF tests that do exactly the same
(long) setup sequence. Let's pull it out into a common setup
test, which is shorter, more efficient, and will make it
easier to add new tests.
Note that we don't have to worry about cleaning up any of
the setup which was previously per-test; we call pop_repo
after the CRLF tests, which cleans up everything.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.
Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).
There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.
But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config". This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:
- it's not sufficient, and therefore hides bugs in a
confusing way. Config will be respected if commands are
run from the top-level of the working tree, but not from
a subdirectory.
- it's not always true that we haven't set up the
repository _yet_; we may not want to do it at all. For
instance, if you run "git init /some/path" from inside
another repository, it should not load config from the
existing repository.
- there might be a path ".git/config", but it is not the
actual repository we would find via setup_git_directory().
This may happen, e.g., if you are storing a git
repository inside another git repository, but have
munged one of the files in such a way that the
inner repository is not valid (e.g., by removing HEAD).
We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any). This shows up in two ways:
1. If we happen to be in an existing repository directory,
we'll read and respect core.sharedrepository from it,
even though it should have no bearing on the new
repository. A new test in t1301 covers this.
2. Similarly, if we're in an existing repo that sets
core.logallrefupdates, that will cause init to fail to
set it in a newly created repository (because it thinks
that the user's templates already did so). A new test
in t0001 covers this.
We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.
That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.
After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice. But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we see an explicit "--no-index", we do not bother calling
setup_git_directory_gently() at all. This means that we may
miss out on reading repo-specific config.
It's arguable whether this is correct or not. If we were
designing from scratch, making "git diff --no-index"
completely ignore the repository makes some sense. But we
are nowhere near scratch, so let's look at the existing
behavior:
1. If you're in the top-level of a repository and run an
explicit "diff --no-index", the config subsystem falls
back to reading ".git/config", and we will respect repo
config.
2. If you're in a subdirectory of a repository, then we
still try to read ".git/config", but it generally
doesn't exist. So "diff --no-index" there does not
respect repo config.
3. If you have $GIT_DIR set in the environment, we read
and respect $GIT_DIR/config,
4. If you run "git diff /tmp/foo /tmp/bar" to get an
implicit no-index, we _do_ run the repository setup,
and set $GIT_DIR (or respect an existing $GIT_DIR
variable). We find the repo config no matter where we
started, and respect it.
So we already respect the repository config in a number of
common cases, and case (2) is the only one that does not.
And at least one of our tests, t4034, depends on case (1)
behaving as it does now (though it is just incidental, not
an explicit test for this behavior).
So let's bring case (2) in line with the others by always
running the repository setup, even with an explicit
"--no-index". We shouldn't need to change anything else, as the
implicit case already handles the prefix.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we see an explicit "git diff --no-index ../foo ../bar",
then we do not set up the git repository at all (we already
know we are in --no-index mode, so do not have to check "are
we in a repository?"), and hence have no "prefix" within the
repository. A patch generated by this command will have the
filenames "a/../foo" and "b/../bar", no matter which
directory we are in with respect to any repository.
However, in the implicit case, where we notice that the
files are outside the repository, we will have chdir()'d to
the top-level of the repository. We then feed the prefix
back to the diff machinery. As a result, running the same
diff from a subdirectory will result in paths that look like
"a/subdir/../../foo".
Besides being unnecessarily long, this may also be confusing
to the user: they don't care about the subdir or the
repository at all; it's just where they happened to be when
running the command. We should treat this the same as the
explicit --no-index case.
One way to address this would be to chdir() back to the
original path before running our diff. However, that's a bit
hacky, as we would also need to adjust $GIT_DIR, which could
be a relative path from our top-level.
Instead, we can reuse the diff machinery's RELATIVE_NAME
option, which automatically strips off the prefix. Note that
this _also_ restricts the diff to this relative prefix, but
that's OK for our purposes: we queue our own diff pairs
manually, and do not rely on that part of the diff code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Patch-id does not require a repository because it is just
processing the incoming diff on stdin, but it may look at
git config for keys like patchid.stable.
Even though we do not setup_git_directory(), this works from
the top-level of a repository because we blindly look at
".git/config" in this case. But as the included test
demonstrates, it does not work from a subdirectory.
We can fix it by using RUN_SETUP_GENTLY. We do not take any
filenames from the user on the command line, so there's no
need to adjust them via prefix_filename().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "hash-object" is run without "-w", we don't need to be
in a git repository at all; we can just hash the object and
write its sha1 to stdout. However, if we _are_ in a git
repository, we would want to know that so we can follow the
normal rules for respecting config, .gitattributes, etc.
This happens to work at the top-level of a git repository
because we blindly read ".git/config", but as the included
test shows, it does not work when you are in a subdirectory.
The solution is to just do a "gentle" setup in this case. We
already take care to use prefix_filename() on any filename
arguments we get (to handle the "-w" case), so we don't need
to do anything extra to handle the side effects of repo
setup.
An alternative would be to specify RUN_SETUP_GENTLY for this
command in git.c, and then die if "-w" is set but we are not
in a repository. However, the error messages generated at
the time of setup_git_directory() are more detailed, so it's
better to find out which mode we are in, and then call the
appropriate function.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update a few tests that used to use GIT_CURL_VERBOSE to use the
newer GIT_TRACE_CURL.
* ep/use-git-trace-curl-in-tests:
t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
test-lib.sh: preserve GIT_TRACE_CURL from the environment
t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var
A test spawned a short-lived background process, which sometimes
prevented the test directory from getting removed at the end of the
script on some platforms.
* js/t6026-clean-up:
t6026-merge-attr: clean up background process at end of test case
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one. Teach the command
to forbid removal of HEAD.
* jc/forbid-symbolic-ref-d-HEAD:
symbolic-ref -d: do not allow removal of HEAD
Having a submodule whose ".git" repository is somehow corrupt
caused a few commands that recurse into submodules loop forever.
* jc/submodule-anchor-git-dir:
submodule: avoid auto-discovery in prepare_submodule_repo_env()
The test framework left the number of tests and success/failure
count in the t/test-results directory, keyed by the name of the
test script plus the process ID. The latter however turned out not
to serve any useful purpose. The process ID part of the filename
has been removed.
* jk/test-lib-drop-pid-from-results:
test-lib: drop PID from test-results/*.count
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.
* jk/diff-submodule-diff-inline:
diff: teach diff to display submodule difference with an inline diff
submodule: refactor show_submodule_summary with helper function
submodule: convert show_submodule_summary to use struct object_id *
allow do_submodule_path to work even if submodule isn't checked out
diff: prepare for additional submodule formats
graph: add support for --line-prefix on all graph-aware output
diff.c: remove output_prefix_length field
cache: add empty_tree_oid object and helper function
Starting from 6b8fda2d (pack-objects: use bitmaps when packing objects)
if a repository has bitmap index, pack-objects can nicely speedup
"Counting objects" graph traversal phase. That however was done only for
case when resultant pack is sent to stdout, not written into a file.
The reason here is for on-disk repack by default we want:
- to produce good pack (with bitmap index not-yet-packed objects are
emitted to pack in suboptimal order).
- to use more robust pack-generation codepath (avoiding possible
bugs in bitmap code and possible bitmap index corruption).
Jeff King further explains:
The reason for this split is that pack-objects tries to determine how
"careful" it should be based on whether we are packing to disk or to
stdout. Packing to disk implies "git repack", and that we will likely
delete the old packs after finishing. We want to be more careful (so
as not to carry forward a corruption, and to generate a more optimal
pack), and we presumably run less frequently and can afford extra CPU.
Whereas packing to stdout implies serving a remote via "git fetch" or
"git push". This happens more frequently (e.g., a server handling many
fetching clients), and we assume the receiving end takes more
responsibility for verifying the data.
But this isn't always the case. One might want to generate on-disk
packfiles for a specialized object transfer. Just using "--stdout" and
writing to a file is not optimal, as it will not generate the matching
pack index.
So it would be useful to have some way of overriding this heuristic:
to tell pack-objects that even though it should generate on-disk
files, it is still OK to use the reachability bitmaps to do the
traversal.
So we can teach pack-objects to use bitmap index for initial object
counting phase when generating resultant pack file too:
- if we take care to not let it be activated under git-repack:
See above about repack robustness and not forward-carrying corruption.
- if we know bitmap index generation is not enabled for resultant pack:
The current code has singleton bitmap_git, so it cannot work
simultaneously with two bitmap indices.
We also want to avoid (at least with current implementation)
generating bitmaps off of bitmaps. The reason here is: when generating
a pack, not-yet-packed objects will be emitted into pack in
suboptimal order and added to tail of the bitmap as "extended entries".
When the resultant pack + some new objects in associated repository
are in turn used to generate another pack with bitmap, the situation
repeats: new objects are again not emitted optimally and just added to
bitmap tail - not in recency order.
So the pack badness can grow over time when at each step we have
bitmapped pack + some other objects. That's why we want to avoid
generating bitmaps off of bitmaps, not to let pack badness grow.
- if we keep pack reuse enabled still only for "send-to-stdout" case:
Because pack-to-file needs to generate index for destination pack, and
currently on pack reuse raw entries are directly written out to the
destination pack by write_reused_pack(), bypassing needed for pack index
generation bookkeeping done by regular codepath in write_one() and
friends.
( In the future we might teach pack-reuse code about cases when index
also needs to be generated for resultant pack and remove
pack-reuse-only-for-stdout limitation )
This way for pack-objects -> file we get nice speedup:
erp5.git[1] (~230MB) extracted from ~ 5GB lab.nexedi.com backup
repository managed by git-backup[2] via
time echo 0186ac99 | git pack-objects --revs erp5pack
before: 37.2s
after: 26.2s
And for `git repack -adb` packed git.git
time echo 5c589a73 | git pack-objects --revs gitpack
before: 7.1s
after: 3.6s
i.e. it can be 30% - 50% speedup for pack extraction.
git-backup extracts many packs on repositories restoration. That was my
initial motivation for the patch.
[1] https://lab.nexedi.com/nexedi/erp5
[2] https://lab.nexedi.com/kirr/git-backup
NOTE
Jeff also suggests that pack.useBitmaps was probably a mistake to
introduce originally. This way we are not adding another config point,
but instead just always default to-file pack-objects not to use bitmap
index: Tools which need to generate on-disk packs with using bitmap, can
pass --use-bitmap-index explicitly. And git-repack does never pass
--use-bitmap-index, so this way we can be sure regular on-disk repacking
remains robust.
NOTE2
`git pack-objects --stdout >file.pack` + `git index-pack file.pack` is much slower
than `git pack-objects file.pack`. Extracting erp5.git pack from
lab.nexedi.com backup repository:
$ time echo 0186ac99 | git pack-objects --stdout --revs >erp5pack-stdout.pack
real 0m22.309s
user 0m21.148s
sys 0m0.932s
$ time git index-pack erp5pack-stdout.pack
real 0m50.873s <-- more than 2 times slower than time to generate pack itself!
user 0m49.300s
sys 0m1.360s
So the time for
`pack-object --stdout >file.pack` + `index-pack file.pack` is 72s,
while
`pack-objects file.pack` which does both pack and index is 27s.
And even
`pack-objects --no-use-bitmap-index file.pack` is 37s.
Jeff explains:
The packfile does not carry the sha1 of the objects. A receiving
index-pack has to compute them itself, including inflating and applying
all of the deltas.
that's why for `git-backup restore` we want to teach `git pack-objects
file.pack` to use bitmaps instead of using `git pack-objects --stdout
>file.pack` + `git index-pack file.pack`.
NOTE3
The speedup is now tracked via t/perf/p5310-pack-bitmaps.sh
Test 56dfeb62 this tree
--------------------------------------------------------------------------------
5310.2: repack to disk 8.98(8.05+0.29) 9.05(8.08+0.33) +0.8%
5310.3: simulated clone 2.02(2.27+0.09) 2.01(2.25+0.08) -0.5%
5310.4: simulated fetch 0.81(1.07+0.02) 0.81(1.05+0.04) +0.0%
5310.5: pack to file 7.58(7.04+0.28) 7.60(7.04+0.30) +0.3%
5310.6: pack to file (bitmap) 7.55(7.02+0.28) 3.25(2.82+0.18) -57.0%
5310.8: clone (partial bitmap) 1.83(2.26+0.12) 1.82(2.22+0.14) -0.5%
5310.9: pack to file (partial bitmap) 6.86(6.58+0.30) 2.87(2.74+0.20) -58.2%
More context:
http://marc.info/?t=146792101400001&r=1&w=2http://public-inbox.org/git/20160707190917.20011-1-kirr@nexedi.com/T/#t
Cc: Vicent Marti <tanoku@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 6b8fda2d (pack-objects: use bitmaps when packing objects) there
are two codepaths in pack-objects: with & without using bitmap
reachability index.
However add_object_entry_from_bitmap(), despite its non-bitmapped
counterpart add_object_entry(), in no way does check for whether --local
or --honor-pack-keep or --incremental should be respected. In
non-bitmapped codepath this is handled in want_object_in_pack(), but
bitmapped codepath has simply no such checking at all.
The bitmapped codepath however was allowing to pass in all those options
and with bitmap indices still being used under such conditions -
potentially giving wrong output (e.g. including objects from non-local or
.keep'ed pack).
We can easily fix this by noting the following: when an object comes to
add_object_entry_from_bitmap() it can come for two reasons:
1. entries coming from main pack covered by bitmap index, and
2. object coming from, possibly alternate, loose or other packs.
"2" can be already handled by want_object_in_pack() and to cover
"1" we can teach want_object_in_pack() to expect that *found_pack can be
non-NULL, meaning calling client already found object's pack entry.
In want_object_in_pack() we care to start the checks from already found
pack, if we have one, this way determining the answer right away
in case neither --local nor --honour-pack-keep are active. In
particular, as p5310-pack-bitmaps.sh shows (3 consecutive runs), we do
not do harm to served-with-bitmap clones performance-wise:
Test 56dfeb62 this tree
-----------------------------------------------------------------
5310.2: repack to disk 9.08(8.20+0.25) 9.09(8.14+0.32) +0.1%
5310.3: simulated clone 1.92(2.12+0.08) 1.93(2.12+0.09) +0.5%
5310.4: simulated fetch 0.82(1.07+0.04) 0.82(1.06+0.04) +0.0%
5310.6: partial bitmap 1.96(2.42+0.13) 1.95(2.40+0.15) -0.5%
Test 56dfeb62 this tree
-----------------------------------------------------------------
5310.2: repack to disk 9.11(8.16+0.32) 9.11(8.19+0.28) +0.0%
5310.3: simulated clone 1.93(2.14+0.07) 1.92(2.11+0.10) -0.5%
5310.4: simulated fetch 0.82(1.06+0.04) 0.82(1.04+0.05) +0.0%
5310.6: partial bitmap 1.95(2.38+0.16) 1.94(2.39+0.14) -0.5%
Test 56dfeb62 this tree
-----------------------------------------------------------------
5310.2: repack to disk 9.13(8.17+0.31) 9.07(8.13+0.28) -0.7%
5310.3: simulated clone 1.92(2.13+0.07) 1.91(2.12+0.06) -0.5%
5310.4: simulated fetch 0.82(1.08+0.03) 0.82(1.08+0.03) +0.0%
5310.6: partial bitmap 1.96(2.43+0.14) 1.96(2.42+0.14) +0.0%
with delta timings showing they are all within noise from run to run.
In the general case we do not want to call find_pack_entry_one() more than
once, because it is expensive. This patch splits the loop in
want_object_in_pack() into two parts: finding the object and seeing if it
impacts our choice to include it in the pack. We may call the inexpensive
want_found_object() twice, but we will never call find_pack_entry_one() if we
do not need to.
I appreciate help and discussing this change with Junio C Hamano and
Jeff King.
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, --batch can be combined with --textconv or --filters.
For this to work, the input needs to have the form
<object name><single white space><path>
so that the filters can be chosen appropriately.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are circumstances when it is relatively easy to figure out the
object name for a given path, but not the name of the containing tree.
For example, when looking at a diff generated by Git, the object names
are recorded, but not the revision. As a matter of fact, the revisions
from which the diff was generated may not even exist locally.
In such a case, the user would have to generate a fake revision just to
be able to use --textconv or --filters.
Let's simplify this dramatically, because we do not really need that
revision at all: all we care about is that we know the path. In the
scenario described above, we do know the path, and we just want to
specify it separately from the object name.
Example usage:
git cat-file --textconv --path=main.c 0f1937fd
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The --filters option applies the convert_to_working_tree() filter for
the path when showing the contents of a regular file blob object;
the contents are written out as-is for other types of objects.
This feature comes in handy when a 3rd-party tool wants to work with
the contents of files from past revisions as if they had been checked
out, but without detouring via temporary files.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When cloning an empty repository served by standard git, "git clone" produces
the following reassuring message:
$ git clone git://localhost/tmp/empty
Cloning into 'empty'...
warning: You appear to have cloned an empty repository.
Checking connectivity... done.
Meanwhile when cloning an empty repository served by JGit, the output is more
haphazard:
$ git clone git://localhost/tmp/empty
Cloning into 'empty'...
Checking connectivity... done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.
This is a common command to run immediately after creating a remote repository
as preparation for adding content to populate it and pushing. The warning is
confusing and needlessly worrying.
The cause is that, since v3.1.0.201309270735-rc1~22 (Advertise capabilities
with no refs in upload service., 2013-08-08), JGit's ref advertisement includes
a ref named capabilities^{} to advertise its capabilities on, while git's ref
advertisement is empty in this case. This allows the client to learn about the
server's capabilities and is needed, for example, for fetch-by-sha1 to work
when no refs are advertised.
This also affects "ls-remote". For example, against an empty repository served
by JGit:
$ git ls-remote git://localhost/tmp/empty
0000000000000000000000000000000000000000 capabilities^{}
Git advertises the same capabilities^{} ref in its ref advertisement for push
but since it never did so for fetch, the client didn't need to handle this
case. Handle it.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This enables JGIT to be used as a prereq in invocations of
test_expect_success (and other functions) in other test scripts.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git clone --resurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presense of corresponding
repositories of submodules and borrow objects from there when able.
* sb/submodule-clone-rr:
clone: recursive and reference option triggers submodule alternates
clone: implement optional references
clone: clarify option_reference as required
clone: factor out checking for an alternate path
submodule--helper update-clone: allow multiple references
submodule--helper module-clone: allow multiple references
t7408: merge short tests, factor out testing method
t7408: modernize style
Enhance "git status --porcelain" output by collecting more data on
the state of the index and the working tree files, which may
further be used to teach git-prompt (in contrib/) to make fewer
calls to git.
* jh/status-v2-porcelain:
status: unit tests for --porcelain=v2
test-lib-functions.sh: add lf_to_nul helper
git-status.txt: describe --porcelain=v2 format
status: print branch info with --porcelain=v2 --branch
status: print per-file porcelain v2 status data
status: collect per-file data for --porcelain=v2
status: support --porcelain[=<version>]
status: cleanup API to wt_status_print
status: rename long-format print routines
"git nosuchcommand --help" said "No manual entry for gitnosuchcommand",
which was not intuitive, given that "git nosuchcommand" said "git:
'nosuchcommand' is not a git command".
* rt/help-unknown:
help: make option --help open man pages only for Git commands
help: introduce option --exclude-guides
An incoming "git push" that attempts to push too many bytes can now
be rejected by setting a new configuration variable at the receiving
end.
* cc/receive-pack-limit:
receive-pack: allow a maximum input size to be specified
unpack-objects: add --max-input-size=<size> option
index-pack: add --max-input-size=<size> option
"git format-patch --cover-letter HEAD^" to format a single patch
with a separate cover letter now numbers the output as [PATCH 0/1]
and [PATCH 1/1] by default.
* jk/format-patch-number-singleton-patch-with-cover:
format-patch: show 0/1 and 1/1 for singleton patch with cover letter
The delta-base-cache mechanism has been a key to the performance in
a repository with a tightly packed packfile, but it did not scale
well even with a larger value of core.deltaBaseCacheLimit.
* jk/delta-base-cache:
t/perf: add basic perf tests for delta base cache
delta_base_cache: use hashmap.h
delta_base_cache: drop special treatment of blobs
delta_base_cache: use list.h for LRU
release_delta_base_cache: reuse existing detach function
clear_delta_base_cache_entry: use a more descriptive name
cache_or_unpack_entry: drop keep_cache parameter
A small test clean-up for a topic introduced in v2.9.1 and later.
* sg/reflog-past-root:
t1410: remove superfluous 'git reflog' from the 'walk past root' test
The tempfile (hence its user lockfile) API lets the caller to open
a file descriptor to a temporary file, write into it and then
finalize it by first closing the filehandle and then either
removing or renaming the temporary file. When the process spawns a
subprocess after obtaining the file descriptor, and if the
subprocess has not exited when the attempt to remove or rename is
made, the last step fails on Windows, because the subprocess has
the file descriptor still open. Open tempfile with O_CLOEXEC flag
to avoid this (on Windows, this is mapped to O_NOINHERIT).
* bw/mingw-avoid-inheriting-fd-to-lockfile:
mingw: ensure temporary file handles are not inherited by child processes
t6026-merge-attr: child processes must not inherit index.lock handles
"git difftool" by default ignores the error exit from the backend
commands it spawns, because often they signal that they found
differences by exiting with a non-zero status code just like "diff"
does; the exit status codes 126 and above however are special in
that they are used to signal that the command is not executable,
does not exist, or killed by a signal. "git difftool" has been
taught to notice these exit status codes.
* jk/difftool-command-not-found:
difftool: always honor fatal error exit codes
"git checkout --detach <branch>" used to give the same advice
message as that is issued when "git checkout <tag>" (or anything
that is not a branch name) is given, but asking with "--detach" is
an explicit enough sign that the user knows what is going on. The
advice message has been squelched in this case.
* sb/checkout-explit-detach-no-advice:
checkout: do not mention detach advice for explicit --detach option
When "git merge-recursive" works on history with many criss-cross
merges in "verbose" mode, the names the command assigns to the
virtual merge bases could have overwritten each other by unintended
reuse of the same piece of memory.
* rs/pull-signed-tag:
commit: use FLEX_ARRAY in struct merge_remote_desc
merge-recursive: fix verbose output for multiple base trees
commit: factor out set_merge_remote_desc()
commit: use xstrdup() in get_merge_parent()
The "t/" hierarchy is prone to get an unusual pathname; "make test"
has been taught to make sure they do not contain paths that cannot
be checked out on Windows (and the mechanism can be reusable to
catch pathnames that are not portable to other platforms as need
arises).
* js/test-lint-pathname:
t/Makefile: ensure that paths are valid on platforms we care
"git push --force-with-lease" already had enough logic to allow
ensuring that such a push results in creation of a ref (i.e. the
receiving end did not have another push from sideways that would be
discarded by our force-pushing), but didn't expose this possibility
to the users. It does so now.
* jk/push-force-with-lease-creation:
t5533: make it pass on case-sensitive filesystems
push: allow pushing new branches with --force-with-lease
push: add shorthand for --force-with-lease branch creation
Documentation/git-push: fix placeholder formatting
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.
* jk/reflog-date:
date: clarify --date=raw description
date: add "unix" format
date: document and test "raw-local" mode
doc/pretty-formats: explain shortening of %gd
doc/pretty-formats: describe index/time formats for %gd
doc/rev-list-options: explain "-g" output formats
doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.
* jc/renormalize-merge-kill-safer-crlf:
merge: avoid "safer crlf" during recording of merge results
convert: unify the "auto" handling of CRLF
There are certain house-keeping tasks that need to be performed at
the very beginning of any Git program, and programs that are not
built-in commands had to do them exactly the same way as "git"
potty does. It was easy to make mistakes in one-off standalone
programs (like test helpers). A common "main()" function that
calls cmd_main() of individual program has been introduced to
make it harder to make mistakes.
* jk/common-main:
mingw: declare main()'s argv as const
common-main: call git_setup_gettext()
common-main: call restore_sigpipe_to_default()
common-main: call sanitize_stdfds()
common-main: call git_extract_argv0_path()
add an extra level of indirection to main()
Generally remote-curl would never see a URL that did not
have "proto:" at the beginning, as that is what tells git to
run the "git-remote-proto" helper (and git-remote-http, etc,
are aliases for git-remote-curl).
However, the special syntax "proto::something" will run
git-remote-proto with only "something" as the URL. So a
malformed URL like:
http::/example.com/repo.git
will feed the URL "/example.com/repo.git" to
git-remote-http. The resulting URL has no protocol, but the
code added by 372370f (http: use credential API to handle
proxy authentication, 2016-01-26) does not handle this case
and segfaults.
For the purposes of this code, we don't really care what the
exact protocol; only whether or not it is https. So let's
just assume that a missing protocol is not, and curl will
handle the real error (which is that the URL is nonsense).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we found bad instruction lines in the instruction sheet
of interactive rebase, we give the user advice on how to
fix it. However, we don't tell the user what to do afterwards.
Give the user advice to run 'git rebase --continue' after
the fix.
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pack-objects is given --include-tag, it peels each tag
ref down to a non-tag object, and if that non-tag object is
going to be packed, we include the tag, too. But what
happens if we have a chain of tags (e.g., tag "A" points to
tag "B", which points to commit "C")?
We'll peel down to "C" and realize that we want to include
tag "A", but we do not ever consider tag "B", leading to a
broken pack (assuming "B" was not otherwise selected).
Instead, we have to walk the whole chain, adding any tags we
find to the pack.
Interestingly, it doesn't seem possible to trigger this
problem with "git fetch", but you can with "git clone
--single-branch". The reason is that we generate the correct
pack when the client explicitly asks for "A" (because we do
a real reachability analysis there), and "fetch" is more
willing to do so. There are basically two cases:
1. If "C" is already a ref tip, then the client can deduce
that it needs "A" itself (via find_non_local_tags), and
will ask for it explicitly rather than relying on the
include-tag capability. Everything works.
2. If "C" is not already a ref tip, then we hope for
include-tag to send us the correct tag. But it doesn't;
it generates a broken pack. However, the next step is
to do a follow-up run of find_non_local_tags(),
followed by fetch_refs() to backfill any tags we
learned about.
In the normal case, fetch_refs() calls quickfetch(),
which does a connectivity check and sees we have no
new objects to fetch. We just write the refs.
But for the broken-pack case, the connectivity check
fails, and quickfetch will follow-up with the remote,
asking explicitly for each of the ref tips. This picks
up the missing object in a new pack.
For a regular "git clone", we are similarly OK, because we
explicitly request all of the tag refs, and get a correct
pack. But with "--single-branch", we kick in tag
auto-following via "include-tag", but do _not_ do a
follow-up backfill. We just take whatever the server sent us
via include-tag and write out tag refs for any tag objects
we were sent. So prior to c6807a4 (clone: open a shortcut
for connectivity check, 2013-05-26), we actually claimed the
clone was a success, but the result was silently
corrupted! Since c6807a4, index-pack's connectivity
check catches this case, and we correctly complain.
The included test directly checks that pack-objects does not
generate a broken pack, but also confirms that "clone
--single-branch" does not hit the bug.
Note that tag chains introduce another interesting question:
if we are packing the tag "B" but not the commit "C", should
"A" be included?
Both before and after this patch, we do not include "A",
because the initial peel_ref() check only knows about the
bottom-most level, "C". To realize that "B" is involved at
all, we would have to switch to an incremental peel, in
which we examine each tagged object, asking if it is being
packed (and including the outer tag if so).
But that runs contrary to the optimizations in peel_ref(),
which avoid accessing the objects at all, in favor of using
the value we pull from packed-refs. It's OK to walk the
whole chain once we know we're going to include the tag (we
have to access it anyway, so the effort is proportional to
the pack we're generating). But for the initial selection,
we have to look at every ref. If we're only packing a few
objects, we'd still have to parse every single referenced
tag object just to confirm that it isn't part of a tag
chain.
This could be addressed if packed-refs stored the complete
tag chain for each peeled ref (in most cases, this would be
the same cost as now, as each "chain" is only a single
link). But given the size of that project, it's out of scope
for this fix (and probably nobody cares enough anyway, as
it's such an obscure situation). This commit limits itself
to just avoiding the creation of a broken pack.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generate a series of packfiles test-1-$pack,
test-2-$pack, with different properties and then examine
them. However we always store the packname generated by
pack-objects in the variable packname_1. This probably was
meant to be packname_2 in the second test, but it turns out
that it doesn't matter: once we are done with the first
pack, we can just keep using the same $packname variable.
So let's drop the confusing "_1" parameter. At the same
time, let's give test-1 and test-2 more descriptive names,
which can help keep them straight (note that we _could_
likewise overwrite the packfiles in each test, but by using
separate filenames, we are sure that test 2 does not
accidentally use the packfile from test 1).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test unpacks objects into a separate repository, and
accesses it by setting GIT_DIR in a subshell. We can do the
same thing these days by using "git init <repo>" and "git
-C". In most cases this is shorter, though when there are
multiple commands, we may end up repeating the "-C".
However, this repetition can actually be a good thing. This
patch also fixes a bug introduced by 512477b (tests: use
"env" to run commands with temporary env-var settings,
2014-03-18). That commit essentially converted:
(GIT_DIR=...; export GIT_DIR
cmd1 &&
cmd2)
into:
(GIT_DIR=... cmd1 &&
cmd2)
which obviously loses the GIT_DIR setting for cmd2 (we never
noticed the bug because it simply runs "cmd2" in the parent
repo, which means we were simply failing to test anything
interesting). By using "git -C" rather than a subshell, it
becomes quite obvious where each command is supposed to be
running.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For each test we do a dry-run of unpack-objects, followed by
a real run, followed by confirming that it contained the
objects we expected. The dry-run is telling us nothing, as
any errors it encounters would be found in the real run.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We usually try to avoid doing any significant actions
outside of test blocks. Although "rm -rf" is unlikely to
either fail or to generate output, moving these to the
point of use makes it more clear that they are part of the
overall setup of "clone.git".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Turning on this variable can be useful when debugging http
tests. It can break a few tests in t5541 if not set
to an absolute path but it is not a variable
that the user is likely to have enabled accidentally.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new GIT_TRACE_CURL environment variable instead
of the deprecated GIT_CURL_VERBOSE.
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The process spawned in the hook uses the test's trash directory as CWD.
As long as it is alive, the directory cannot be removed on Windows.
Although the test succeeds, the 'test_done' that follows produces an
error message and leaves the trash directory around. Kill the process
before the test case advances.
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We might wonder why our && chain check does not catch this case:
The && chain check uses a strange exit code with the expectation that
the second or later part of a broken && chain would not exit with this
particular code.
This expectation does not work in this case because __git_ps1, being
the first command in the second part of the broken && chain, records
the current exit code, does its work, and finally returns to the caller
with the recorded exit code. This fools our && chain check.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you delete the symbolic-ref HEAD from a repository, Git no longer
considers the repository valid, and even "git symbolic-ref HEAD
refs/heads/master" would not be able to recover from that state
(although "git init" can, but that is a sure sign that you are
talking about a "broken" repository).
In the spirit similar to afe5d3d5 ("symbolic ref: refuse non-ref
targets in HEAD", 2009-01-29), forbid removal of HEAD to avoid
corrupting a repository.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function is used to set up the environment variable used in a
subprocess we spawn in a submodule directory. The callers set up a
child_process structure, find the working tree path of one submodule
and set .dir field to it, and then use start_command() API to spawn
the subprocess like "status", "fetch", etc.
When this happens, we expect that the ".git" (either a directory or
a gitfile that points at the real location) in the current working
directory of the subprocess MUST be the repository for the submodule.
If this ".git" thing is a corrupt repository, however, because
prepare_submodule_repo_env() unsets GIT_DIR and GIT_WORK_TREE, the
subprocess will see ".git", thinks it is not a repository, and
attempt to find one by going up, likely to end up in finding the
repository of the superproject. In some codepaths, this will cause
a command run with the "--recurse-submodules" option to recurse
forever.
By exporting GIT_DIR=.git, disable the auto-discovery logic in the
subprocess, which would instead stop it and report an error.
The test illustrates existing problems in a few callsites of this
function. Without this fix, "git fetch --recurse-submodules", "git
status" and "git diff" keep recursing forever.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.
Add tests for the new format and option.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on <path>/.git. If this fails it just assumes the
<path>/.git is actually a git directory.
This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the superproject.
Unfortunately this fails if the <path> is not actually checked out any
longer, such as by removing the directory.
Fix this by checking if the directory we found is actually a gitdir. In
the case it is not, attempt to lookup the submodule configuration and
find the name of where it is stored in the .git/modules/ directory of
the superproject.
If we can't locate the submodule configuration, this might occur because
for example a submodule gitlink was added but the corresponding
.gitmodules file was not properly updated. A die() here would not be
pleasant to the users of submodule diff formats, so instead, modify
do_submodule_path() to return an error code:
- git_pathdup_submodule() returns NULL when we fail to find a path.
- strbuf_git_path_submodule() propagates the error code to the caller.
Modify the callers of these functions to check the error code and fail
properly. This ensures we don't attempt to use a bad path that doesn't
match the corresponding submodule.
Because this change fixes add_submodule_odb() to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out"
Add tests to ensure this change works as expected.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=<string>" will print the
additional line-prefix on every line of output.
To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.
This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.
This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.
Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If option --help is passed to a Git command, we try to open
the man page of that command. However, we do it for both commands
and concepts. Make sure it is an actual command.
This makes "git <concept> --help" not working anymore, while
"git help <concept>" still works.
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce option --exclude-guides to the help command. With this option
being passed, "git help" will open man pages only for actual commands.
Since we know it is a command, we can use function help_unknown_command
to give the user advice on typos.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each test run generates a "count" file in t/test-results
that stores the number of successful, failed, etc tests.
If you run "t1234-foo.sh", that file is named as
"t/test-results/t1234-foo-$$.count"
The addition of the PID there is serving no purpose, and
makes analysis of the count files harder.
The presence of the PID dates back to 2d84e9f (Modify
test-lib.sh to output stats to t/test-results/*,
2008-06-08), but no reasoning is given there. Looking at the
current code, we can see that other files we write to
test-results (like *.exit and *.out) do _not_ have the PID
included. So the presence of the PID does not meaningfully
allow one to store the results from multiple runs anyway.
Moreover, anybody wishing to read the *.count files to
aggregate results has to deal with the presence of multiple
files for a given test (and figure out which one is the most
recent based on their timestamps!). The only consumer of
these files is the aggregate.sh script, which arguably gets
this wrong. If a test is run multiple times, its counts will
appear multiple times in the total (I say arguably only
because the desired semantics aren't documented anywhere,
but I have trouble seeing how this behavior could be
useful).
So let's just drop the PID, which fixes aggregate.sh, and
will make new features based around the count files easier
to write.
Note that since the count-file may already exist (when
re-running a test), we also switch the "cat" from appending
to truncating. The use of append here was pointless in the
first place, as we expected to always write to a unique file.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
2016-07-16) git blame also looks at the index to determine if there is a
file that was freshly added to the index.
cache_name_pos returns -pos - 1 in case there is no match is found, or
if the name matches, but the entry has a stage other than 0. As git
blame should work for unmerged files, it uses strcmp to determine
whether the name of the returned position matches, in which case the
file exists, but is merely unmerged, or if the file actually doesn't
exist in the index.
If the repository is empty, or if the file would lexicographically be
sorted as the last file in the repository, -cache_name_pos - 1 is
outside of the length of the active_cache array, causing git blame to
segfault. Guard against that, and die() normally to restore the old
behaviour.
Reported-by: Simon Ruderich <simon@ruderich.org>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tempfile (hence its user lockfile) API lets the caller to open
a file descriptor to a temporary file, write into it and then
finalize it by first closing the filehandle and then either
removing or renaming the temporary file. When the process spawns a
subprocess after obtaining the file descriptor, and if the
subprocess has not exited when the attempt to remove or rename is
made, the last step fails on Windows, because the subprocess has
the file descriptor still open. Open tempfile with O_CLOEXEC flag
to avoid this (on Windows, this is mapped to O_NOINHERIT).
* bw/mingw-avoid-inheriting-fd-to-lockfile:
mingw: ensure temporary file handles are not inherited by child processes
t6026-merge-attr: child processes must not inherit index.lock handles
Receive-pack feeds its input to either index-pack or
unpack-objects, which will happily accept as many bytes as
a sender is willing to provide. Let's allow an arbitrary
cutoff point where we will stop writing bytes to disk.
Cleaning up what has already been written to disk is a
related problem that is not addressed by this patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the default behavior of git-format-patch to generate numbered
sequence of 0/1 and 1/1 when generating both a cover-letter and a single
patch. This standardizes the cover letter to have 0/N which helps
distinguish the cover letter from the patch itself. Since the behavior
is easily changed via configuration as well as the use of -n and -N this
should be acceptable default behavior.
Add tests for the new default behavior.
Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This just shows off the improvements done by the last few
patches, and gives us a baseline for noticing regressions in
the future. Here are the results with linux.git as the perf
"large repo":
Test origin HEAD
-------------------------------------------------------------------
0003.1: log --raw 43.41(40.36+2.69) 33.86(30.96+2.41) -22.0%
0003.2: log -S 313.61(309.74+3.78) 298.75(295.58+3.00) -4.7%
(for a large repo, the "log -S" improvements are greater if
you bump the delta base cache limit, but I think it makes
sense to test the "stock" behavior, since that is what most
people will see).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is locked and child processes inherit the handle to
said lock and the parent process wants to remove the lock before the
child process exits, on Windows there is a problem: it won't work
because files cannot be deleted if a process holds a handle on them.
The symptom:
Rename from 'xxx/.git/index.lock' to 'xxx/.git/index' failed.
Should I try again? (y/n)
Spawning child processes with bInheritHandles==FALSE would not work
because no file handles would be inherited, not even the hStdXxx
handles in STARTUPINFO (stdin/stdout/stderr).
Opening every file with O_NOINHERIT does not work, either, as e.g.
git-upload-pack expects inherited file handles.
This leaves us with the only way out: creating temp files with the
O_NOINHERIT flag. This flag is Windows-specific, however. For our
purposes, it is equivalent to O_CLOEXEC (which does not exist on
Windows), so let's just open temporary files with the O_CLOEXEC flag and
map that flag to O_NOINHERIT on Windows.
As Eric Wong pointed out, we need to be careful to handle the case where
the Linux headers used to compile Git support O_CLOEXEC but the Linux
kernel used to run Git does not: it returns an EINVAL.
This fixes the test that we just introduced to demonstrate the problem.
Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rev-parse --git-path hooks/<hook>" learned to take
core.hooksPath configuration variable (introduced during 2.9 cycle)
into account.
* ab/hooks:
rev-parse: respect core.hooksPath in --git-path
"git difftool" by default ignores the error exit from the backend
commands it spawns, because often they signal that they found
differences by exiting with a non-zero status code just like "diff"
does; the exit status codes 126 and above however are special in
that they are used to signal that the command is not executable,
does not exist, or killed by a signal. "git difftool" has been
taught to notice these exit status codes.
* jk/difftool-command-not-found:
difftool: always honor fatal error exit codes
"git checkout --detach <branch>" used to give the same advice
message as that is issued when "git checkout <tag>" (or anything
that is not a branch name) is given, but asking with "--detach" is
an explicit enough sign that the user knows what is going on. The
advice message has been squelched in this case.
* sb/checkout-explit-detach-no-advice:
checkout: do not mention detach advice for explicit --detach option
The t0027 test for CRLF conversion was timing dependent and flaky.
* tb/t0027-raciness-fix:
convert: Correct NNO tests and missing `LF will be replaced by CRLF`
When "git merge-recursive" works on history with many criss-cross
merges in "verbose" mode, the names the command assigns to the
virtual merge bases could have overwritten each other by unintended
reuse of the same piece of memory.
* rs/pull-signed-tag:
commit: use FLEX_ARRAY in struct merge_remote_desc
merge-recursive: fix verbose output for multiple base trees
commit: factor out set_merge_remote_desc()
commit: use xstrdup() in get_merge_parent()
On Windows, a file cannot be removed unless all file handles to it have
been released. Hence it is particularly important to close handles when
spawning children (which would probably not even know that they hold on
to those handles).
The example chosen for this test is a custom merge driver that indeed
has no idea that it blocks the deletion of index.lock. The full use case
is a daemon that lives on after the merge, with subsequent invocations
handing off to the daemon, thereby avoiding hefty start-up costs. We
simulate this behavior by simply sleeping one second.
Note that the test only fails on Windows, due to the file locking issue.
Since we have no way to say "expect failure with MINGW, success
otherwise", we simply skip this test on Windows for now.
Signed-off-by: Ben Wijen <ben@wijen.net>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When `--recursive` and `--reference` is given, it is reasonable to
expect that the submodules are created with references to the submodules
of the given alternate for the superproject.
An initial attempt to do this was presented to the mailing list, which
used flags that are passed around ("--super-reference") that instructed
the submodule clone to look for a reference in the submodules of the
referenced superproject. This is not well thought out, as any further
`submodule update` should also respect the initial setup.
When a new submodule is added to the superproject and the alternate
of the superproject does not know about that submodule yet, we rather
error out informing the user instead of being unclear if we did or did
not use a submodules alternate.
To solve this problem introduce new options that store the configuration
for what the user wanted originally.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "t/" hierarchy is prone to get an unusual pathname; "make test"
has been taught to make sure they do not contain paths that cannot
be checked out on Windows (and the mechanism can be reusable to
catch pathnames that are not portable to other platforms as need
arises).
* js/test-lint-pathname:
t/Makefile: ensure that paths are valid on platforms we care
A small test clean-up for a topic introduced in v2.9.1 and later.
* sg/reflog-past-root:
t1410: remove superfluous 'git reflog' from the 'walk past root' test
The idea of the --git-path option is not only to avoid having to
prefix paths with the output of --git-dir all the time, but also to
respect overrides for specific common paths inside the .git directory
(e.g. `git rev-parse --git-path objects` will report the value of the
environment variable GIT_OBJECT_DIRECTORY, if set).
When introducing the core.hooksPath setting, we forgot to adjust
git_path() accordingly. This patch fixes that.
While at it, revert the special-casing of core.hooksPath in
run-command.c, as it is now no longer needed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some pathnames that are okay on ext4 and on HFS+ cannot be checked
out on Windows. Tests that want to see operations on such paths on
filesystems that support them must do so behind appropriate test
prerequisites, and must not include them in the source tree (instead
they should create them when they run). Otherwise, the source tree
cannot even be checked out.
Make sure that double-quotes, asterisk, colon, greater/less-than,
question-mark, backslash, tab, vertical-bar, as well as any non-ASCII
characters never appear in the pathnames with a new test-lint-* target
as part of a `make test`. To that end, we call `git ls-files` (ensuring
that the paths are quoted properly), relying on the fact that paths
containing non-ASCII characters are quoted within double-quotes.
In case that the source code does not actually live in a Git
repository (e.g. when extracted from a .zip file), or that the `git`
executable cannot be executed, we simply ignore the error for now; In
that case, our trusty Continuous Integration will be the last line of
defense and catch any problematic file name.
Noticed when a topic wanted to add a pathname with '>' in it. A
check like this will prevent a similar problems from happening in the
future.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
At the moment difftool's "trust exit code" logic always suppresses the
exit status of the diff utility we invoke. This is useful because we
don't want to exit just because diff returned "1" because the files
differ, but it's confusing if the shell returns an error because the
selected diff utility is not found.
POSIX specifies 127 as the exit status for "command not found", 126 for
"command found but is not executable" and values greater than 128 if the
command terminated because it received a signal [1] and at least bash
and dash follow this specification, while diff utilities generally use
"1" for the exit status we want to ignore.
Handle any value of 126 or greater as a special value indicating that
some form of fatal error occurred.
[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_08_02
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a user asked for a detached HEAD specifically with `--detach`,
we do not need to give advice on what a detached HEAD state entails as
we can assume they know what they're getting into as they asked for it.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test added in 71abeb753f (reflog: continue walking the reflog
past root commits, 2016-06-03) contains an unnecessary 'git reflog'
execution, which was part of my debug/tracing instrumentation that I
somehow didn't manage to remove before submitting.
Signed-off-by: SZEDER Gábor <szeder@ira.uka.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a non-reversible CRLF conversion is done in "git add",
a warning is printed on stderr (or Git dies, depending on checksafe)
The function commit_chk_wrnNNO() in t0027 was written to test this,
but did the wrong thing: Instead of looking at the warning
from "git add", it looked at the warning from "git commit".
This is racy because "git commit" may not have to do CRLF conversion
at all if it can use the sha1 value from the index (which depends on
whether "add" and "commit" run in a single second).
Correct t0027 and replace the commit for each and every file with a commit
of all files in one go.
The function commit_chk_wrnNNO() should be renamed in a separate commit.
Now that t0027 does the right thing, it detects a bug in covert.c:
This sequence should generate the warning `LF will be replaced by CRLF`,
but does not:
$ git init
$ git config core.autocrlf false
$ printf "Line\r\n" >file
$ git add file
$ git commit -m "commit with CRLF"
$ git config core.autocrlf true
$ printf "Line\n" >file
$ git add file
"git add" calls crlf_to_git() in convert.c, which calls check_safe_crlf().
When has_cr_in_index(path) is true, crlf_to_git() returns too early and
check_safe_crlf() is not called at all.
Factor out the code which determines if "git checkout" converts LF->CRLF
into will_convert_lf_to_crlf().
Update the logic around check_safe_crlf() and "simulate" the possible
LF->CRLF conversion at "git checkout" with help of will_convert_lf_to_crlf().
Thanks to Jeff King <peff@peff.net> for analyzing t0027.
Reported-By: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the indirect callers of make_virtual_commit() passes the result of
oid_to_hex() as the name, i.e. a pointer to a static buffer. Since the
function uses that string pointer directly in building a struct
merge_remote_desc, multiple entries can end up sharing the same name
inadvertently.
Fix that by calling set_merge_remote_desc(), which creates a copy of the
string, instead of building the struct by hand.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The concerned test greps the error message in git_parse_source() which
contains "bad config line %d in submodule-blob %s".
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use test_i18ngrep function instead of grep for grepping strings.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The concerned test greps the output of exit_with_patch() in
git-rebase--interactive.sh script.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tests consisting of one line each can be consolidated to have fewer tests
to run as well as fewer lines of code.
When having just a few git commands, do not create a new shell but
use the -C flag in Git to execute in the correct directory.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
No functional change intended. This commit only changes formatting
to the style we recently use, e.g. starting the body of a test with a
single quote on the same line as the header, and then having the test
indented in the following lines.
Whenever we change directories, we do that in subshells.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git rebase" tries to compare set of changes on the updated
upstream and our own branch, it computes patch-id for all of these
changes and attempts to find matches. This has been optimized by
lazily computing the full patch-id (which is expensive) to be
compared only for changes that touch the same set of paths.
* kw/patch-ids-optim:
rebase: avoid computing unnecessary patch IDs
patch-ids: add flag to create the diff patch id using header only data
patch-ids: replace the seen indicator with a commit pointer
patch-ids: stop using a hand-rolled hashmap implementation
"git difftool <paths>..." started in a subdirectory failed to
interpret the paths relative to that directory, which has been
fixed.
* jk/difftool-in-subdir:
difftool: use Git::* functions instead of passing around state
difftool: avoid $GIT_DIR and $GIT_WORK_TREE
difftool: fix argument handling in subdirs
The `rebase` family of Git commands avoid applying patches that were
already integrated upstream. They do that by using the revision walking
option that computes the patch IDs of the two sides of the rebase
(local-only patches vs upstream-only ones) and skipping those local
patches whose patch ID matches one of the upstream ones.
In many cases, this causes unnecessary churn, as already the set of
paths touched by a given commit would suffice to determine that an
upstream patch has no local equivalent.
This hurts performance in particular when there are a lot of upstream
patches, and/or large ones.
Therefore, let's introduce the concept of a "diff-header-only" patch ID,
compare those first, and only evaluate the "full" patch ID lazily.
Please note that in contrast to the "full" patch IDs, those
"diff-header-only" patch IDs are prone to collide with one another, as
adjacent commits frequently touch the very same files. Hence we now
have to be careful to allow multiple hash entries with the same hash.
We accomplish that by using the hashmap_add() function that does not even
test for hash collisions. This also allows us to evaluate the full patch ID
lazily, i.e. only when we found commits with matching diff-header-only
patch IDs.
We add a performance test that demonstrates ~1-6% improvement. In
practice this will depend on various factors such as how many upstream
changes and how big those changes are along with whether file system
caches are cold or warm. As Git's test suite has no way of catching
performance regressions, we also add a regression test that verifies
that the full patch ID computation is skipped when the diff-header-only
computation suffices.
Signed-off-by: Kevin Willford <kcwillford@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.
To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, parse_single_patch() should return a negative
integer instead of calling die().
Let's do that by using error() and let's adjust the related test
cases accordingly.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To libify `git apply` functionality we have to signal errors to the
caller instead of die()ing.
To do that in a compatible manner with the rest of the error handling
in builtin/apply.c, let's make find_header() return -128 instead of
calling die().
We could make it return -1, unfortunately find_header() already
returns -1 when no header is found.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add lf_to_nul helper function to test-lib-functions.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.
There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:
1. By this time, it's too late to find another delta for
the object, so the resulting pack is larger than it
otherwise could be.
2. The warning is there because this is something that
_shouldn't_ ever happen. If it does, then either:
a. a pack we are reusing deltas from had its own
cycle
b. we are reusing deltas from multiple packs, and
we found a cycle among them (i.e., A is a delta of
B in one pack, but B is a delta of A in another,
and we choose to use both deltas).
c. there is a bug in the delta-search code
So this code serves as a final check that none of these
things has happened, warns the user, and prevents us
from writing a bogus pack.
Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.
However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.
We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).
This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it. However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.
We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few updates to "git submodule update".
Use of "| wc -l" break with BSD variant of 'wc'.
* sb/submodule-update-dot-branch:
t7406: fix breakage on OSX
submodule update: allow '.' for branch value
submodule--helper: add remote-branch helper
submodule-config: keep configured branch around
submodule--helper: fix usage string for relative-path
submodule update: narrow scope of local variable
submodule update: respect depth in subsequent fetches
t7406: future proof tests with hard coded depth
"git am -3" calls "git merge-recursive" when it needs to fall back
to a three-way merge; this call has been turned into an internal
subroutine call instead of spawning a separate subprocess.
* js/am-3-merge-recursive-direct:
merge-recursive: flush output buffer even when erroring out
merge_trees(): ensure that the callers release output buffer
merge-recursive: offer an option to retain the output in 'obuf'
merge-recursive: write the commit title in one go
merge-recursive: flush output buffer before printing error messages
am -3: use merge_recursive() directly again
merge-recursive: switch to returning errors instead of dying
merge-recursive: handle return values indicating errors
merge-recursive: allow write_tree_from_memory() to error out
merge-recursive: avoid returning a wholesale struct
merge_recursive: abort properly upon errors
prepare the builtins for a libified merge_recursive()
merge-recursive: clarify code in was_tracked()
die(_("BUG")): avoid translating bug messages
die("bug"): report bugs consistently
t5520: verify that `pull --rebase` shows the helpful advice when failing
"git format-patch" learned format.from configuration variable to
specify the default settings for its "--from" option.
* jt/format-patch-from-config:
format-patch: format.from gives the default for --from
"git push --force-with-lease" already had enough logic to allow
ensuring that such a push results in creation of a ref (i.e. the
receiving end did not have another push from sideways that would be
discarded by our force-pushing), but didn't expose this possibility
to the users. It does so now.
* jk/push-force-with-lease-creation:
t5533: make it pass on case-sensitive filesystems
push: allow pushing new branches with --force-with-lease
push: add shorthand for --force-with-lease branch creation
Documentation/git-push: fix placeholder formatting
FreeBSD can lie when asked mtime of a directory, which made the
untracked cache code to fall back to a slow-path, which in turn
caused tests in t7063 to fail because it wanted to verify the
behaviour of the fast-path.
* nd/fbsd-lazy-mtime:
t7063: work around FreeBSD's lazy mtime update feature
Windows port was failing some tests in t4130, due to the lack of
inum in the returned values by its lstat(2) emulation.
* js/t4130-rename-without-ino:
t4130: work around Windows limitation
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.
* jc/grep-commandline-vs-configuration:
grep: further simplify setting the pattern type
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
"git status" learned to suggest "merge --abort" during a conflicted
merge, just like it already suggests "rebase --abort" during a
conflicted rebase.
* mm/status-suggest-merge-abort:
status: suggest 'git merge --abort' when appropriate
On OSX `wc` prefixes the output of numbers with whitespace, such
that the `commit_count` would be "SP <NUMBER>". When using that in
git submodule update --init --depth=$commit_count
the depth would be empty and the number is interpreted as the
pathspec. Fix this by not using `wc` and rather instruct rev-list
to count.
Another way to fix this is to remove the `=` sign after the
`--depth` argument as then we are allowed to have more than just one
whitespace between `--depth` and the actual number. Prefer the
solution of rev-list counting as that is expected to be slightly
faster and more self-contained within Git.
Reported-by: Lars Schneider <larsxschneider@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>,
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The build procedure learned PAGER_ENV knob that lists what default
environment variable settings to export for popular pagers. This
mechanism is used to tweak the default settings to MORE on FreeBSD.
* ew/build-time-pager-tweaks:
pager: move pager-specific setup into the build
FreeBSD can lie when asked mtime of a directory, which made the
untracked cache code to fall back to a slow-path, which in turn
caused tests in t7063 to fail because it wanted to verify the
behaviour of the fast-path.
* nd/fbsd-lazy-mtime:
t7063: work around FreeBSD's lazy mtime update feature
An entry "git log --decorate" for the tip of the current branch is
shown as "HEAD -> name" (where "name" is the name of the branch);
paint the arrow in the same color as "HEAD", not in the color for
commits.
* nd/log-decorate-color-head-arrow:
log: decorate HEAD -> branch with the same color for arrow and HEAD
The t3700 test about "add --chmod=-x" have been made a bit more
robust and generally cleaned up.
* ib/t3700-add-chmod-x-updates:
t3700: add a test_mode_in_index helper function
t3700: merge two tests into one
t3700: remove unwanted leftover files before running new tests
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs. The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.
* jk/pack-objects-optim:
pack-objects: compute local/ignore_pack_keep early
pack-objects: break out of want_object loop early
find_pack_entry: replace last_found_pack with MRU cache
add generic most-recently-used list
sha1_file: drop free_pack_by_name
t/perf: add tests for many-pack scenarios
"git difftool <paths>..." started in a subdirectory failed to
interpret the paths relative to that directory, which has been
fixed.
* jk/difftool-in-subdir:
difftool: use Git::* functions instead of passing around state
difftool: avoid $GIT_DIR and $GIT_WORK_TREE
difftool: fix argument handling in subdirs
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.
* jk/reflog-date:
date: clarify --date=raw description
date: add "unix" format
date: document and test "raw-local" mode
doc/pretty-formats: explain shortening of %gd
doc/pretty-formats: describe index/time formats for %gd
doc/rev-list-options: explain "-g" output formats
doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
Tests for "git svn" have been taught to reuse the lib-httpd test
infrastructure when testing the subversion integration that
interacts with subversion repositories served over the http://
protocol.
* ew/git-svn-http-tests:
git svn: migrate tests to use lib-httpd
t/t91*: do not say how to avoid the tests
Windows port was failing some tests in t4130, due to the lack of
inum in the returned values by its lstat(2) emulation.
* js/t4130-rename-without-ino:
t4130: work around Windows limitation
Code cleanup.
* rs/submodule-config-code-cleanup:
submodule-config: fix test binary crashing when no arguments given
submodule-config: combine early return code into one goto
submodule-config: passing name reference for .gitmodule blobs
submodule-config: use explicit empty string instead of strbuf in config_from()
Build clean-up.
* nd/test-helpers:
t/test-lib.sh: fix running tests with --valgrind
Makefile: use VCSSVN_LIB to refer to svn library
Makefile: drop extra dependencies for test helpers
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
An age old bug that caused "git diff --ignore-space-at-eol"
misbehave has been fixed.
* js/ignore-space-at-eol:
diff: fix a double off-by-one with --ignore-space-at-eol
diff: demonstrate a bug with --patience and --ignore-space-at-eol
"git fetch http://user:pass@host/repo..." scrubbed the userinfo
part, but "git push" didn't.
* jk/push-scrub-url:
t5541: fix url scrubbing test when GPG is not set
push: anonymize URL in status output
"git add -N dir/file && git write-tree" produced an incorrect tree
when there are other paths in the same directory that sorts after
"file".
* nd/cache-tree-ita:
cache-tree: do not generate empty trees as a result of all i-t-a subentries
cache-tree.c: fix i-t-a entry skipping directory updates sometimes
test-lib.sh: introduce and use $EMPTY_BLOB
test-lib.sh: introduce and use $EMPTY_TREE
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit. When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.
* mh/blame-worktree:
t/t8003-blame-corner-cases.sh: Use here documents
blame: allow to blame paths freshly added to the index
Update --porcelain argument to take optional version parameter
to allow multiple porcelain formats to be supported in the future.
The token "v1" is the default value and indicates the traditional
porcelain format. (The token "1" is an alias for that.)
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git -c grep.patternType=extended log --basic-regexp" misbehaved
because the internal API to access the grep machinery was not
designed well.
* jc/grep-commandline-vs-configuration:
grep: further simplify setting the pattern type
Allowing PAGER_ENV to be set at build-time allows us to move
pager-specific knowledge out of our build. This allows us to
set a better default for FreeBSD more(1), which pretends not to
understand ANSI color escapes if the MORE environment variable
is left empty, but accepts the same variables as less(1)
Originally-from:
https://public-inbox.org/git/xmqq61piw4yf.fsf@gitster.dls.corp.google.com/
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The newly-added test case wants to commit a file "c.t" (note the lower
case) when a previous test case already committed a file "C.t". This
confuses Git to the point that it thinks "c.t" was not staged when "git
add c.t" was called.
Simply make the naming of the test commits consistent with the previous
test cases: use upper-case, and advance in the alphabet.
This came up in local work to rebase the Windows-specific patches to the
current `next` branch. An identical fix was suggested by John Keeping.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Let's start with the commit message of [1] from freebsd.git [2]
Sync timestamp changes for inodes of special files to disk as late
as possible (when the inode is reclaimed). Temporarily only do
this if option UFS_LAZYMOD configured and softupdates aren't
enabled. UFS_LAZYMOD is intentionally left out of
/sys/conf/options.
This is mainly to avoid almost useless disk i/o on battery powered
machines. It's silly to write to disk (on the next sync or when
the inode becomes inactive) just because someone hit a key or
something wrote to the screen or /dev/null.
PR: 5577 [3]
The short version of that, in the context of t7063, is that when a
directory is updated, its mtime may be updated later, not
immediately. This can be shown with a simple command sequence
date; sleep 1; touch abc; rm abc; sleep 10; ls -lTd .
One would expect that the date shown in `ls` would be one second from
`date`, but it's 10 seconds later. If we put another `ls -lTd .` in
front of `sleep 10`, then the date of the last `ls` comes as
expected. The first `ls` somehow forces mtime to be updated.
t7063 is really sensitive to directory mtime. When mtime is too "new",
git code suspects racy timestamps and will not trigger the shortcut in
untracked cache, in t7063.24 and eventually be detected in t7063.27
We have two options thanks to this special FreeBSD feature:
1) Stop supporting untracked cache on FreeBSD. Skip t7063 entirely
when running on FreeBSD
2) Work around this problem (using the same 'ls' trick) and continue
to support untracked cache on FreeBSD
I initially wanted to go with 1) because I didn't know the exact
nature of this feature and feared that it would make untracked cache
work unreliably, using the cached version when it should not.
Since the behavior of this thing is clearer now. The picture is not
that bad. If this indeed happens often, untracked cache would assume
racy condition more often and _fall back_ to non-untracked cache code
paths. Which means it may be less effective, but it will not show
wrong things.
This patch goes with option 2.
PS. For those who want to look further in FreeBSD source code, this
flag is now called IN_LAZYMOD. I can see it's effective in ext2 and
ufs. zfs is not affected.
[1] 660e6408e6df99a20dacb070c5e7f9739efdf96d
[2] git://github.com/freebsd/freebsd.git
[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=5577
Reported-by: Eric Wong <e@80x24.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Gerrit has a "superproject subscription" feature[1], that triggers a
commit in a superproject that is subscribed to its submodules.
Conceptually this Gerrit feature can be done on the client side with
Git via (except for raciness, error handling etc):
while [ true ]; do
git -C <superproject> submodule update --remote --force
git -C <superproject> commit -a -m "Update submodules"
git -C <superproject> push
done
for each branch in the superproject. To ease the configuration in Gerrit
a special value of "." has been introduced for the submodule.<name>.branch
to mean the same branch as the superproject[2], such that you can create a
new branch on both superproject and the submodule and this feature
continues to work on that new branch.
Now we find projects in the wild with such a .gitmodules file.
The .gitmodules used in these Gerrit projects do not conform
to Gits understanding of how .gitmodules should look like.
This teaches Git to deal gracefully with this syntax as well.
The redefinition of "." does no harm to existing projects unaware of
this change, as "." is an invalid branch name in Git, so we do not
expect such projects to exist.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is an optimization used in "git diff $treeA $treeB" to borrow
an already checked-out copy in the working tree when it is known to
be the same as the blob being compared, expecting that open/mmap of
such a file is faster than reading it from the object store, which
involves inflating and applying delta. This however kicked in even
when the checked-out copy needs to go through the convert-to-git
conversion (including the clean filter), which defeats the whole
point of the optimization. The optimization has been disabled when
the conversion is necessary.
* jk/diff-do-not-reuse-wtf-needs-cleaning:
diff: do not reuse worktree files that need "clean" conversion
Code cleanup.
* rs/submodule-config-code-cleanup:
submodule-config: fix test binary crashing when no arguments given
submodule-config: combine early return code into one goto
submodule-config: passing name reference for .gitmodule blobs
submodule-config: use explicit empty string instead of strbuf in config_from()
"git status" learned to suggest "merge --abort" during a conflicted
merge, just like it already suggests "rebase --abort" during a
conflicted rebase.
* mm/status-suggest-merge-abort:
status: suggest 'git merge --abort' when appropriate
"git push" learned to accept and pass extra options to the
receiving end so that hooks can read and react to them.
* sb/push-options:
add a test for push options
push: accept push options
receive-pack: implement advertising and receiving push options
push options: {pre,post}-receive hook learns about push options
On Windows, it is already pretty expensive to try to recreate the stat()
data that Git assumes is cheap to obtain. To make things halfway decent
in performance, we even have to skip emulating the inode and to
determine the number of hard links.
This is not a huge problem, usually, as either the size or the mtime or
the ctime are tell-tale enough to say when a file has changed, and even
if not, those changes are typically made after the index file was
written, triggering a rehashing of the files' contents.
The t4130-apply-criss-cross-rename test case, however, requires the
inode to determine that files of equal size were swapped, as renaming
files does not update their mtime. Every once in a while, t4130 fails
on Windows because of this missing piece.
Equal file sizes are not crucial for the test cases, however. Hence,
generate files with different sizes so that there is some property that
the swapped files can be discovered reliably even on Windows.
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When depth is given the user may have a reasonable expectation that
any remote operation is using the given depth. Add a test to demonstrate
we still get the desired sha1 even if the depth is too short to
include the actual commit.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The prior hard coded depth was chosen to be exactly the length from the
recorded gitlink to the tip of the remote, so if you add more commits
to the remote before, this test will not test its intention any more.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The case statement to check the file mode of a staged file appears
a number of times.
Simplify the test by utilizing a test_mode_in_index helper function.
Signed-off-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Depending on the underlying platform a chmod may be a noop. Although it
wouldn't harm the result of the '--chmod=-x' test, there is a more
robust way to make sure the --chmod option works both ways.
Merge the two separate tests for the --chmod option into one, checking
both permissions on the same file.
Signed-off-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When an earlier test that has prerequisite is skipped, files
used by later tests may be left in the working tree in an
unexpected state. For example, a test runs this sequence:
echo foo >xfoo1 && chmod 755 xfoo1
to create an executable file xfoo1, expecting that xfoo1
does not exist before it runs in the test sequence.
However, the absence of this file depends on "git reset
--hard" done in an earlier test, that is skipped when SANITY
prerequisite is not met, and worse yet, xfoo1 originally is
created as a symbolic link, which means the chmod does not
affect the modes of xfoo1 as this test expects.
Fix this by starting the test with "rm -f xfoo1" to make
sure the file is created from scratch, and do the same to
other similar tests.
Signed-off-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace uses of strbuf_addf() for adding strings with more lightweight
strbuf_addstr() calls.
In http-push.c it becomes easier to see what's going on without having
to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain
any format specifiers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps users who would prefer format-patch to default to --from,
and makes it easier to change the default in the future.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the very inception of interactive-rebase in 1b1dce4
(Teach rebase an interactive mode, 2007-06-25), there has
been a preemptive check, before looking at any commits, to
see whether the user has a valid name/email combination.
This is convenient, because it means that we abort the
operation before even beginning (rather than just
complaining that we are unable to pick a particular commit).
However, it does the wrong thing when the rebase does not
actually need to generate any new commits (e.g., a
fast-forward with no commits to pick, or one where the base
stays the same, and we just pick the same commits without
rewriting anything). In this case it may complain about the
lack of ident, even though one would not be needed to
complete the operation.
This may seem like mere nit-picking, but because interactive
rebase underlies the "preserve-merges" rebase, somebody who
has set "pull.rebase" to "preserve" cannot make even a
fast-forward pull without a valid ident, as we bail before
even realizing the fast-forward nature.
This commit drops the extra ident check entirely. This means
we rely on individual commands that generate commit objects
to complain. So we will continue to notice and prevent cases
that actually do create commits, but with one important
difference: we fail while actually executing the "pick"
operations, and leave the rebase in a conflicted, half-done
state.
In some ways this is less convenient, but in some ways it is
more so; the user can then manually commit or even "git
rebase --continue" after setting up their ident (or
providing it as a one-off on the command line).
Reported-by: Dakota Hawkins <dakotahawkins@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's pack storage does efficient (log n) lookups in a
single packfile's index, but if we have multiple packfiles,
we have to linearly search each for a given object. This
patch introduces some timing tests for cases where we have a
large number of packs, so that we can measure any
improvements we make in the following patches.
The main thing we want to time is object lookup. To do this,
we measure "git rev-list --objects --all", which does a
fairly large number of object lookups (essentially one per
object in the repository).
However, we also measure the time to do a full repack, which
is interesting for two reasons. One is that in addition to
the usual pack lookup, it has its own linear iteration over
the list of packs. And two is that because it it is the tool
one uses to go from an inefficient many-pack situation back
to a single pack, we care about its performance not only at
marginal numbers of packs, but at the extreme cases (e.g.,
if you somehow end up with 5,000 packs, it is the only way
to get back to 1 pack, so we need to make sure it performs
well).
We measure the performance of each command in three
scenarios: 1 pack, 50 packs, and 1,000 packs.
The 1-pack case is a baseline; any optimizations we do to
handle multiple packs cannot possibly perform better than
this.
The 50-pack case is as far as Git should generally allow
your repository to go, if you have auto-gc enabled with the
default settings. So this represents the maximum performance
improvement we would expect under normal circumstances.
The 1,000-pack case is hopefully rare, though I have seen it
in the wild where automatic maintenance was broken for some
time (and the repository continued to receive pushes). This
represents cases where we care less about general
performance, but want to make sure that a full repack
command does not take excessively long.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Environment variables are global and hard to reason about.
Use the `--git-dir` and `--work-tree` arguments when invoking `git`
instead of relying on the environment.
Add a test to ensure that difftool's dir-diff feature works when these
variables are present in the environment.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 959b5455 (submodule: implement a config API for lookup of
.gitmodules values, 2015-08-18) implemented the initial version of the
submodule config cache. During development of that initial version we
extracted the function gitmodule_sha1_from_commit(). During that process
we missed that the strbuf rev was still used in config_from() and now is
left empty. Lets fix this by also returning this string.
This means that now when reading .gitmodules from revisions, the error
messages also contain a reference to the blob they are from.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Reviewed-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A test that unconditionally used "mktemp" learned that the command
is not necessarily available everywhere.
* ak/lazy-prereq-mktemp:
t7610: test for mktemp before test execution
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
The test framework learned a new helper test_match_signal to
check an exit code from getting killed by an expected signal.
* jk/test-match-signal:
t/lib-git-daemon: use test_match_signal
test_must_fail: use test_match_signal
t0005: use test_match_signal as appropriate
tests: factor portable signal check out of t0005
"git rebase -i --autostash" did not restore the auto-stashed change
when the operation was aborted.
* ps/rebase-i-auto-unstash-upon-abort:
rebase -i: restore autostash on abort
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files. But we did so by mistake, which has been corrected.
* nd/ita-cleanup:
grep: fix grepping for "intent to add" files
t7810-grep.sh: fix a whitespace inconsistency
t7810-grep.sh: fix duplicated test name
A helper function that takes the contents of a commit object and
finds its subject line did not ignore leading blank lines, as is
commonly done by other codepaths. Make it ignore leading blank
lines to match.
* js/find-commit-subject-ignore-leading-blanks:
reset --hard: skip blank lines when reporting the commit subject
sequencer: use skip_blank_lines() to find the commit subject
commit -C: skip blank lines at the beginning of the message
commit.c: make find_commit_subject() more robust
pretty: make the skip_blank_lines() function public
Add a test to specify the desired behaviour that currently is not
available in "git rebase -Xsubtree=...".
* dg/subtree-rebase-test:
contrib/subtree: Add a test for subtree rebase that loses commits
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.
* nd/pack-ofs-4gb-limit:
fsck: use streaming interface for large blobs in pack
pack-objects: do not truncate result in-pack object size on 32-bit systems
index-pack: correct "offset" type in unpack_entry_data()
index-pack: report correct bad object offsets even if they are large
index-pack: correct "len" type in unpack_data()
sha1_file.c: use type off_t* for object_info->disk_sizep
pack-objects: pass length to check_pack_crc() without truncation
"git worktree prune" protected worktrees that are marked as
"locked" by creating a file in a known location. "git worktree"
command learned a dedicated command pair to create and remove such
a file, so that the users do not have to do this with editor.
* nd/worktree-lock:
worktree.c: find_worktree() search by path suffix
worktree: add "unlock" command
worktree: add "lock" command
worktree.c: add is_worktree_locked()
worktree.c: add is_main_worktree()
worktree.c: add find_worktree()
A few tests that specifically target "git rebase -i" have been
added.
* js/rebase-i-tests:
rebase -i: we allow extra spaces after fixup!/squash!
rebase -i: demonstrate a bug with --autosquash
t3404: add a test for the --gpg-sign option
We already have "--date=raw", which is a Unix epoch
timestamp plus a contextual timezone (either the author's or
the local). But one may not care about the timezone and just
want the epoch timestamp by itself. It's not hard to parse
the two apart, but if you are using a pretty-print format,
you may want git to show the "finished" form that the user
will see.
We can accomodate this by adding a new date format, "unix",
which is basically "raw" without the timezone.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "raw" format shows a Unix epoch timestamp, but with a
timezone tacked on. The timestamp is not _in_ that zone, but
it is extra information about the time (by default, the zone
the author was in).
The documentation claims that "raw-local" does not work. It
does, but the end result is rather subtle. Let's describe it
in better detail, and test to make sure it works (namely,
the epoch time doesn't change, but the zone does).
While we are rewording the documentation in this area, let's
not use the phrase "does not work" for the remaining option,
"--date=relative". It's vague; do we accept it or not? We do
accept it, but it has no effect (which is a reasonable
outcome). We should also refer to the option not as
"--relative" (which is the historical synonym, and does not
take "-local" at all), but as "--date=relative".
Helped-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our usual style in the test scripts is to indent here
documents with tabs, and use "<<-" to strip the tabs. The
result is easier to read.
This old test script did not do so in its inception, and
further tests added onto it followed the local style. Let's
bring it in line with our usual style.
Some of the tests actually care quite a bit about
whitespace, but none of them do so at the beginning of the
line (because they use things like qz_to_tab_space to avoid
depending on the literal whitespace), so we can do a fairly
mechanical conversion.
Most of the here-docs also use interpolation, so they have
been left as "<<-EOF". In a few cases, though, where
interpolation was not in use, I've converted them to
"<<-\EOF" to match our usual "don't interpolate unless you
need to" style.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test currently does something like:
do_one() &&
do_two() &&
test_expect_success ...
We generally avoid performing actions at the top-level of
the script (outside of a test_expect block) for two reasons:
1. The test harness is not checking and reporting if they
fail.
2. Their output is not handled correctly (not hidden by
default, nor shown with "-v").
Using &&-chains seems like it should help with (1), but it
doesn't. If either of the commands fails, we simply skip
running the follow-on test entirely, and the test harness
has no idea.
We can fix this by pushing that setup into its own block.
It _could_ go into the following test block, but since the
result in this case is used by multiple tests, it's more
clear to mark it explicitly as a distinct setup step.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If there is no upstream information for a branch, it is likely that it
is newly created and can safely be pushed under the normal fast-forward
rules. Relax the --force-with-lease check so that we do not reject
these branches immediately but rather attempt to push them as new
branches, using the null SHA-1 as the expected value.
In fact, it is already possible to push new branches using the explicit
--force-with-lease=<branch>:<expect> syntax, so all we do here is make
this behaviour the default if no explicit "expect" value is specified.
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow the empty string to stand in for the null SHA-1 when pushing a new
branch, like we do when deleting branches.
This means that the following command ensures that `new-branch` is
created on the remote (that is, is must not already exist):
git push --force-with-lease=new-branch: origin new-branch
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was noticed by Brendan Forster last October that the builtin `git am`
regressed on that. Our hot fix reverted to spawning the recursive merge
instead of using it as a library function.
As we are about to revert that hot fix, after making the recursive merge a
true library function (i.e. a function that does not die() in case of
"normal" errors), let's add a test that verifies that we do not regress on
the same problem which made the hot fix necessary in the first place.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Skip tests when running under GETTEXT_POISON build and run them with
C_LOCALE_OUTPUT prerequisite.
These tests are irrelevant under GETTEXT_POISON because they test text
output alignment which GETTEXT_POISON turns useless.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame file" allowed the lineage of lines in the uncommitted,
unadded contents of "file" to be inspected, but it refused when
"file" did not appear in the current commit. When "file" was
created by renaming an existing file (but the change has not been
committed), this restriction was unnecessarily tight.
* mh/blame-worktree:
t/t8003-blame-corner-cases.sh: Use here documents
blame: allow to blame paths freshly added to the index
When "git fsck" reports a broken link (e.g. a tree object contains
a blob that does not exist), both containing object and the object
that is referred to were reported with their 40-hex object names.
The command learned the "--name-objects" option to show the path to
the containing object from existing refs (e.g. "HEAD~24^2:file.txt").
* js/fsck-name-object:
fsck: optionally show more helpful info for broken links
fsck: give the error function a chance to see the fsck_options
fsck_walk(): optionally name objects on the go
fsck: refactor how to describe objects
"git add -N dir/file && git write-tree" produced an incorrect tree
when there are other paths in the same directory that sorts after
"file".
* nd/cache-tree-ita:
cache-tree: do not generate empty trees as a result of all i-t-a subentries
cache-tree.c: fix i-t-a entry skipping directory updates sometimes
test-lib.sh: introduce and use $EMPTY_BLOB
test-lib.sh: introduce and use $EMPTY_TREE
"git fetch http://user:pass@host/repo..." scrubbed the userinfo
part, but "git push" didn't.
* jk/push-scrub-url:
t5541: fix url scrubbing test when GPG is not set
push: anonymize URL in status output
Build clean-up.
* nd/test-helpers:
t/test-lib.sh: fix running tests with --valgrind
Makefile: use VCSSVN_LIB to refer to svn library
Makefile: drop extra dependencies for test helpers
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.
* jc/renormalize-merge-kill-safer-crlf:
merge: avoid "safer crlf" during recording of merge results
convert: unify the "auto" handling of CRLF
An age old bug that caused "git diff --ignore-space-at-eol"
misbehave has been fixed.
* js/ignore-space-at-eol:
diff: fix a double off-by-one with --ignore-space-at-eol
diff: demonstrate a bug with --patience and --ignore-space-at-eol
Error handling in the codepaths that updates refs has been
improved.
* mh/update-ref-errors:
lock_ref_for_update(): avoid a symref resolution
lock_ref_for_update(): make error handling more uniform
t1404: add more tests of update-ref error handling
t1404: document function test_update_rejected
t1404: remove "prefix" argument to test_update_rejected
t1404: rename file to t1404-update-ref-errors.sh
Further preparatory work on the refs API before the pluggable
backend series can land.
* mh/split-under-lock: (33 commits)
lock_ref_sha1_basic(): only handle REF_NODEREF mode
commit_ref_update(): remove the flags parameter
lock_ref_for_update(): don't resolve symrefs
lock_ref_for_update(): don't re-read non-symbolic references
refs: resolve symbolic refs first
ref_transaction_update(): check refname_is_safe() at a minimum
unlock_ref(): move definition higher in the file
lock_ref_for_update(): new function
add_update(): initialize the whole ref_update
verify_refname_available(): adjust constness in declaration
refs: don't dereference on rename
refs: allow log-only updates
delete_branches(): use resolve_refdup()
ref_transaction_commit(): correctly report close_ref() failure
ref_transaction_create(): disallow recursive pruning
refs: make error messages more consistent
lock_ref_sha1_basic(): remove unneeded local variable
read_raw_ref(): move docstring to header file
read_raw_ref(): improve docstring
read_raw_ref(): rename symref argument to referent
...
This allows us to use common test infrastructure and parallelize
the tests. For now, GIT_SVN_TEST_HTTPD=true needs to be set to
enable the SVN HTTP tests because we reuse the same test cases
for both file:// and http:// SVN repositories. SVN_HTTPD_PORT
is no longer honored.
Tested under Apache 2.2 and 2.4 on Debian 7.x (wheezy) and
8.x (jessie), respectively.
Cc: Clemens Buchacher <drizzd@aon.at>
Cc: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some of the tests "say" how to stop the svn tests from running, some do
not.
The test suite is directed at people reading t/README where we keep all
information about running the test suite (partly, with options etc.).
Remove said "say" occurences.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When c5c31d33 (grep: move pattern-type bits support to top-level
grep.[ch], 2012-10-03) introduced grep_commit_pattern_type() helper
function, the intention was to allow the users of grep API to having
to fiddle only with .pattern_type_option (which can be set to "fixed",
"basic", "extended", and "pcre"), and then immediately before compiling
the pattern strings for use, call grep_commit_pattern_type() to have
it prepare various bits in the grep_opt structure (like .fixed,
.regflags, etc.).
However, grep_set_pattern_type_option() helper function the grep API
internally uses were left as an external function by mistake. This
function shouldn't have been made callable by the users of the API.
Later when the grep API was used in revision traversal machinery,
the caller then mistakenly started calling the function around
34a4ae55 (log --grep: use the same helper to set -E/-F options as
"git grep", 2012-10-03), instead of setting the .pattern_type_option
field and letting the grep_commit_pattern_type() to take care of the
details.
This caused an unnecessary bug that made a configured
grep.patternType take precedence over the command line options
(e.g. --basic-regexp, --fixed-strings) in "git log" family of
commands.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Checking the version of the installed SVN libraries should not
require a git repository at all. This matches the behavior of
"git --version".
Add a test for "git svn help" for the same behavior while we're
at it, too.
Signed-off-by: Eric Wong <e@80x24.org>
When accessing a blob for a diff, we may try to reuse file
contents in the working tree, under the theory that it is
faster to mmap those file contents than it would be to
extract the content from the object database.
When we have to filter those contents, though, that
assumption does not hold. Even for our internal conversions
like CRLF, we have to allocate and fill a new buffer anyway.
But much worse, for external clean filters we have to exec
an arbitrary script, and we have no idea how expensive it
may be to run.
So let's skip this optimization when conversion into git's
"clean" form is required. This applies whenever the
"want_file" flag is false. When it's true, the caller
actually wants the smudged worktree contents, which the
reused file by definition already has (in fact, this is a
key optimization going the other direction, since reusing
the worktree file there lets us skip smudge filters).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already suggest 'git rebase --abort' during a conflicted rebase.
Similarly, suggest 'git merge --abort' during conflict resolution on
'git merge'.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the GPG prereq is not set, we do not run test 34. That
test changes the directory of the test script as a side
effect (something we usually frown on, but which matches the
style of the rest of this script). When test 35 (the
url-scrubbing test) runs, it expects to be in the directory
from test 34. If it's not, the test fails; we are in a
different sub-repo, our test-commit is built on a different
history, and the push becomes a non-fast-forward.
We can fix this by unconditionally moving to the directory
we expect (again, against our usual style but matching how
the rest of the script operates).
As an additional protection, let's also switch from "make a
new commit and push to master" to just "push to a new
branch". We don't care about the branch name; we just want
_some_ ref update to trigger the status output. Pushing to a
new branch is less likely to run into problems with
force-updates, changing the checked-out branch, etc.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git p4" used a location outside $GIT_DIR/refs/ to place its
temporary branches, which has been moved to refs/git-p4-tmp/.
* ls/p4-tmp-refs:
git-p4: place temporary refs used for branch import under refs/git-p4-tmp
Improve the look of the way "git fetch" reports what happened to
each ref that was fetched.
* nd/fetch-ref-summary:
fetch: reduce duplicate in ref update status lines with placeholder
fetch: align all "remote -> local" output
fetch: change flag code for displaying tag update and deleted ref
fetch: refactor ref update status formatting code
git-fetch.txt: document fetch output
The test framework learned a new helper test_match_signal to
check an exit code from getting killed by an expected signal.
* jk/test-match-signal:
t/lib-git-daemon: use test_match_signal
test_must_fail: use test_match_signal
t0005: use test_match_signal as appropriate
tests: factor portable signal check out of t0005
There are certain house-keeping tasks that need to be performed at
the very beginning of any Git program, and programs that are not
built-in commands had to do them exactly the same way as "git"
potty does. It was easy to make mistakes in one-off standalone
programs (like test helpers). A common "main()" function that
calls cmd_main() of individual program has been introduced to
make it harder to make mistakes.
* jk/common-main:
mingw: declare main()'s argv as const
common-main: call git_setup_gettext()
common-main: call restore_sigpipe_to_default()
common-main: call sanitize_stdfds()
common-main: call git_extract_argv0_path()
add an extra level of indirection to main()
A test that unconditionally used "mktemp" learned that the command
is not necessarily available everywhere.
* ak/lazy-prereq-mktemp:
t7610: test for mktemp before test execution
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
The commands in the "log/diff" family have had an FILE* pointer in the
data structure they pass around for a long time, but some codepaths
used to always write to the standard output. As a preparatory step
to make "git format-patch" available to the internal callers, these
codepaths have been updated to consistently write into that FILE*
instead.
* js/log-to-diffopt-file:
mingw: fix the shortlog --output=<file> test
diff: do not color output when --color=auto and --output=<file> is given
t4211: ensure that log respects --output=<file>
shortlog: respect the --output=<file> setting
format-patch: use stdout directly
format-patch: avoid freopen()
format-patch: explicitly switch off color when writing to files
shortlog: support outputting to streams other than stdout
graph: respect the diffopt.file setting
line-log: respect diffopt's configured output file stream
log-tree: respect diffopt's configured output file stream
log: prepare log/log-tree to reuse the diffopt.close_file attribute
When reporting broken links between commits/trees/blobs, it would be
quite helpful at times if the user would be told how the object is
supposed to be reachable.
With the new --name-objects option, git-fsck will try to do exactly
that: name the objects in a way that shows how they are reachable.
For example, when some reflog got corrupted and a blob is missing that
should not be, the user might want to remove the corresponding reflog
entry. This option helps them find that entry: `git fsck` will now
report something like this:
broken link from tree b5eb6ff... (refs/stash@{<date>}~37:)
to blob ec5cf80...
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Somehow, this test was using:
{
echo A
echo B
} > file
block to feed file contents. This changes those to the form most common
in git test scripts:
cat >file <<-\EOF
A
B
EOF
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When blaming files, changes in the work tree are taken into account
and displayed as being "Not Committed Yet".
However, when blaming a file that is not known to the current HEAD,
git blame fails with `no such path 'foo' in HEAD`, even when the file
was git add'ed.
Allowing such a blame is useful when the new file added to the index
(not yet committed) was created by renaming an existing file. It
also is useful when the new file was created from pieces already in
HEAD, moved or copied from other files and blaming with copy
detection (i.e. "-C").
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a subdirectory contains nothing but i-t-a entries, we generate an
empty tree object and add it to its parent tree. Which is wrong. Such
a subdirectory should not be added.
Note that this has a cascading effect. If subdir 'a/b/c' contains
nothing but i-t-a entries, we ignore it. But then if 'a/b' contains
only (the non-existing) 'a/b/c', then we should ignore 'a/b' while
building 'a' too. And it goes all the way up to top directory.
Noticed-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 3cf773e (cache-tree: fix writing cache-tree when CE_REMOVE is
present - 2012-12-16) skips i-t-a entries when building trees objects
from the index. Unfortunately it may skip too much.
The code in question checks if an entry is an i-t-a one, then no tree
entry will be written. But it does not take into account that
directories can also be written with the same code. Suppose we have
this in the index.
a-file
subdir/file1
subdir/file2
subdir/file3
the-last-file
We write an entry for a-file as normal and move on to subdir/file1,
where we realize the entry name for this level is simply just
"subdir", write down an entry for "subdir" then jump three items ahead
to the-last-file.
That is what happens normally when the first file in subdir is not an
i-t-a entry. If subdir/file1 is an i-t-a, because of the broken
condition in this code, we still think "subdir" is an i-t-a file and
not writing "subdir" down and jump to the-last-file. The result tree
now only has two items: a-file and the-last-file. subdir should be
there too (even though it only records two sub-entries, file2 and
file3).
If the i-t-a entry is subdir/file2 or subdir/file3, this is not a
problem because we jump over them anyway. Which may explain why the
bug is hidden for nearly four years.
Fix it by making sure we only skip i-t-a entries when the entry in
question is actual an index entry, not a directory.
Reported-by: Yuri Kanivetsky <yuri.kanivetsky@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to $EMPTY_TREE this makes it easier to recognize this special
SHA-1 and change hash later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a special SHA1. Let's keep it at one place, easier to replace
later when the hash change comes, easier to recognize.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we are not yet moving everything to size_t but still using ulong
internally when talking about the size of object, platforms with
32-bit long will not be able to produce tar archive with 4GB+ file,
and cannot grok 077777777777UL as a constant. Disable the extended
header feature and do not test it on them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Skip tests that are unrunnable on platforms without 64-bit long
to avoid unnecessary test failures.
* jk/tzoffset-fix:
t0006: skip "far in the future" test when unsigned long is not long enough
Git's source code refers to timestamps as unsigned longs. On 32-bit
platforms, as well as on Windows, unsigned long is not large enough
to capture dates that are "absurdly far in the future".
While we can fix this issue properly by replacing unsigned long with
a larger type, we want to be a bit more conservative and just skip
those tests on the maint track.
Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The functions `mk_repo_pair` as well as `test_refs` are borrowed from
t5543-atomic-push, with additional hooks installed.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we tried to fix in 58461bd (t1308: do not get fooled by symbolic
links to the source tree, 2016-06-02) an obscure case where the user
cd's into Git's source code via a symbolic link, a regression was
introduced that affects all test runs on Windows.
The original patch introducing the test case in question was careful to
use `$(pwd)` instead of `$PWD`.
This was done to account for the fact that Git's test suite uses shell
scripting even on Windows, where the shell's Unix-y paths are
incompatible with the main Git executable's idea of paths: it only
accepts Windows paths.
It is an awkward but necessary thing, then, to use `$(pwd)` (which gives
us a Windows path) when interacting with the Git executable and `$PWD`
(which gives the shell's idea of the current working directory in Unix-y
form) for shell scripts, including the test suite itself.
Obviously this broke the use case of the Git maintainer when changing
the working directory into Git's source code directory via a symlink,
i.e. when `$(pwd)` does not agree with `$PWD`.
However, we must not fix that use case at the expense of regressing
another use case.
Let's special-case Windows here, even if it is ugly, for lack of a more
elegant solution.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 47abd85 (fetch: Strip usernames from url's before
storing them, 2009-04-17) taught fetch to anonymize URLs.
The primary purpose there was to avoid sticking passwords in
merge-commit messages, but as a side effect, we also avoid
printing them to stderr.
The push side does not have the merge-commit problem, but it
probably should avoid printing them to stderr. We can reuse
the same anonymizing function.
Note that for this to come up, the credentials would have to
appear either on the command line or in a git config file,
neither of which is particularly secure. So people _should_
be switching to using credential helpers instead, which
makes this problem go away. But that's no excuse not to
improve the situation for people who for whatever reason end
up using credentials embedded in the URL.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git archive" learned to handle files that are larger than 8GB and
commits far in the future than expressible by the traditional US-TAR
format.
* jk/big-and-future-archive-tar:
archive-tar: drop return value
archive-tar: write extended headers for far-future mtime
archive-tar: write extended headers for file sizes >= 8GB
t5000: test tar files that overflow ustar headers
t9300: factor out portable "head -c" replacement
Git does not know what the contents in the index should be for a
path added with "git add -N" yet, so "git grep --cached" should not
show hits (or show lack of hits, with -L) in such a path, but that
logic does not apply to "git grep", i.e. searching in the working
tree files. But we did so by mistake, which has been corrected.
* nd/ita-cleanup:
grep: fix grepping for "intent to add" files
t7810-grep.sh: fix a whitespace inconsistency
t7810-grep.sh: fix duplicated test name
"git rebase -i --autostash" did not restore the auto-stashed change
when the operation was aborted.
* ps/rebase-i-auto-unstash-upon-abort:
rebase -i: restore autostash on abort
Add a test to specify the desired behaviour that currently is not
available in "git rebase -Xsubtree=...".
* dg/subtree-rebase-test:
contrib/subtree: Add a test for subtree rebase that loses commits
More markings of messages for i18n, with updates to various tests
to pass GETTEXT_POISON tests.
One patch from the original submission dropped due to conflicts
with jk/upload-pack-hook, which is still in flux.
* va/i18n-even-more: (38 commits)
t5541: become resilient to GETTEXT_POISON
i18n: branch: mark comment when editing branch description for translation
i18n: unmark die messages for translation
i18n: submodule: escape shell variables inside eval_gettext
i18n: submodule: join strings marked for translation
i18n: init-db: join message pieces
i18n: remote: allow translations to reorder message
i18n: remote: mark URL fallback text for translation
i18n: standardise messages
i18n: sequencer: add period to error message
i18n: merge: change command option help to lowercase
i18n: merge: mark messages for translation
i18n: notes: mark options for translation
i18n: notes: mark strings for translation
i18n: transport-helper.c: change N_() call to _()
i18n: bisect: mark strings for translation
t5523: use test_i18ngrep for negation
t4153: fix negated test_i18ngrep call
t9003: become resilient to GETTEXT_POISON
tests: unpack-trees: update to use test_i18n* functions
...
For blobs, we want to make sure the on-disk data is not corrupted
(i.e. can be inflated and produce the expected SHA-1). Blob content is
opaque, there's nothing else inside to check for.
For really large blobs, we may want to avoid unpacking the entire blob
in memory, just to check whether it produces the same SHA-1. On 32-bit
systems, we may not have enough virtual address space for such memory
allocation. And even on 64-bit where it's not a problem, allocating a
lot more memory could result in kicking other parts of systems to swap
file, generating lots of I/O and slowing everything down.
For this particular operation, not unpacking the blob and letting
check_sha1_signature, which supports streaming interface, do the job
is sufficient. check_sha1_signature() is not shown in the diff,
unfortunately. But if will be called when "data_valid && !data" is
false.
We will call the callback function "fn" with NULL as "data". The only
callback of this function is fsck_obj_buffer(), which does not touch
"data" at all if it's a blob.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 76c61fb (log: decorate HEAD with branch name under
--decorate=full, too - 2015-05-13) adds "HEAD -> branch" decoration to
show current branch vs detached HEAD. The sign of whether HEAD is
detached or not is "->" (vs ",") because the branch is always colored
by type. Color the arrow the same as HEAD to visually emphasize that
the following branch is HEAD, without paying too much attention to the
actual separators "->" or ","
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When merge_recursive() decides what the correct blob object merge
result for a path should be, it uses update_file_flags() helper
function to write it out to a working tree file and then calls
add_cacheinfo(). The add_cacheinfo() function in turn calls
make_cache_entry() to create a new cache entry to replace the
higher-stage entries for the path that represents the conflict.
The make_cache_entry() function calls refresh_cache_entry() to fill
in the cached stat information. To mark a cache entry as
up-to-date, the data is re-read from the file in the working tree,
and goes through convert_to_git() conversion to be compared with the
blob object name the new cache entry records.
It is important to note that this happens while the higher-stage
entries, which are going to be replaced with the new entry, are
still in the index. Unfortunately, the convert_to_git() conversion
has a misguided "safer crlf" mechanism baked in, and looks at the
existing cache entry for the path to decide how to convert the
contents in the working tree file. If our side (i.e. stage#2)
records a text blob with CRLF in it, even when the system is
configured to record LF in blobs and convert them to CRLF upon
checkout (and back to LF upon checkin), the "safer crlf" mechanism
stops us doing so.
This especially poses a problem during a renormalizing merge, where
the merge result for the path is computed by first "normalizing" the
blobs involved in the merge by using convert_to_working_tree()
followed by convert_to_git() with "safer crlf" disabled. The merge
result that is computed correctly and fed to add_cacheinfo() via
update_file_flags() does _not_ match what refresh_cache_entry() sees
by converting the working tree file via convert_to_git().
We can work this around by not refreshing the new cache entry in
make_cache_entry() called by add_cacheinfo(). After add_cacheinfo()
adds the new entry, we can call refresh_cache_entry() on that,
knowing that addition of this new cache entry would have removed the
stale cache entries that had CRLF in stage #2 that were carried over
before the renormalizing merge started and will not interfere with
the correct recording of the result.
The test update was taken from a series by Torsten Bögershausen
that attempted to fix this with a different approach.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
Adjust t4201 to pass on Windows; a couple of test cases need to be
skipped on Windows which leads to a different shortlog than on Linux.
Let's just fix that by limiting the shortlog's commit range to traverse
only one commit: that guarantees that it does not matter how many test
cases were skipped.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We forgot to adjust this code path after moving the test helpers to
t/helper/.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When comparing two lines, ignoring any whitespace at the end, we first
try to match as many bytes as possible and break out of the loop only
upon mismatch, to let the remainder be handled by the code shared with
the other whitespace-ignoring code paths.
When comparing the bytes, however, we incremented the counters always,
even if the bytes did not match. And because we fall through to the
space-at-eol handling at that point, it is as if that mismatch never
happened.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a single character is added to a line, the combination of these
two options results in an empty diff.
This bug was noticed and reported by Naja Melan.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One among four invocations of readlink(1) in our test suite has
been rewritten so that the test can run on systems without the
command (others are in valgrind test framework and t9802).
* ak/t7800-wo-readlink:
t7800: readlink may not be available
The internal code used to show local timezone offset is not
prepared to handle timestamps beyond year 2100, and gave a
bogus offset value to the caller. Use a more benign looking
+0000 instead and let "git log" going in such a case, instead
of aborting.
* jk/tzoffset-fix:
local_tzoffset: detect errors from tm_to_time_t
t0006: test various date formats
t0006: rename test-date's "show" to "relative"
Fix an unintended regression in v2.9 that breaks "clone --depth"
that recurses down to submodules by forcing the submodules to also
be cloned shallowly, which many server instances that host upstream
of the submodules are not prepared for.
* sb/clone-shallow-passthru:
clone: do not let --depth imply --shallow-submodules
"log --graph --format=" learned that "%>|(N)" specifies the width
relative to the terminal's left edge, not relative to the area to
draw text that is to the right of the ancestry-graph section. It
also now accepts negative N that means the column limit is relative
to the right border.
* nd/graph-width-padded:
pretty.c: support <direction>|(<negative number>) forms
pretty: pass graph width to pretty formatting for use in '%>|(N)'
"git log" learns log.showSignature configuration variable, and a
command line option "--no-show-signature" to countermand it.
* mj/log-show-signature-conf:
log: add log.showSignature configuration variable
log: add "--no-show-signature" command line option
t4202: refactor test
A helper function that takes the contents of a commit object and
finds its subject line did not ignore leading blank lines, as is
commonly done by other codepaths. Make it ignore leading blank
lines to match.
* js/find-commit-subject-ignore-leading-blanks:
reset --hard: skip blank lines when reporting the commit subject
sequencer: use skip_blank_lines() to find the commit subject
commit -C: skip blank lines at the beginning of the message
commit.c: make find_commit_subject() more robust
pretty: make the skip_blank_lines() function public
Allow t/perf framework to use the features from the most recent
version of Git even when testing an older installed version.
* jk/perf-any-version:
p4211: explicitly disable renames in no-rename test
t/perf: fix regression in testing older versions of git
The output coloring scheme learned two new attributes, italic and
strike, in addition to existing bold, reverse, etc.
* jk/ansi-color:
color: support strike-through attribute
color: support "italic" attribute
color: allow "no-" for negating attributes
color: refactor parse_attr
add skip_prefix_mem helper
doc: refactor description of color format
color: fix max-size comment
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git-P4 used to place temporary refs under "git-p4-tmp". Since 3da1f37
Git checks that all refs are placed under "refs". Instruct Git-P4 to
place temporary refs under "refs/git-p4-tmp". There are no backwards
compatibility considerations as these refs are transient.
Use "git show-ref --verify" to check the (non-)existience of the refs
instead of file checks assuming the file-based ref backend.
All refs under "refs" are shared across all worktrees. This is not
desired for temporary Git-P4 refs and will be adressed in a later patch.
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
Reviewed-by: Vitor Antunes <vitor.hda@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new test case ensures that we handle commit messages that start
with fixup! or squash! followed by more than one space. While we do
not generate such messages when committing with --fixup/--squash, it
is perfectly legal for users to hand-craft their own fixup messages,
and we heed Postel's law by being lenient.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When rearranging the edit script, we happily mistake the comment
character for a command, and the command for a SHA-1. As a consequence,
when we move fixup! and squash! commits, our logic to skip lines with
already handled SHA-1s mistakenly skips anything but the first
commented-out pick line, too.
The upcoming rebase--helper patches will address this bug, therefore we
do not need to make the current autosquash code even more complex than
it already is, just to fix this bug.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For the upcoming rebase--helper work (which will accelerate the
interactive rebase noticably), it is important to verify that the
--gpg-sign option is handled properly.
Please note that this patch does this on the cheap, by verifying that
the expected option is printed in the message of the 'edit' operation.
We really should test that the interactive rebase signs the commits
properly, iff GPG is available. This test is left for later.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One among four invocations of readlink(1) in our test suite has
been rewritten so that the test can run on systems without the
command (others are in valgrind test framework and t9802).
* ak/t7800-wo-readlink:
t7800: readlink may not be available
The internal code used to show local timezone offset is not
prepared to handle timestamps beyond year 2100, and gave a
bogus offset value to the caller. Use a more benign looking
+0000 instead and let "git log" going in such a case, instead
of aborting.
* jk/tzoffset-fix:
local_tzoffset: detect errors from tm_to_time_t
t0006: test various date formats
t0006: rename test-date's "show" to "relative"
Fix an unintended regression in v2.9 that breaks "clone --depth"
that recurses down to submodules by forcing the submodules to also
be cloned shallowly, which many server instances that host upstream
of the submodules are not prepared for.
* sb/clone-shallow-passthru:
clone: do not let --depth imply --shallow-submodules
A new run-command API function pipe_command() is introduced to
sanely feed data to the standard input while capturing data from
the standard output and the standard error of an external process,
which is cumbersome to hand-roll correctly without deadlocking.
The codepath to sign data in a prepared buffer with GPG has been
updated to use this API to read from the status-fd to check for
errors (instead of relying on GPG's exit status).
* jk/gpg-interface-cleanup:
gpg-interface: check gpg signature creation status
sign_buffer: use pipe_command
verify_signed_buffer: use pipe_command
run-command: add pipe_command helper
verify_signed_buffer: use tempfile object
verify_signed_buffer: drop pbuf variable
gpg-interface: use child_process.args
"log --graph --format=" learned that "%>|(N)" specifies the width
relative to the terminal's left edge, not relative to the area to
draw text that is to the right of the ancestry-graph section. It
also now accepts negative N that means the column limit is relative
to the right border.
* nd/graph-width-padded:
pretty.c: support <direction>|(<negative number>) forms
pretty: pass graph width to pretty formatting for use in '%>|(N)'
"git repack" learned the "--keep-unreachable" option, which sends
loose unreachable objects to a pack instead of leaving them loose.
This helps heuristics based on the number of loose objects
(e.g. "gc --auto").
* jk/repack-keep-unreachable:
repack: extend --keep-unreachable to loose objects
repack: add --keep-unreachable option
repack: document --unpack-unreachable option
Teach format-patch and mailsplit (hence "am") how a line that
happens to begin with "From " in the e-mail message is quoted with
">", so that these lines can be restored to their original shape.
* ew/mboxrd-format-am:
am: support --patch-format=mboxrd
mailsplit: support unescaping mboxrd messages
pretty: support "mboxrd" output format
"upload-pack" allows a custom "git pack-objects" replacement when
responding to "fetch/clone" via the uploadpack.packObjectsHook.
* jk/upload-pack-hook:
upload-pack: provide a hook for running pack-objects
t1308: do not get fooled by symbolic links to the source tree
config: add a notion of "scope"
config: return configset value for current_config_ functions
config: set up config_source for command-line config
git_config_parse_parameter: refactor cleanup code
git_config_with_options: drop "found" counting
HTTPd tests learned to show the server error log to help diagnosing
a failing tests.
* nd/test-lib-httpd-show-error-log-in-verbose:
lib-httpd.sh: print error.log on error
Instead of taking advantage of a struct string_list that is
allocated with all NULs happens to be STRING_LIST_INIT_NODUP kind,
initialize them explicitly as such, to document their behaviour
better.
* jk/string-list-static-init:
use string_list initializer consistently
blame,shortlog: don't make local option variables static
interpret-trailers: don't duplicate option strings
parse_opt_string_list: stop allocating new strings
"git update-index --add --chmod=+x file" may be usable as an escape
hatch, but not a friendly thing to force for people who do need to
use it regularly. "git add --chmod=+x file" can be used instead.
* et/add-chmod-x:
add: add --chmod=+x / --chmod=-x options
"git reflog" stopped upon seeing an entry that denotes a branch
creation event (aka "unborn"), which made it appear as if the
reflog was truncated.
* sg/reflog-past-root:
reflog: continue walking the reflog past root commits
mktemp is not available on all platforms, so the test
'temporary filenames are used with mergetool.writeToTemp'
fails there.
This patch does not replace mktemp but just disables
the test that otherwise would fail.
mergetool checks itself before executing mktemp and
reports an error.
Signed-off-by: Armin Kunaschik <megabreit@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before this change,
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
would have the same effect as
$ echo "* text" >.gitattributes
$ git config core.eol crlf
Since the 'eol' attribute had higher priority than 'text=auto', this may
corrupt binary files and is not what most users expect to happen.
Make the 'eol' attribute to obey 'text=auto' and now
$ echo "* text=auto" >.gitattributes
$ echo "* eol=crlf" >>.gitattributes
behaves the same as
$ echo "* text=auto" >.gitattributes
$ git config core.eol crlf
In other words,
$ echo "* text=auto eol=crlf" >.gitattributes
has the same effect as
$ git config core.autocrlf true
and
$ echo "* text=auto eol=lf" >.gitattributes
has the same effect as
$ git config core.autocrlf input
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the "remote -> local" line, if either ref is a substring of the
other, the common part in the other string is replaced with "*". For
example
abc -> origin/abc
refs/pull/123/head -> pull/123
become
abc -> origin/*
refs/*/head -> pull/123
Activated with fetch.output=compact.
For the record, this output is not perfect. A single giant ref can
push all refs very far to the right and likely be wrapped around. We
may have a few options:
- exclude these long lines smarter
- break the line after "->", exclude it from column width calculation
- implement a new format, { -> origin/}foo, which makes the problem
go away at the cost of a bit harder to read
- reverse all the arrows so we have "* <- looong-ref", again still
hard to read.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do align "remote -> local" output by allocating 10 columns to
"remote". That produces aligned output only for short refs. An extra
pass is performed to find the longest remote ref name (that does not
produce a line longer than terminal width) to produce better aligned
output.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git-daemon exits, we expect it to be with the SIGTERM
we just sent it. If we see anything else, we'll complain.
But our check against exit code "143" is not portable. For
example:
$ ksh93 t5570-git-daemon.sh
[...]
error: git daemon exited with status: 271
We can fix this by using test_match_signal.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 8bf4bec (add "ok=sigpipe" to test_must_fail and use it to
fix flaky tests, 2015-11-27), test_must_fail learned to
recognize "141" as a sigpipe failure. However, testing for
a signal is more complicated than that; we should use
test_match_signal to implement more portable checking.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first test already uses this more portable construct
(that was where it was factored from initially), but the
later tests do a raw comparison against 141 to look for
SIGPIPE, which can fail on some shells and platforms.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In POSIX shells, a program which exits due to a signal
generally has an exit code of 128 plus the signal number.
However, ksh uses 256 plus the signal number. We've
accounted for that in t0005, but not in other tests. Let's
pull out the logic so we can use it elsewhere.
It would be nice for debugging if this additionally printed
errors to stderr, like our other test_* helpers. But we're
going to need to use it in other places besides the innards
of a test_expect block. So let's leave it as generic as
possible.
Note that we also leave the magic "3" for Windows out of the
generic helper. This is an artifact of the way we use
raise() to kill ourselves in test-sigchain.c, and will not
necessarily apply to all programs. So it's better to keep it
out of the helper, to reduce the chance of confusing it with
a real call to exit(3).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 4d5520053 (grep: make it clear i-t-a entries are
ignored, 2015-12-27) and adds an alternative fix to maintain the -L
--cached behavior.
4d5520053 caused 'git grep' to no longer find matches in new files in
the working tree where the corresponding index entry had the "intent to
add" bit set, despite the fact that these files are tracked.
The content in the index of a file for which the "intent to add" bit is
set is considered indeterminate and not empty. For most grep queries we
want these to behave the same, however for -L --cached (files without a
match) we don't want to respond positively for "intent to add" files as
their contents are indeterminate. This is in contrast to files with
empty contents in the index (no lines implies no matches for any grep
query expression) which should be reported in the output of a grep -L
--cached invocation.
Add tests to cover this case and a few related cases which previously
lacked coverage.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Charles Bailey <cbailey32@bloomberg.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use test_i18n* functions for testing text already marked for
translation.
Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the "grep -F -i" case, we can't use kws on icase search
outside ascii range, so we quote the string and pass it to regcomp as
a basic regexp and let regex engine deal with case sensitivity.
The new test is put in t7812 instead of t4209-log-pickaxe because
lib-gettext.sh might cause problems elsewhere, probably.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous change in this function, we add locale support for
single-byte encodings only. It looks like pcre only supports utf-* as
multibyte encodings, the others are left in the cold (which is
fine).
We need to enable PCRE_UTF8 so pcre can find character boundary
correctly. It's needed for case folding (when --ignore-case is used)
or '*', '+' or similar syntax is used.
The "has_non_ascii()" check is to be on the conservative side. If
there's non-ascii in the pattern, the searched content could still be
in utf-8, but we can treat it just like a byte stream and everything
should work. If we force utf-8 based on locale only and pcre validates
utf-8 and the file content is in non-utf8 encoding, things break.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: Plamen Totev <plamen.totev@abv.bg>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default tables are usually built with C locale and only suitable
for LANG=C or similar. This should make case insensitive search work
correctly for all single-byte charsets.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous commit, we can't use kws on icase search
outside ascii range. But we can't simply pass the pattern to
regcomp/pcre like the previous commit because it may contain regex
special characters, so we need to quote the regex first.
To avoid misquote traps that could lead to undefined behavior, we
always stick to basic regex engine in this case. We don't need fancy
features for grepping a literal string anyway.
basic_regex_quote_buf() assumes that if the pattern is in a multibyte
encoding, ascii chars must be unambiguously encoded as single
bytes. This is true at least for UTF-8. For others, let's wait until
people yell up. Chances are nobody uses multibyte, non utf-8 charsets
anymore.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits. To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.
This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't deemed
worth dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to fix
this.
Given that this is just around the corner (geologically
speaking, anyway), and because it's easy to fix, let's just
make it work. Unlike the previous fix for "size", where we
had to write an individual extended header for each file, we
can write one global header (since we have only one mtime
for the whole archive).
There's a slight bit of trickiness there. We may already be
writing a global header with a "comment" field for the
commit sha1. So we need to write our new field into the same
header. To do this, we push the decision of whether to write
such a header down into write_global_extended_header(),
which will now assemble the header as it sees fit, and will
return early if we have nothing to write (in practice, we'll
only have a large mtime if it comes from a commit, but this
makes it also work if you set your system clock ahead such
that time() returns a huge value).
Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.
After writing the extended header, we munge the timestamp in
the ustar headers to the maximum-allowable size. This is
wrong, but it's the least-wrong thing we can provide to a
tar implementation that doesn't understand pax headers (it's
also what GNU tar does).
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.
These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).
This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.
Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.
If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.
But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.
This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:
1. You have a file between 8GB and 64GB.
2. Your tar implementation _doesn't_ know about pax
extended headers.
3. Your tar implementation _does_ parse 12-byte sizes from
the ustar header without a delimiter.
It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. For values larger than this, we have to add pax
extended headers to specify the real data, and git does not
yet know how to do so.
Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.
We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:
1. being able to read pax headers
2. being able to handle huge sizes or mtimes
3. supporting a "t" format we can parse
So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:
dd if=/dev/zero seek=64G bs=1 count=1 of=huge
touch -d @68719476737 huge
tar cf - --format=pax |
head -c 2048
using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.
Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then took only
the first 2k of it. That should be enough to generate 2MB of
data before hitting an inflate error, which is plenty for us
to generate the tar header (and then die of SIGPIPE while
streaming the rest out).
The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.
Helped-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is sometimes useful to be able to read exactly N bytes from a
pipe. Doing this portably turns out to be surprisingly difficult
in shell scripts.
We want a solution that:
- is portable
- never reads more than N bytes due to buffering (which
would mean those bytes are not available to the next
program to read from the same pipe)
- handles partial reads by looping until N bytes are read
(or we see EOF)
- is resilient to stray signals giving us EINTR while
trying to read (even though we don't send them, things
like SIGWINCH could cause apparently-random failures)
Some possible solutions are:
- "head -c" is not portable, and implementations may
buffer (though GNU head does not)
- "read -N" is a bash-ism, and thus not portable
- "dd bs=$n count=1" does not handle partial reads. GNU dd
has iflags=fullblock, but that is not portable
- "dd bs=1 count=$n" fixes the partial read problem (all
reads are 1-byte, so there can be no partial response).
It does make a lot of write() calls, but for our tests
that's unlikely to matter. It's fairly portable. We
already use it in our tests, and it's unlikely that
implementations would screw up any of our criteria. The
most unknown one would be signal handling.
- perl can do a sysread() loop pretty easily. On my Linux
system, at least, it seems to restart the read() call
automatically. If that turns out not to be portable,
though, it would be easy for us to handle it.
That makes the perl solution the least bad (because we
conveniently omitted "length of code" as a criterion).
It's also what t9300 is currently using, so we can just pull
the implementation from there.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we abort an interactive rebase we do so by calling
`die_abort`, which cleans up after us by removing the rebase
state directory. If the user has requested to use the autostash
feature, though, the state directory may also contain a reference
to the autostash, which will now be deleted.
Fix the issue by trying to re-apply the autostash in `die_abort`.
This will also handle the case where the autostash does not apply
cleanly anymore by recording it in a user-visible stash.
Reported-by: Daniel Hahler <git@thequod.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
hashcpy with null_sha1 as the source is equivalent to hashclr. In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better. Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.
This transformation was implemented using the following semantic patch:
@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test merges an external tree in as a subtree, makes some commits
on top of it and splits it back out. In the process the added commits
are lost or the rebase aborts with an internal error. The tests are
marked to expect failure so that we don't forget to fix it.
Signed-off-by: David A. Greene <greened@obbligato.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-index --add --chmod=+x file" may be usable as an escape
hatch, but not a friendly thing to force for people who do need to
use it regularly. "git add --chmod=+x file" can be used instead.
* et/add-chmod-x:
add: add --chmod=+x / --chmod=-x options
"git reflog" stopped upon seeing an entry that denotes a branch
creation event (aka "unborn"), which made it appear as if the
reflog was truncated.
* sg/reflog-past-root:
reflog: continue walking the reflog past root commits
"git show -W" (extend hunks to cover the entire function, delimited
by lines that match the "funcname" pattern) used to show the entire
file when a change added an entire function at the end of the file,
which has been fixed.
* rs/xdiff-hunk-with-func-line:
xdiff: fix merging of appended hunk with -W
grep: -W: don't extend context to trailing empty lines
t7810: add test for grep -W and trailing empty context lines
xdiff: don't trim common tail with -W
xdiff: -W: don't include common trailing empty lines in context
xdiff: ignore empty lines before added functions with -W
xdiff: handle appended chunks better with -W
xdiff: factor out match_func_rec()
t4051: rewrite, add more tests
"git rev-list --count" whose walk-length is limited with "-n"
option did not work well with the counting optimized to look at the
bitmap index.
* jk/rev-list-count-with-bitmap:
rev-list: disable bitmaps when "-n" is used with listing objects
rev-list: "adjust" results of "--count --use-bitmap-index -n"
The commands in `git log` family take %C(auto) in a custom format
string. This unconditionally turned the color on, ignoring
--no-color or with --color=auto when the output is not connected to
a tty; this was corrected to make the format truly behave as
"auto".
* et/pretty-format-c-auto:
format_commit_message: honor `color=auto` for `%C(auto)`
When we detect the pattern is just a literal string, we avoid heavy
regex engine and use fast substring search implemented in kwsset.c.
But kws uses git-ctype which is locale-independent so it does not know
how to fold case properly outside ascii range. Let regcomp or pcre
take care of this case instead. Slower, but accurate.
Noticed-by: Plamen Totev <plamen.totev@abv.bg>
Helped-by: René Scharfe <l.s.r@web.de>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in preparation to turn test-regex into some generic regex
testing command.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test script t4202-log.sh is already pretty long, and it is a good
idea to test --output with a more obscure option, anyway. So let's
test it in conjunction with line-log.
The most important part of this test, of course, is to ensure that the
file is not closed after writing the diff, but only at the very end
of the log output. That is the entire reason why the test tries to
generate a log that covers more than one commit.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Thanks to the diff option parsing, we already know about this option.
We just have to make use of it.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users may want to always use "--show-signature" while using git-log and
related commands.
When log.showSignature is set to true, git-log and related commands will
behave as if "--show-signature" was given to them.
Note that this config variable is meant to affect git-log, git-show,
git-whatchanged and git-reflog. Other commands like git-format-patch,
git-rev-list are not to be affected by this config variable.
Signed-off-by: Mehul Jain <mehul.jain2029@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>