When we want to get the list of modified files, we first
expand any user-provided pathspecs with "ls-files", and then
feed the resulting list of paths as arguments to
"diff-index" and "diff-files". If your pathspec expands into
a large number of paths, you may run into one of two
problems:
1. The OS may complain about the size of the argument
list, and refuse to run. For example:
$ (ulimit -s 128 && git add -p drivers)
Can't exec "git": Argument list too long at .../git-add--interactive line 177.
Died at .../git-add--interactive line 177.
That's on the linux.git repository, which has about 20K
files in the "drivers" directory (none of them modified
in this case). The "ulimit -s" trick is necessary to
show the problem on Linux even for such a gigantic set
of paths. Other operating systems have much smaller
limits (e.g., a real-world case was seen with only 5K
files on OS X).
2. Even when it does work, it's really slow. The pathspec
code is not optimized for huge numbers of paths. Here's
the same case without the ulimit:
$ time git add -p drivers
No changes.
real 0m16.559s
user 0m53.140s
sys 0m0.220s
We can improve this by skipping "ls-files" completely, and
just feeding the original pathspecs to the diff commands.
This solution was discussed in 2010:
http://public-inbox.org/git/20100105041438.GB12574@coredump.intra.peff.net/
but at the time the diff code's pathspecs were more
primitive than those used by ls-files (e.g., they did not
support globs). Making the change would have caused a
user-visible regression, so we didn't.
Since then, the pathspec code has been unified, and the diff
commands natively understand pathspecs like '*.c'.
This patch implements that solution. That skips the
argument-list limits, and the result runs much faster:
$ time git add -p drivers
No changes.
real 0m0.149s
user 0m0.116s
sys 0m0.080s
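To illustrate why the argument list was the bottleneck, here is a rough Python sketch that compares the size of the two argument vectors. The 20,000 generated paths and the simplified "diff-files" command lines are assumptions for illustration, not the exact invocations made by the script:

```python
def argv_bytes(args):
    """Approximate size of an argument list: each argument plus its NUL."""
    return sum(len(a.encode()) + 1 for a in args)


# Hypothetical tree: 20,000 files under drivers/, none of them modified.
paths = [f"drivers/dir{i // 100}/file{i}.c" for i in range(20000)]

# Old behavior (simplified): one argument per file found by "ls-files".
expanded = ["git", "diff-files", "--"] + paths

# New behavior (simplified): pass the user's pathspec through untouched.
direct = ["git", "diff-files", "--", "drivers"]

print(argv_bytes(expanded), argv_bytes(direct))
```

The expanded vector grows with every file under the pathspec, while the direct one depends only on what the user typed.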
There are two new tests. The first just exercises the
globbing behavior to confirm that we are not causing a
regression there. The second checks the actual argument
behavior using GIT_TRACE. We _could_ do it with the "ulimit
-s" trick, as above. But that would mean the test could only
run where "ulimit -s" works. And tests of that sort are
expensive, because we have to come up with enough files to
actually bust the limit (we can't just shrink the "128" down
infinitely, since it is also the in-program stack size).
Finally, two caveats and possibilities for future work:
a. This fixes one argument-list expansion, but there may
be others. In fact, it's very likely that if you run
"git add -i" and select a large number of modified
files, the script would try to feed them all to a
single git command.
In practice this is probably fine. The real issue here
is that the argument list was growing with the _total_
number of files, not the number of modified or selected
files.
b. If the repository contains filenames with literal wildcard
characters (e.g., "foo*"), the original code expanded
them via "ls-files" and then fed those wildcard names
to "diff-index", which would have treated them as
wildcards. This was a bug, which is now fixed (though
unless you really go through some contortions with
":(literal)", it's likely that your original pathspec
would match whatever the accidentally-expanded wildcard
would anyway).
So this takes us one step closer to working correctly
with files whose names contain wildcard characters, but
it's likely that others remain (e.g., if "git add -i"
feeds the selected paths to "git add").
Reported-by: Wincent Colaiuta <win@wincent.com>
Reported-by: Mislav Marohnić <mislav.marohnic@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git status provides a porcelain mode for porcelain writers with a
supposedly stable (plumbing) interface.
7a76c28ff2 ("status: disable translation when --porcelain is used", 2014-03-20)
made sure that ahead/behind info is not translated (i.e. is stable).
Make sure that the remaining two strings (initial commit, detached head)
are stable, too.
These changes are for the v1 porcelain interface. While we do have a perfectly
stable v2 porcelain interface now, some tools (such as
powerline-gitstatus) are written against v1 and profit from fixing v1
without any changes on their side.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In our own .gitattributes file we have attributes such as:
*.[ch] whitespace=indent,trail,space
When querying for attributes we want to be able to ask for the exact
value, i.e.
git ls-files :(attr:whitespace=indent,trail,space)
should work, but the commas are used in the attr magic to introduce
the next attr, such that this query currently fails with
fatal: Invalid pathspec magic 'trail' in ':(attr:whitespace=indent,trail,space)'
This change allows escaping characters by a backslash, such that the query
git ls-files :(attr:whitespace=indent\,trail\,space)
will match all paths that have the value "indent,trail,space" for the
whitespace attribute. To accomplish this, we need to modify two places.
First, `parse_long_magic` must not stop early upon seeing an escaped
comma or closing paren. Second, we need to remove any escaping from
the attr value.
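The two steps can be sketched in Python as follows. This is only an illustration of the idea, not the C code in `parse_long_magic`: the function names are hypothetical, only commas are handled (not the closing paren), and the trailing "icase" magic is there just to show a second element:

```python
def split_unescaped(text, sep=","):
    """Split text on sep, except where sep is protected by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            # Keep the escape sequence intact for now; it is removed later.
            current.append(text[i:i + 2])
            i += 2
            continue
        if ch == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unescape(value):
    """Drop each backslash and keep the character it protects."""
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            i += 1
        out.append(value[i])
        i += 1
    return "".join(out)


magic = split_unescaped(r"attr:whitespace=indent\,trail\,space,icase")
value = unescape(magic[0].split("=", 1)[1])
```

Splitting first and unescaping second keeps the escaped commas from being mistaken for separators.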
Based on a patch by Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.
Based on a patch by Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We already have xdg_config_home to format paths relative to
XDG_CONFIG_HOME. Let's provide a similar function xdg_cache_home to do
the same for paths relative to XDG_CACHE_HOME.
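A Python sketch of the intended lookup is below. The "git" subdirectory and the fallback to $HOME/.cache (the default named by the XDG Base Directory specification) are assumptions here, mirroring how such helpers commonly behave; the message above does not spell them out:

```python
import os


def xdg_cache_home(filename, environ=None):
    """Return a cache path for filename, or None if no base is known."""
    environ = os.environ if environ is None else environ
    cache_home = environ.get("XDG_CACHE_HOME")
    if cache_home:
        # Explicit setting wins (assumed "git" subdirectory).
        return f"{cache_home}/git/{filename}"
    home = environ.get("HOME")
    if home:
        # Assumed fallback: the XDG default of $HOME/.cache.
        return f"{home}/.cache/git/{filename}"
    return None
```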
Signed-off-by: Devin Lehmacher <lehmacdj@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we parse a remote alternates (or http-alternates), we
expect relative lines like:
../../foo.git/objects
which we convert into "$URL/../foo.git/" (and then use that
as a base for fetching more objects).
But if the remote feeds us nonsense like just:
../
we will try to blindly strip the last 7 characters, assuming
they contain the string "objects". Since we don't _have_ 7
characters at all, this results in feeding a small negative
value to strbuf_add(), which converts it to a size_t,
resulting in a big positive value. This should consistently
fail (since we can't generally allocate the max size_t minus
7 bytes), so there shouldn't be any security implications.
Let's fix this by using strbuf_strip_suffix() to drop the
characters we want. If they're not present, we'll ignore the
alternate (in theory we could use it as-is, but the rest of
the http-walker code unconditionally tacks "objects/" back
on, so it is not prepared to handle such a case).
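The difference between the two approaches can be sketched in Python. This is an illustration only, not the http-walker code: the helper names are hypothetical, and the wraparound is modeled for an assumed 64-bit size_t:

```python
OBJECTS_SUFFIX = "objects"


def as_size_t(n, bits=64):
    """Model converting a signed length to an unsigned size_t."""
    return n % (2 ** bits)


def alternate_base(line):
    """Return line without its trailing "objects", or None to skip it."""
    if not line.endswith(OBJECTS_SUFFIX):
        return None
    return line[:-len(OBJECTS_SUFFIX)]


# Blind arithmetic on a short line yields a huge unsigned length.
bogus_len = as_size_t(len("../") - len(OBJECTS_SUFFIX))

# Checking for the suffix first avoids the subtraction entirely.
good = alternate_base("../../foo.git/objects")
bad = alternate_base("../")
```

Here `bad` is None, which corresponds to ignoring the alternate.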
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff --quiet" relies on the size field in diff_filespec to be
correctly populated, but diff_populate_filespec() helper function
made an incorrect short-cut when asked only to populate the size
field for paths that need to go through convert_to_git() (e.g. CRLF
conversion).
* jc/diff-populate-filespec-size-only-fix:
diff: do not short-cut CHECK_SIZE_ONLY check in diff_populate_filespec()
The command-line parsing of "git log -L" copied internal data
structures using incorrect size on ILP32 systems.
* vn/line-log-memcpy-size-fix:
line-log: use COPY_ARRAY to fix mis-sized memcpy
The code to parse "git log -L..." command line was buggy when there
are many ranges specified with -L; overrun of the allocated buffer
has been fixed.
* ax/line-log-range-merge-fix:
line-log.c: prevent crash during union of too many ranges
There is no need to require Python only to give a few messages to
the standard error stream, but we somehow did.
* ss/remote-bzr-hg-placeholder-wo-python:
contrib: git-remote-{bzr,hg} placeholders don't need Python
Git v2.12 was shipped with an embarrassing breakage where various
operations that verify paths given from the user stopped dying when
seeing an issue, and instead later triggered a segfault.
* js/realpath-pathdup-fix:
real_pathdup(): fix callsites that wanted it to die on error
t1501: demonstrate NULL pointer access with invalid GIT_WORK_TREE
The patch subcommand of "git add -i" was meant to have a path
selection prompt just like the other subcommands, unlike "git add -p",
which jumps directly to hunk selection. Recently this was broken and
"add -i" lost the path selection dialog, but it has now been
fixed.
* jk/add-i-patch-do-prompt:
add--interactive: fix missing file prompt for patch mode with "-i"
All callers of add_blame_entry() allocate and copy the second argument.
Let the function do it for them, reducing code duplication.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes a set of repositories want to share configuration settings
among themselves that are distinct from other such sets of repositories.
A user may work on two projects, each of which has multiple
repositories, and use one user.email for one project while using another
for the other.
Setting $GIT_DIR/config works, but if the penalty of forgetting to
update $GIT_DIR/config is high (especially when you end up cloning
often), it may not be the best way to go. Having the settings in
~/.gitconfig, which would work for just one set of repositories, would
not work well in such a situation. Having separate ${HOME}s may add more
problems than it solves.
Extend the include.path mechanism that lets a config file include
another config file, so that the inclusion can be done only when some
conditions hold. Then ~/.gitconfig can say "include config-project-A
only when working on project-A" for each project A the user works on.
In this patch, the only supported grouping is based on $GIT_DIR (in
absolute path), so you would need to group repositories by directory, or
something like that to take advantage of it.
We already have include.path for unconditional includes. This patch goes
with includeIf.<condition>.path to make it clearer that a condition is
required. The new config has the same backward compatibility approach as
include.path: older git versions that don't understand includeIf will
simply ignore them.
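As an illustration, a ~/.gitconfig using the new mechanism might look like the fragment below. The directory layout and the included file names are made up for the example, and the "gitdir:" condition keyword is taken from the git-config documentation rather than from this message:

```
[includeIf "gitdir:~/work/project-a/"]
	path = ~/.gitconfig-project-a
[includeIf "gitdir:~/work/project-b/"]
	path = ~/.gitconfig-project-b
```

Each included file would then carry the per-project settings, such as user.email.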
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The phrasing in this paragraph may give an impression that you can only
use it once. Rephrase it a bit.
Helped-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This test just checks that old clients can clone and fetch
from a newer git-daemon. The opposite should also be true,
but it's hard to test ancient versions of git-daemon because
they lack basic options like "--listen".
Note that we have to make a slight tweak to the
lib-git-daemon helper from the regular tests, so that it
starts the daemon with our correct git.a version.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current test suite is good at letting you test a
particular version of Git. But it's not very good at letting
you test _two_ versions and seeing how they interact (e.g.,
one cloning from the other).
This commit adds a test harness that will build two
arbitrary versions of git and make it easy to call them from
inside your tests. See the README and the example script for
details.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git stash push" takes a pathspec so that the local changes can be
stashed away only partially.
* tg/stash-push:
stash: allow pathspecs in the no verb form
stash: use stash_push for no verb form
stash: teach 'push' (and 'create_stash') to honor pathspec
stash: refactor stash_create
stash: add test for the create command line arguments
stash: introduce push verb
When "git submodule init" decides that the submodule in the working
tree is its upstream, it now gives a warning as it is not a very
common setup.
* sb/submodule-init-url-selection:
submodule init: warn about falling back to a local path
When a redirected http transport gets an error during the
redirected request, we ignored the error we got from the server,
and ended up giving a not-so-useful error message.
* jt/http-base-url-update-upon-redirect:
http: attempt updating base URL only if no error
Reduce authentication round-trip over HTTP when the server supports
just a single authentication method.
* jk/http-auth:
http: add an "auto" mode for http.emptyauth
http: restrict auth methods to what the server advertises
"Cc:" on the trailer part does not have to conform to RFC strictly,
unlike in the e-mail header. "git send-email" has been updated to
ignore anything after '>' when picking addresses, to allow non-address
cruft like " # stable 4.4" after the address.
* jh/send-email-one-cc:
send-email: only allow one address per body tag
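The trimming rule can be sketched in Python as follows. This is a hypothetical helper to show the idea, not the code used by "git send-email", and the address is a placeholder:

```python
def strip_trailing_cruft(value):
    """Keep everything up to and including the first '>', if there is one."""
    end = value.find(">")
    return value if end == -1 else value[: end + 1]


trimmed = strip_trailing_cruft("Stable <stable@example.org> # stable 4.4")
```

Values without a '>' are returned unchanged in this sketch.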
A helper function to make it easier to append the result from
real_path() to a strbuf has been added.
* rs/strbuf-add-real-path:
strbuf: add strbuf_add_real_path()
cocci: use ALLOC_ARRAY
A leak in a codepath to read from a packed object in (rare) cases
has been plugged.
* rs/sha1-file-plug-fallback-base-leak:
sha1_file: release fallback base's memory in unpack_entry()
The code that parses header fields in the commit object has been
updated for (micro)performance and code hygiene.
* rs/commit-parsing-optim:
commit: don't check for space twice when looking for header
commit: be more precise when searching for headers
The "parse_config_key()" API function has been cleaned up.
* jk/parse-config-key-cleanup:
parse_hide_refs_config: tell parse_config_key we don't want a subsection
parse_config_key: allow matching single-level config
parse_config_key: use skip_prefix instead of starts_with
"git upload-pack", which is a counter-part of "git fetch", did not
report as invalid a request for a ref that was not advertised.
This is generally not a problem (because "git fetch" will stop
before making such a request), but is the right thing to do.
* jt/upload-pack-error-report:
upload-pack: report "not our ref" to client
user.email that consists of only cruft chars should consistently
error out, but didn't.
* jk/ident-empty:
ident: do not ignore empty config name/email
ident: reject all-crud ident name
ident: handle NULL email when complaining of empty name
ident: mark error messages for translation
The code to parse "git -c VAR=VAL cmd" and set configuration
variable for the duration of cmd had two small bugs, which have
been fixed.
* jc/config-case-cmdline-take-2:
config: use git_config_parse_key() in git_config_parse_parameter()
config: move a few helper functions up
The algorithm which powers "tag --contains" uses the
TMP_MARK and UNINTERESTING bits, but never cleans up after
itself. As a result, stale UNINTERESTING bits may impact
later traversals (like "--merged").
We could fix this by clearing the bits after we're done with
the --contains traversal. That would be enough to fix the
existing problem, but it leaves future developers in a bad
spot: they cannot add other traversals that operate
simultaneously with --contains (e.g., if you wanted to add
"--no-contains" and use both filters at the same time).
Instead, we can use a commit slab to store our cached
results, which will store the bits outside of the commit
structs entirely. This adds an extra level of indirection,
but in my tests (running "git tag --contains HEAD" on
linux.git), there was no measurable slowdown.
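The idea of keeping results outside the commit structs can be sketched in Python, with a plain dictionary standing in for the commit slab. The function, the graph representation, and the commit names are all hypothetical; this is not the traversal used by "tag --contains":

```python
def contains(start, target, parents, cache):
    """Return True if target is reachable from start.

    parents maps a commit id to its parent ids. cache is a side table
    (commit id -> bool) owned by the caller; commits are never modified.
    """
    stack = [start]
    while stack:
        commit = stack[-1]
        if commit in cache:
            stack.pop()
            continue
        if commit == target:
            cache[commit] = True
            stack.pop()
            continue
        pending = [p for p in parents.get(commit, ()) if p not in cache]
        if pending:
            # Resolve the parents first, then come back to this commit.
            stack.extend(pending)
            continue
        cache[commit] = any(cache[p] for p in parents.get(commit, ()))
        stack.pop()
    return cache[start]


# Tiny diamond-shaped history: D merges B and C, both children of A.
parents = {"D": ["B", "C"], "B": ["A"], "C": ["A"], "A": []}
has_a, has_c = {}, {}  # two independent filters, two side tables
```

Because each filter owns its own table, two traversals can run side by side without seeing each other's state, which is what shared flag bits on the commits cannot offer.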
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tag-contains algorithm quietly returns "does not
contain" when parse_commit() fails. But a parse failure is
an indication that the repository is corrupt. We should die
loudly rather than producing a bogus result.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit cbc60b672 (git tag --contains: avoid stack overflow,
2014-04-24) adapted the -1/0/1 contains status into a
tri-state enum. However, some of the code still used the
numeric values, or assumed that no/yes correspond to C's
boolean true/false.
Let's switch to using the symbolic values everywhere, which
will make it easier to change them.
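The pitfall can be shown with a small Python sketch. The mapping of -1/0/1 to unknown/no/yes is an assumption made for the example, and the names are hypothetical:

```python
from enum import IntEnum


class ContainsResult(IntEnum):
    UNKNOWN = -1  # assumed mapping of the old -1/0/1 values
    NO = 0
    YES = 1


def is_yes_fragile(result):
    """Relies on the numeric value: any non-zero result counts as yes."""
    return bool(result)


def is_yes(result):
    """Compares against the symbolic value only."""
    return result is ContainsResult.YES
```

The fragile check treats UNKNOWN as a positive answer, and it would silently change meaning if the numeric values were ever reordered.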
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is an implementation detail of how filter_refs() works,
and does not need to be exposed to the outside world. This
will become more important in future patches as we add new
private data types to it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the "branch --list" command was converted to use the --format
facility from the ref-filter API, we forgot to honor the --abbrev
setting in the default output format and instead used a hardcoded
"7".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some situations it is useful to know if the given repository
is a submodule of another repository.
Add the flag --show-superproject-working-tree to git-rev-parse
to make it easy to find out if there is a superproject. When no
superproject exists, the output will be empty.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 4ac9006f83 (real_path: have callers use real_pathdup and
strbuf_realpath, 2016-12-12), we changed the xstrdup(real_path())
pattern to use real_pathdup() directly.
The problem with this change is that real_path() calls
strbuf_realpath() with die_on_error = 1 while real_pathdup() calls
it with die_on_error = 0. Meaning that in cases where real_path()
causes Git to die() with an error message, real_pathdup() is silent
and returns NULL instead.
The callers, however, are ill-prepared for that change, as they expect
the return value to be non-NULL (and otherwise the function died
with an appropriate error message).
Fix this by extending real_pathdup()'s signature to accept the
die_on_error flag and simply pass it through to strbuf_realpath(),
and then adjust all callers after a careful audit of whether they would
handle NULLs well.
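The shape of the fix can be sketched in Python. This only models the flag, not git's path resolution: "failure" is simplified to "the path does not exist", the error text is made up, and SystemExit stands in for die():

```python
import os


def real_pathdup(path, die_on_error):
    """Resolve path; on failure either die or return None."""
    if not os.path.exists(path):
        if die_on_error:
            # Stand-in for die(): stop with a message instead of returning.
            raise SystemExit(f"fatal: invalid path '{path}'")
        return None
    return os.path.realpath(path)
```

Callers that cannot cope with None would pass die_on_error=True, while callers that check the return value can pass False and handle the failure themselves.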
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When GIT_WORK_TREE does not specify a valid path, we should error
out, instead of crashing.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is okay in practice to test for forward slashes in the output of
getcwd(), because we go out of our way to convert backslashes to forward
slashes in getcwd()'s output on Windows.
Still, the correct way to test for a dir separator is by using the
helper function we introduced for that very purpose. It also serves as a
good documentation of what the code tries to do (not "how").
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>