Commit Graph

33090 Commits

Author SHA1 Message Date
Nguyễn Thái Ngọc Duy
0940a76db6 pretty: get the correct encoding for --pretty:format=%e
parse_commit_header() provides the commit encoding for '%e' and it
reads it from the re-encoded message, which contains the new encoding,
not the original one in the commit object. This never happens because
--pretty=format:xxx never respects i18n.logoutputencoding. But that's
a different story.

Get the commit encoding from logmsg_reencode() instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Nguyễn Thái Ngọc Duy
5a10d23658 pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 16:28:27 -07:00
Junio C Hamano
1468a58393 Update draft release notes to 1.8.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 12:03:09 -07:00
Junio C Hamano
c5926ac377 Merge branch 'maint'
* maint:
  remote-hg: fix commit messages
2013-04-18 12:03:01 -07:00
Junio C Hamano
ded56521bd Merge branch 'jk/test-trash'
Fix longstanding issues with the test harness when used with --root=<there>
option.

* jk/test-trash:
  t/test-lib.sh: drop "$test" variable
  t/test-lib.sh: fix TRASH_DIRECTORY handling
2013-04-18 11:49:45 -07:00
Junio C Hamano
da89885c6d Merge branch 'th/t9903-symlinked-workdir'
* th/t9903-symlinked-workdir:
  t9903: Don't fail when run from path accessed through symlink
2013-04-18 11:49:42 -07:00
Junio C Hamano
e7e656c09a Merge branch 'jk/merge-tree-added-identically'
The resolution of some corner cases by "git merge-tree" were
inconsistent between top-of-the-tree and in a subdirectory.

* jk/merge-tree-added-identically:
  merge-tree: don't print entries that match "local"
2013-04-18 11:49:31 -07:00
Junio C Hamano
77354d8cdc Merge branch 'jk/http-dumb-namespaces'
Allow smart-capable HTTP servers to be restricted via the
GIT_NAMESPACE mechanism when talking with commit-walker clients
(they already do so when talking with smart HTTP clients).

* jk/http-dumb-namespaces:
  http-backend: respect GIT_NAMESPACE with dumb clients
2013-04-18 11:49:21 -07:00
Junio C Hamano
1931f6d6ea Merge branch 'rs/empty-archive'
Implementations of "tar" of BSD descend have found to have trouble
with reading an otherwise empty tar archive with pax headers and
causes an unnecessary test failure.

* rs/empty-archive:
  t5004: fix issue with empty archive test and bsdtar
2013-04-18 11:49:17 -07:00
Junio C Hamano
288aa7534a Merge branch 'fc/send-email-annotate'
Allows format-patch --cover-letter to be configurable; the most
notable is the "auto" mode to create cover-letter only for multi
patch series.

* fc/send-email-annotate:
  rebase-am: explicitly disable cover-letter
  format-patch: trivial cleanups
  format-patch: add format.coverLetter configuration variable
  log: update to OPT_BOOL
  format-patch: refactor branch name calculation
  format-patch: improve head calculation for cover-letter
  send-email: make annotate configurable
2013-04-18 11:49:11 -07:00
Junio C Hamano
54a3c67375 Merge branch 'jc/push-2.0-default-to-simple' (early part)
Adjust our tests for upcoming migration of the default value for the
"push.default" configuration variable to "simple" from "mixed".

* 'jc/push-2.0-default-to-simple' (early part):
  t5570: do not assume the "matching" push is the default
  t5551: do not assume the "matching" push is the default
  t5550: do not assume the "matching" push is the default
  t9401: do not assume the "matching" push is the default
  t9400: do not assume the "matching" push is the default
  t7406: do not assume the "matching" push is the default
  t5531: do not assume the "matching" push is the default
  t5519: do not assume the "matching" push is the default
  t5517: do not assume the "matching" push is the default
  t5516: do not assume the "matching" push is the default
  t5505: do not assume the "matching" push is the default
  t5404: do not assume the "matching" push is the default
2013-04-18 11:47:59 -07:00
Junio C Hamano
8dd28584a5 Merge branch 'jk/daemon-user-doc'
Document where the configuration is read by the git-daemon when its --user
option is used.

* jk/daemon-user-doc:
  doc: clarify that "git daemon --user=<user>" option does not export HOME=~user
2013-04-18 11:47:23 -07:00
Junio C Hamano
5734fa4608 Merge branch 'fc/completion'
In addition to a user visible change to offer more options to cherry-pick,
generally cleans up and simplifies the code.

* fc/completion:
  completion: small optimization
  completion: inline __gitcomp_1 to its sole callsite
  completion: get rid of compgen
  completion: add __gitcomp_nl tests
  completion: add new __gitcompadd helper
  completion: get rid of empty COMPREPLY assignments
  completion: trivial test improvement
  completion: add more cherry-pick options
2013-04-18 11:46:42 -07:00
Junio C Hamano
bd1184c6de Merge branch 'kb/co-orphan-suggestion-short-sha1'
Update the informational message when "git checkout" leaves the
detached head state.

* kb/co-orphan-suggestion-short-sha1:
  checkout: abbreviate hash in suggest_reattach
2013-04-18 11:46:33 -07:00
Junio C Hamano
cd797c7e6b Merge branch 'jc/detached-head-doc'
* jc/detached-head-doc:
  glossary: extend "detached HEAD" description
2013-04-18 11:46:29 -07:00
Junio C Hamano
193e28f050 Merge branch 'tr/packed-object-info-wo-recursion'
Attempts to reduce the stack footprint of sha1_object_info()
and unpack_entry() codepaths.

* tr/packed-object-info-wo-recursion:
  sha1_file: remove recursion in unpack_entry
  Refactor parts of in_delta_base_cache/cache_or_unpack_entry
  sha1_file: remove recursion in packed_object_info
2013-04-18 11:46:23 -07:00
Junio C Hamano
80292f2104 Merge branch 'jk/http-error-messages'
A regression fix for the recently graduated topic.

* jk/http-error-messages:
  http: set curl FAILONERROR each time we select a handle
2013-04-18 11:42:08 -07:00
Johannes Sixt
16a794de88 t6200: avoid path mangling issue on Windows
MSYS bash interprets the slash in the argument core.commentchar="/"
as root directory and mangles it into a Windows style path. Use a
different core.commentchar to dodge the issue.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 09:56:37 -07:00
Felipe Contreras
845241d544 remote-hg: fix commit messages
git fast-import expects an extra newline after the commit message data,
but we are adding it only on hg-git compat mode, which is why the
bidirectionality tests pass.

We should add it unconditionally.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 23:41:25 -07:00
Junio C Hamano
d226b14d47 git add: rework the logic to warn "git add <pathspec>..." default change
The earlier logic to warn against "git add subdir" that is run
without "-A" or "--no-all" was only to check any <pathspec> given
exactly spells a directory name that (still) exists on the
filesystem.  This had number of problems:

 * "git add '*dir'" (note that the wildcard is hidden from the
   shell) would not trigger the warning.

 * "git add '*.py'" would behave differently between the current
   version of Git and Git 2.0 for the same reason as "subdir", but
   would not trigger the warning.

 * "git add dir" for a submodule "dir" would just update the index
   entry for the submodule "dir" without ever recursing into it, and
   use of "-A" or "--no-all" would matter.  But the logic only
   checks the directory-ness of "dir" and gives an unnecessary
   warning.

Rework the logic to detect the case where the behaviour will be
different in Git 2.0, and issue a warning only when it matters.
Even with the code before this warning, "git add subdir" will have
to traverse the directory in order to find _new_ files the index
does not know about _anyway_, so we can do this check without adding
an extra pass to find if <pathspec> matches any removed file.

This essentially updates the "add_files_to_cache()" public API to
"update_files_in_cache()" API that is internal to "git add", because
with the "--all" option, the function is no longer about "adding"
paths to the cache, but is also used to remove them.

There are other callers of the former from "checkout" (used when
"checkout -m" prepares the temporary tree that represents the local
modifications to be merged) and "commit" ("commit --include" that
picks up local changes in addition to what is in the index).  Since
ADD_CACHE_IGNORE_ERRORS (aka "--no-all") is not used by either of
them, once dust settles after Git 2.0 and the warning becomes
unnecessary, we may want to unify these two functions again.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 17:42:48 -07:00
Jonathan Nieder
1a39b72787 gitweb/INSTALL: GITWEB_CONFIG_SYSTEM is for backward compatibility
Highlight that CONFIG_SYSTEM and /etc/gitweb.conf are meant to be
the fallback configuration file in BUGS section of gitweb.conf
documentation.  This will hopefully help people who expect them to
be a common default, which unfortunately came later in the history.
2013-04-17 15:18:12 -07:00
René Scharfe
de5abe9fe9 blame: handle broken commit headers gracefully
split_ident_line() can leave us with the pointers date_begin, date_end,
tz_begin and tz_end all set to NULL.  Check them before use and supply
the same fallback values as in the case of a negative return code from
split_ident_line().

The "(unknown)" is not actually shown in the output, though, because it
will be converted to a number (zero) eventually.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:50:45 -07:00
René Scharfe
9dbe7c3d7f pretty: handle broken commit headers gracefully
Centralize the parsing of the date and time zone strings in the new
helper function show_ident_date() and make sure it checks the pointers
provided by split_ident_line() for NULL before use.

Reported-by: Ivan Lyapunov <dront78@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:50:36 -07:00
Jeff King
9cfa5126a0 cat-file: print tags raw for "cat-file -p"
When "cat-file -p" prints commits, it shows them in their
raw format, since git's format is already human-readable.
For tags, however, we print the whole thing raw except for
one thing: we convert the timestamp on the tagger line into a
human-readable date.

This dates all the way back to a0f15fa (Pretty-print tagger
dates, 2006-03-01). At that time there was no other way to
pretty-print a tag.  These days, however, neither of those
matters much. The normal way to pretty-print a tag is with
"git show", which is much more flexible than "cat-file -p".

Commit a0f15fa also built "verify-tag --verbose" (and
subsequently "tag -v") around the "cat-file -p" output.
However, that behavior was lost in commit 62e09ce (Make git
tag a builtin, 2007-07-20), and we went back to printing
the raw tag contents. Nobody seems to have noticed the bug
since then (and it is arguably a saner behavior anyway, as
it shows the actual bytes for which we verified the
signature).

Let's drop the tagger-date formatting for "cat-file -p". It
makes us more consistent with cat-file's commit
pretty-printer, and as a bonus, we can drop the hand-rolled
tag parsing code in cat-file (which happened to behave
inconsistently with the tag pretty-printing code elsewhere).

This is a change of output format, so it's possible that
some callers could considered this a regression. However,
the original behavior was arguably a bug (due to the
inconsistency with commits), likely nobody was relying on it
(even we do not use it ourselves these days), and anyone
relying on the "-p" pretty-printer should be able to expect
a change in the output format (i.e., while "cat-file" is
plumbing, the output format of "-p" was never guaranteed to
be stable).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:48:45 -07:00
Lukas Fleischer
4982fd78f6 convert.c: remove duplicate code
The has_cr_in_index() function is an almost 1:1 copy of
read_blob_data_from_index() with some additions.  Use the
latter instead of using copy-pasted code.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:52:33 -07:00
Lukas Fleischer
ff36682505 read_blob_data_from_index(): optionally return the size of blob data
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:51:47 -07:00
Lukas Fleischer
29fb37b272 attr.c: extract read_index_data() as read_blob_data_from_index()
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.

This allows for reusing the function in convert.c later.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:49:11 -07:00
Junio C Hamano
dcd8c09e4d Merge branch 'maint'
* maint:
  help.c: add a compatibility comment to cmd_version()
2013-04-16 15:14:44 -07:00
Jeff King
1ece66bc9e run-command: use thread-aware die_is_recursing routine
If we die from an async thread, we do not actually exit the
program, but just kill the thread. This confuses the static
counter in usage.c's default die_is_recursing function; it
updates the counter once for the thread death, and then when
the main program calls die() itself, it erroneously thinks
we are recursing. The end result is that we print "recursion
detected in die handler" instead of the real error in such a
case (the easiest way to trigger this is having a remote
connection hang up while running a sideband demultiplexer).

This patch solves it by using a per-thread counter when the
async_die function is installed; we detect recursion in each
thread (including the main one), but they do not step on
each other's toes.

Other threaded code does not need to worry about this, as
they do not install specialized die handlers; they just let
a die() from a sub-thread take down the whole program.

Since we are overriding the default recursion-check
function, there is an interesting corner case that is not a
problem, but bears some explanation. Imagine the main thread
calls die(), and then in the die_routine starts an async
call. We will switch to using thread-local storage, which
starts at 0, for the main thread's counter, even though
the original counter was actually at 1. That's OK, though,
for two reasons:

  1. It would miss only the first level of recursion, and
     would still find recursive failures inside the async
     helper.

  2. We do not currently and are not likely to start doing
     anything as heavyweight as starting an async routine
     from within a die routine or helper function.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 15:02:48 -07:00
Jeff King
c19a490e37 usage: allow pluggable die-recursion checks
When any git code calls die or die_errno, we use a counter
to detect recursion into the die functions from any of the
helper functions. However, such a simple counter is not good
enough for threaded programs, which may call die from a
sub-thread, killing only the sub-thread (but incrementing
the counter for everyone).

Rather than try to deal with threads ourselves here, let's
just allow callers to plug in their own recursion-detection
function. This is similar to how we handle the die routine
(the caller plugs in a die routine which may kill only the
sub-thread).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 15:02:46 -07:00
David Aguilar
f2de0b9793 help.c: add a compatibility comment to cmd_version()
External projects have been known to parse the output of
"git version".  Help prevent future authors from changing
its format by adding a comment to its implementation.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 15:01:30 -07:00
Jonathan Nieder
95f31e9ab5 convert: The native line-ending is \r\n on MinGW
If you try this:

 1. Install Git for Windows (from the msysgit project)

 2. Put

	[core]
		autocrlf = false
		eol = native

    in your .gitconfig.

 3. Clone a project with

	*.txt text

    in its .gitattributes.

Then with current git, any text files checked out have LF line
endings, instead of the expected CRLF.

Cc: Johannes Schindelin <johannes.schindelin@gmx.de>
Cc: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 11:18:35 -07:00
Thomas Rast
70d26c6e76 read_revisions_from_stdin: make copies for handle_revision_arg
read_revisions_from_stdin() has passed pointers to its read buffer
down to handle_revision_arg() since its inception way back in 42cabc3
(Teach rev-list an option to read revs from the standard input.,
2006-09-05).  Even back then, this was a bug: through
add_pending_object, the argument was recorded in the object_array's
'name' field.

Fix it by making a copy whenever read_revisions_from_stdin() passes an
argument down the callchain.  The other caller runs handle_revision_arg()
on argv[], where it would be redundant to make a copy.

Signed-off-by: Thomas Rast <trast@inf.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 11:17:48 -07:00
Jeff King
b793acf14c http: set curl FAILONERROR each time we select a handle
Because we reuse curl handles for multiple requests, the
setup of a handle happens in two stages: stable, global
setup and per-request setup. The lifecycle of a handle is
something like:

  1. get_curl_handle; do basic global setup that will last
     through the whole program (e.g., setting the user
     agent, ssl options, etc)

  2. get_active_slot; set up a per-request baseline (e.g.,
     clearing the read/write functions, making it a GET
     request, etc)

  3. perform the request with curl_*_perform functions

  4. goto step 2 to perform another request

Breaking it down this way means we can avoid doing global
setup from step (1) repeatedly, but we still finish step (2)
with a predictable baseline setup that callers can rely on.

Until commit 6d052d7 (http: add HTTP_KEEP_ERROR option,
2013-04-05), setting curl's FAILONERROR option was a global
setup; we never changed it. However, 6d052d7 introduced an
option where some requests might turn off FAILONERROR. Later
requests using the same handle would have the option
unexpectedly turned off, which meant they would not notice
http failures at all.

This could easily be seen in the test-suite for the
"half-auth" cases of t5541 and t5551. The initial requests
turned off FAILONERROR, which meant it was erroneously off
for the rpc POST. That worked fine for a successful request,
but meant that we failed to react properly to the HTTP 401
(instead, we treated whatever the server handed us as a
successful message body).

The solution is simple: now that FAILONERROR is a
per-request setting, we move it to get_active_slot to make
sure it is reset for each request.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 10:13:46 -07:00
Jiang Xin
bc554df8c9 i18n: branch: mark strings for translation
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Reviewed-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 21:05:21 -07:00
Felipe Contreras
afad200558 remote-bzr: fix prefix of tags
In the current transport-helper code, refs without namespaced refspecs don't
work correctly, so let's always use them.

Some people reported issues with 'git clone --mirror', and this fixes them, as
well as possibly others.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 16:08:40 -07:00
Junio C Hamano
aec3f77941 Update draft release notes to 1.8.3
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:45:15 -07:00
Junio C Hamano
f678d9b592 Merge branch 'jk/diff-graph-submodule-summary'
Make "git diff --graph" work better with submodule log output.

* jk/diff-graph-submodule-summary:
  submodule: print graph output next to submodule log
2013-04-15 12:41:01 -07:00
Junio C Hamano
825ccfc23c Merge branch 'jk/diff-algo-finishing-touches'
"git diff --diff-algorithm algo" is also understood as "git diff
--diff-algorithm=algo".

* jk/diff-algo-finishing-touches:
  diff: allow unstuck arguments with --diff-algorithm
  git-merge(1): document diff-algorithm option to merge-recursive
2013-04-15 12:40:58 -07:00
Junio C Hamano
948cf4f5e5 Merge branch 'rt/commentchar-fmt-merge-msg'
The new core.commentchar configuration was not applied to a few
places.

* rt/commentchar-fmt-merge-msg:
  fmt-merge-msg: use core.commentchar in tag signatures completely
  fmt-merge-msg: respect core.commentchar in people credits
2013-04-15 12:40:56 -07:00
Junio C Hamano
e1a3f17e9d Merge branch 'lf/bundle-with-tip-wo-message'
"git bundle" did not like a bundle created using a commit without
any message as its one of the prerequistes.

* lf/bundle-with-tip-wo-message:
  bundle: Accept prerequisites without commit messages
2013-04-15 12:40:51 -07:00
Junio C Hamano
51ff04baad Merge branch 'jk/show-branch-strbuf'
"git show-branch" was not prepared to show a very long run of
ancestor operators e.g. foobar^2~2^2^2^2...^2~4 correctly.

* jk/show-branch-strbuf:
  show-branch: use strbuf instead of static buffer
2013-04-15 12:40:49 -07:00
Junio C Hamano
f4f6a75329 Merge branch 'jk/http-error-messages'
Improve error reporting from the http transfer clients.

* jk/http-error-messages:
  http: drop http_error function
  remote-curl: die directly with http error messages
  http: re-word http error message
  http: simplify http_error helper function
  remote-curl: consistently report repo url for http errors
  remote-curl: always show friendlier 404 message
  remote-curl: let servers override http 404 advice
  remote-curl: show server content on http errors
  http: add HTTP_KEEP_ERROR option
2013-04-15 12:40:46 -07:00
Junio C Hamano
d809d050ff Merge branch 'tr/perl-keep-stderr-open'
Closing (not redirecting to /dev/null) the standard error stream is
not a very smart thing to do.  Later open may return file
descriptor #2 for unrelated purpose, and error reporting code may
write into them.

* tr/perl-keep-stderr-open:
  t9700: do not close STDERR
  perl: redirect stderr to /dev/null instead of closing
2013-04-15 12:40:41 -07:00
Karsten Blees
0aaf62b6e0 dir.c: git-status --ignored: don't scan the work tree twice
'git-status --ignored' still scans the work tree twice to collect
untracked and ignored files, respectively.

fill_directory / read_directory already supports collecting untracked and
ignored files in a single directory scan. However, the DIR_COLLECT_IGNORED
flag to enable this has some git-add specific side-effects (e.g. it
doesn't recurse into ignored directories, so listing ignored files with
--untracked=all doesn't work).

The DIR_SHOW_IGNORED flag doesn't list untracked files and returns ignored
files in dir_struct.entries[] (instead of dir_struct.ignored[] as
DIR_COLLECT_IGNORED). DIR_SHOW_IGNORED is used all throughout git.

We don't want to break the existing API, so lets introduce a new flag
DIR_SHOW_IGNORED_TOO that lists untracked as well as ignored files similar
to DIR_COLLECT_FILES, but will recurse into sub-directories based on the
other flags as DIR_SHOW_IGNORED does.

In dir.c::read_directory_recursive, add ignored files to either
dir_struct.entries[] or dir_struct.ignored[] based on the flags. Also move
the DIR_COLLECT_IGNORED case here so that filling result lists is in a
common place.

In wt-status.c::wt_status_collect_untracked, use the new flag and read
results from dir_struct.ignored[]. Remove the extra fill_directory call.

builtin/check-ignore.c doesn't call fill_directory, setting the git-add
specific DIR_COLLECT_IGNORED flag has no effect here. Remove for clarity.

Update API documentation to reflect the changes.

Performance: with this patch, 'git-status --ignored' is typically as fast
as 'git-status'.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:36:42 -07:00
Karsten Blees
defd7c7b37 dir.c: git-status --ignored: don't scan the work tree three times
'git-status --ignored' recursively scans directories up to three times:

 1. To collect untracked files.

 2. To collect ignored files.

 3. When collecting ignored files, to check that an untracked directory
    that potentially contains ignored files doesn't also contain untracked
    files (i.e. isn't already listed as untracked).

Let's get rid of case 3 first.

Currently, read_directory_recursive returns a boolean whether a directory
contains the requested files or not (actually, it returns the number of
files, but no caller actually needs that), and DIR_SHOW_IGNORED specifies
what we're looking for.

To be able to test for both untracked and ignored files in a single scan,
we need to return a bit more info, and the result must be independent of
the DIR_SHOW_IGNORED flag.

Reuse the path_treatment enum as return value of read_directory_recursive.
Split path_handled in two separate values path_excluded and path_untracked
that don't change their meaning with the DIR_SHOW_IGNORED flag. We don't
need an extra value path_untracked_and_excluded, as directories with both
untracked and ignored files should be listed as untracked.

Rename path_ignored to path_none for clarity (i.e. "don't treat that path"
in contrast to "the path is ignored and should be treated according to
DIR_SHOW_IGNORED").

Replace enum directory_treatment with path_treatment. That's just another
enum with the same meaning, no need to translate back and forth.

In treat_directory, get rid of the extra read_directory_recursive call and
all the DIR_SHOW_IGNORED-specific code.

In read_directory_recursive, decide whether to dir_add_name path_excluded
or path_untracked paths based on the DIR_SHOW_IGNORED flag.

The return value of read_directory_recursive is the maximum path_treatment
of all files and sub-directories. In the check_only case, abort when we've
reached the most significant value (path_untracked).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
8aaf8d7728 dir.c: git-status: avoid is_excluded checks for tracked files
Checking if a file is in the index is much faster (hashtable lookup) than
checking if the file is excluded (linear search over exclude patterns).

Skip is_excluded checks for files: move the cache_name_exists check from
treat_file to treat_one_path and return early if the file is tracked.

This can safely be done as all other code paths also return path_ignored
for tracked files, and dir_add_ignored skips tracked files as well.

There's just one line left in treat_file, so move this to treat_one_path
as well.

Here's some performance data for git-status from the linux and WebKit
repos (best of 10 runs on a Debian Linux on SSD, core.preloadIndex=true):

       |    status      | status --ignored
       | linux | WebKit | linux | WebKit
-------+-------+--------+-------+---------
before | 0.218 |  1.583 | 0.321 |  2.579
after  | 0.156 |  0.988 | 0.202 |  1.279
gain   | 1.397 |  1.602 | 1.589 |  2.016

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
b07bc8c8c3 dir.c: replace is_path_excluded with now equivalent is_excluded API
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:01 -07:00
Karsten Blees
95c6f27164 dir.c: unify is_excluded and is_path_excluded APIs
The is_excluded and is_path_excluded APIs are very similar, except for a
few noteworthy differences:

is_excluded doesn't handle ignored directories, results for paths within
ignored directories are incorrect. This is probably based on the premise
that recursive directory scans should stop at ignored directories, which
is no longer true (in certain cases, read_directory_recursive currently
calls is_excluded *and* is_path_excluded to get correct ignored state).

is_excluded caches parsed .gitignore files of the last directory in struct
dir_struct. If the directory changes, it finds a common parent directory
and is very careful to drop only as much state as necessary. On the other
hand, is_excluded will also read and parse .gitignore files in already
ignored directories, which are completely irrelevant.

is_path_excluded correctly handles ignored directories by checking if any
component in the path is excluded. As it uses is_excluded internally, this
unfortunately forces is_excluded to drop and re-read all .gitignore files,
as there is no common parent directory for the root dir.

is_path_excluded tracks state in a separate struct path_exclude_check,
which is essentially a wrapper of dir_struct with two more fields. However,
as is_path_excluded also modifies dir_struct, it is not possible to e.g.
use multiple path_exclude_check structures with the same dir_struct in
parallel. The additional structure just unnecessarily complicates the API.

Teach is_excluded / prep_exclude about ignored directories: whenever
entering a new directory, first check if the entire directory is excluded.
Remember the excluded state in dir_struct. Don't traverse into already
ignored directories (i.e. don't read irrelevant .gitignore files).

Directories could also be excluded by exclude patterns specified on the
command line or .git/info/exclude, so we cannot simply skip prep_exclude
entirely if there's no .gitignore file name (dir_struct.exclude_per_dir).
Move this check to just before actually reading the file.

is_path_excluded is now equivalent to is_excluded, so we can simply
redirect to it (the public API is cleaned up in the next patch).

The performance impact of the additional ignored check per directory is
hardly noticeable when reading directories recursively (e.g. 'git status').
However, performance of git commands using the is_path_excluded API (e.g.
'git ls-files --cached --ignored --exclude-standard') is greatly improved
as this no longer re-reads .gitignore files on each call.

Here's some performance data from the linux and WebKit repos (best of 10
runs on a Debian Linux on SSD, core.preloadIndex=true):

       | ls-files -ci   |    status      | status --ignored
       | linux | WebKit | linux | WebKit | linux | WebKit
-------+-------+--------+-------+--------+-------+---------
before | 0.506 |  6.539 | 0.212 |  1.555 | 0.323 |  2.541
after  | 0.080 |  1.191 | 0.218 |  1.583 | 0.321 |  2.579
gain   | 6.325 |  5.490 | 0.972 |  0.982 | 1.006 |  0.985

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00
Karsten Blees
6cd5c582dc dir.c: move prep_exclude
Move prep_exclude in preparation for the next patch.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-15 12:34:00 -07:00