git-commit-vandalism/builtin
Erik Elfström 0179ca7a62 clean: improve performance when removing lots of directories
"git clean" uses resolve_gitlink_ref() to check for the presence of
nested git repositories, but it has the drawback of creating a
ref_cache entry for every directory that should potentially be
cleaned. The linear search through the ref_cache list causes a massive
performance hit for a large number of directories.
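A rough cost model shows why the slowdown grows so sharply. Assume, as a simplification, that each lookup scans every ref_cache entry already created for earlier directories. Cleaning n directories then costs a quadratic number of comparisons:

```latex
\sum_{k=0}^{n-1} k = \frac{n(n-1)}{2},
\qquad n = 10^{5} \;\Rightarrow\; \frac{10^{5}\,(10^{5}-1)}{2} \approx 5 \times 10^{9}
```

This is an estimate under that assumption, not a measurement. The measured effect is the 61s to 1.7s timing quoted at the end of this message.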

Modify clean.c:remove_dirs to use setup.c:is_git_directory and
setup.c:read_gitfile_gently instead.
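To illustrate why a freshly initialized, empty repository is now recognized, here is a simplified sketch of the kind of test is_git_directory performs. looks_like_git_dir is a hypothetical standalone function, not part of git, and it assumes POSIX access() and stat(). The real setup.c code also honors GIT_OBJECT_DIRECTORY and checks that HEAD looks like a valid ref instead of only checking that it exists.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical, simplified stand-in for setup.c:is_git_directory().
 * Returns 1 if 'dir' contains the markers that "git init" creates
 * (objects, refs and a HEAD file), else 0. Unlike the real function,
 * it ignores GIT_OBJECT_DIRECTORY and does not inspect HEAD's contents.
 */
static int looks_like_git_dir(const char *dir)
{
	char path[4096];
	struct stat st;

	assert(dir != NULL);

	/* objects must exist and pass access(X_OK) */
	if (snprintf(path, sizeof(path), "%s/objects", dir) >= (int)sizeof(path) ||
	    access(path, X_OK))
		return 0;

	/* refs must exist and pass access(X_OK) */
	if (snprintf(path, sizeof(path), "%s/refs", dir) >= (int)sizeof(path) ||
	    access(path, X_OK))
		return 0;

	/* HEAD must exist as a regular file */
	if (snprintf(path, sizeof(path), "%s/HEAD", dir) >= (int)sizeof(path) ||
	    stat(path, &st) || !S_ISREG(st.st_mode))
		return 0;

	return 1;
}
```

Because "git init" creates all three markers even before the first commit, a check of this shape treats an empty nested repository the same as a populated one.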

Both of these functions open files and parse their contents when they
find something that looks like a git repository. This is acceptable
from a performance standpoint, since repository candidates should be
comparatively rare.

Using is_git_directory and read_gitfile_gently should give a more
standardized check for what is and what isn't a git repository, but
it also introduces three behavioral changes.

The first change is that we will now detect and avoid cleaning empty
nested git repositories (where only "git init" has been run). This is
desirable.

Second, we will no longer die when cleaning a file named ".git" with
garbage content (it will be cleaned instead). This is also desirable.

The last change is that we will detect and avoid cleaning empty bare
repositories that have been placed in a directory named ".git". This
is not desirable but should have no real user impact since we already
fail to clean non-empty bare repositories in the same scenario. This
is thus deemed acceptable.

On top of this, we add some extra precautions. If read_gitfile_gently
fails to open the git file, read it, or verify the path it contains,
we assume that the directory containing the git file is a valid
repository and avoid cleaning it.
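The keep-or-clean rule described in the preceding paragraphs can be sketched as a small pure function. The enum is hypothetical: it mirrors the failure cases named in this message (open, read, garbage content, path verification) rather than git's actual error codes, and parse_gitfile only checks for the "gitdir: " prefix that a .git file uses to point at the real repository.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcomes of reading a ".git" file; not git's real error codes. */
enum gitfile_result {
	GITFILE_NONE,           /* no .git file at all */
	GITFILE_OK,             /* valid "gitdir: <path>" pointing at a repository */
	GITFILE_OPEN_FAILED,    /* file exists but could not be opened */
	GITFILE_READ_FAILED,    /* file opened but could not be read */
	GITFILE_INVALID_FORMAT, /* garbage content, no "gitdir: " prefix */
	GITFILE_BAD_PATH        /* the "gitdir: " path could not be verified */
};

/*
 * Returns a pointer to the path following the "gitdir: " prefix in 'buf',
 * or NULL if the content is not in .git-file format or the path is empty.
 */
static const char *parse_gitfile(const char *buf)
{
	static const char prefix[] = "gitdir: ";

	assert(buf != NULL);
	if (strncmp(buf, prefix, sizeof(prefix) - 1) != 0)
		return NULL;
	buf += sizeof(prefix) - 1;
	return *buf ? buf : NULL;
}

/*
 * Returns 1 if the directory must be kept (not cleaned), 0 otherwise.
 * 'is_git_dir' is the result of an is_git_directory()-style check.
 */
static int keep_directory(int is_git_dir, enum gitfile_result r)
{
	if (is_git_dir || r == GITFILE_OK)
		return 1;
	/* Extra precaution: this might be a real .git file we failed to use. */
	if (r == GITFILE_OPEN_FAILED || r == GITFILE_READ_FAILED ||
	    r == GITFILE_BAD_PATH)
		return 1;
	/* No .git file, or a .git file with garbage content: safe to clean. */
	return 0;
}
```

The asymmetry follows the message: a .git file with garbage content is cleaned, while any failure that could hide a real repository errs on the side of keeping the directory.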

Update t7300 to reflect these changes in behavior.

The time to clean an untracked directory containing 100000
subdirectories went from 61s to 1.7s after this change.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-15 13:14:24 -07:00
add.c Merge branch 'jk/add-e-kill-editor' 2015-05-22 12:41:55 -07:00
annotate.c annotate: use argv_array 2014-07-16 11:10:11 -07:00
apply.c Merge branch 'bc/object-id' 2015-05-05 21:00:23 -07:00
archive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
bisect--helper.c Replace deprecated OPT_BOOLEAN by OPT_BOOL 2013-08-05 11:32:19 -07:00
blame.c Merge branch 'rs/janitorial' 2015-06-01 12:45:15 -07:00
branch.c Merge branch 'bc/object-id' 2015-06-05 12:17:37 -07:00
bundle.c bundle: verify arguments more strictly 2015-05-08 10:52:11 -07:00
cat-file.c Merge branch 'dt/cat-file-follow-symlinks' 2015-06-01 12:45:16 -07:00
check-attr.c standardize usage info string format 2015-01-14 09:32:04 -08:00
check-ignore.c standardize usage info string format 2015-01-14 09:32:04 -08:00
check-mailmap.c standardize usage info string format 2015-01-14 09:32:04 -08:00
check-ref-format.c standardize usage info string format 2015-01-14 09:32:04 -08:00
checkout-index.c prefix_path(): unconditionally free results in the callers 2015-05-05 10:31:51 -07:00
checkout.c add_pending_uninteresting_ref(): rewrite to take an object_id argument 2015-05-25 12:19:28 -07:00
clean.c clean: improve performance when removing lots of directories 2015-06-15 13:14:24 -07:00
clone.c Merge branch 'mh/clone-verbosity-fix' 2015-05-22 12:41:56 -07:00
column.c standardize usage info string format 2015-01-14 09:32:04 -08:00
commit-tree.c commit-tree: simplify parsing of option -S using skip_prefix() 2014-12-29 09:32:45 -08:00
commit.c Merge branch 'nd/untracked-cache' 2015-05-26 13:24:46 -07:00
config.c Sync with 2.3.8 2015-05-11 14:39:28 -07:00
count-objects.c count-objects: report unused files in $GIT_DIR/worktrees/... 2014-12-01 11:00:18 -08:00
credential.c
describe.c get_name(): rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
diff-files.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff-index.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff-tree.c standardize usage info string format 2015-01-14 09:32:04 -08:00
diff.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
fast-export.c teach fast-export an --anonymize option 2014-08-27 10:42:16 -07:00
fetch-pack.c standardize usage info string format 2015-01-14 09:32:04 -08:00
fetch.c builtin/fetch: rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
fmt-merge-msg.c Merge branch 'jc/plug-fmt-merge-msg-leak' 2015-05-11 14:23:46 -07:00
for-each-ref.c Merge branch 'bc/object-id' 2015-06-05 12:17:37 -07:00
fsck.c fsck: change functions to use object_id 2015-05-25 12:19:32 -07:00
gc.c Merge branch 'nd/multiple-work-trees' 2015-05-11 14:23:39 -07:00
get-tar-commit-id.c use skip_prefix() to avoid more magic numbers 2014-10-07 11:09:16 -07:00
grep.c Merge branch 'ps/grep-help-all-callback-arg' 2015-04-20 15:28:34 -07:00
hash-object.c Merge branch 'jc/hash-object' 2015-05-11 14:23:59 -07:00
help.c Merge branch 'sb/leaks' 2015-03-20 13:11:53 -07:00
index-pack.c Merge branch 'nd/slim-index-pack-memory-usage' 2015-05-11 14:23:44 -07:00
init-db.c Merge branch 'jk/init-core-worktree-at-root' into maint 2015-05-13 14:05:49 -07:00
interpret-trailers.c trailer: add interpret-trailers command 2014-10-13 13:55:27 -07:00
log.c Merge branch 'jk/at-push-sha1' 2015-06-05 12:17:36 -07:00
ls-files.c Merge branch 'jc/report-path-error-to-dir' into maint 2015-03-31 14:53:08 -07:00
ls-remote.c standardize usage info string format 2015-01-14 09:32:04 -08:00
ls-tree.c ls-tree: disable negative pathspec because it's not supported 2014-12-01 11:33:45 -08:00
mailinfo.c standardize usage info string format 2015-01-14 09:32:04 -08:00
mailsplit.c mailsplit: remove unnecessary unlink(2) call 2014-10-07 10:49:57 -07:00
merge-base.c standardize usage info string format 2015-01-14 09:32:04 -08:00
merge-file.c Merge branch 'ab/merge-file-prefix' 2015-02-22 12:28:25 -08:00
merge-index.c standardize usage info string format 2015-01-14 09:32:04 -08:00
merge-ours.c
merge-recursive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
merge-tree.c merge-tree: remove unused df_conflict arguments 2014-09-02 11:02:58 -07:00
merge.c Merge branch 'jk/at-push-sha1' 2015-06-05 12:17:36 -07:00
mktag.c
mktree.c builtin/mktree.c: use ALLOC_GROW() in append_to_tree() 2014-03-03 14:54:45 -08:00
mv.c standardize usage info string format 2015-01-14 09:32:04 -08:00
name-rev.c name_ref(): rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
notes.c standardize usage info string format 2015-01-14 09:32:04 -08:00
pack-objects.c builtin/pack-objects: rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
pack-redundant.c standardize usage info string format 2015-01-14 09:32:04 -08:00
pack-refs.c standardize usage info string format 2015-01-14 09:32:04 -08:00
patch-id.c patch-id: convert to use struct object_id 2015-03-13 22:43:14 -07:00
prune-packed.c standardize usage info string format 2015-01-14 09:32:04 -08:00
prune.c Merge branch 'nd/multiple-work-trees' 2015-05-11 14:23:39 -07:00
push.c push: allow --follow-tags to be set by config push.followTags 2015-03-14 15:08:35 -07:00
read-tree.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
receive-pack.c show_ref_cb(): rewrite to take an object_id argument 2015-05-25 12:19:29 -07:00
reflog.c builtin/reflog: rewrite ref functions to take an object_id argument 2015-05-25 12:19:30 -07:00
remote-ext.c use skip_prefix() to avoid more magic numbers 2014-10-07 11:09:16 -07:00
remote-fd.c
remote.c builtin/remote: rewrite functions to take object_id arguments 2015-05-25 12:19:30 -07:00
repack.c Merge branch 'nd/multiple-work-trees' 2015-05-11 14:23:39 -07:00
replace.c show_reference(): rewrite to take an object_id argument 2015-05-25 12:19:30 -07:00
rerere.c standardize usage info string format 2015-01-14 09:32:04 -08:00
reset.c lockfile.h: extract new header file for the functions in lockfile.c 2014-10-01 13:56:14 -07:00
rev-list.c commit: record buffer length in cache 2014-06-13 12:09:38 -07:00
rev-parse.c builtin/rev-parse: rewrite to take an object_id argument 2015-05-25 12:19:27 -07:00
revert.c standardize usage info string format 2015-01-14 09:32:04 -08:00
rm.c use file_exists() to check if a file exists in the worktree 2015-05-20 13:49:10 -07:00
send-pack.c send-pack.c: add --atomic command line argument 2015-01-07 19:56:44 -08:00
shortlog.c standardize usage info string format 2015-01-14 09:32:04 -08:00
show-branch.c cmd_show_branch(): fix error message 2015-05-25 12:19:31 -07:00
show-ref.c show_ref(): convert local variable peeled to object_id 2015-05-25 12:19:32 -07:00
stripspace.c builtin/stripspace.c: fix broken indentation 2013-09-06 13:33:17 -07:00
symbolic-ref.c standardize usage info string format 2015-01-14 09:32:04 -08:00
tag.c builtin/show-ref: rewrite to take an object_id argument 2015-05-25 12:19:33 -07:00
unpack-file.c
unpack-objects.c index-pack: terminate object buffers with NUL 2014-12-09 11:56:37 -08:00
update-index.c Merge branch 'nd/untracked-cache' 2015-05-26 13:24:46 -07:00
update-ref.c ref_transaction_verify(): new function to check a reference's value 2015-02-17 11:24:59 -08:00
update-server-info.c
upload-archive.c replace {pre,suf}fixcmp() with {starts,ends}_with() 2013-12-05 14:13:21 -08:00
var.c
verify-commit.c standardize usage info string format 2015-01-14 09:32:04 -08:00
verify-pack.c standardize usage info string format 2015-01-14 09:32:04 -08:00
verify-tag.c standardize usage info string format 2015-01-14 09:32:04 -08:00
write-tree.c