SZEDER Gábor c1bc0a0e92 completion: remove repeated dirnames with 'awk' during path completion
During git-aware path completion, after all the trailing path
components have been removed from the output of 'git ls-files' and
'git diff-index' (see previous patch), each directory name is repeated
as many times as the number of listed paths it contains.  This can be
a lot of repetitions, especially when invoking path completion close
to the root of a big worktree, which would cause a considerable
overhead downstream of __git_index_files(), in particular in the shell
loop that fills the COMPREPLY array.  To reduce this overhead,
__git_index_files() runs the classic '... |sort |uniq' pattern to
remove those repetitions from the function's output.
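The following is a minimal sketch of the '|sort |uniq' approach. It assumes the input is a list of paths relative to the directory being completed. The function name first_components_sort_uniq is hypothetical, and this is not the actual git-completion.bash code:

```shell
#!/bin/sh
# Hypothetical illustration of the '... |sort |uniq' pattern; not the
# actual __git_index_files() implementation.
first_components_sort_uniq () {
	# 'cut' keeps the part of each path before the first "/" and passes
	# lines without a "/" through unchanged.  'uniq' only drops
	# *adjacent* duplicates, so 'sort' must bring repetitions together
	# first.  This costs two extra processes and two pipeline stages.
	cut -d / -f 1 | sort | uniq
}

# Example input standing in for 'git ls-files' output.
printf '%s\n' kernel/fork.c Makefile kernel/exit.c Documentation/git.txt |
	first_components_sort_uniq
```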

While removing repeated directory names is effective in reducing the
number of iterations in that shell loop, it still imposes the overhead
of fork()+exec()ing two external processes and of two additional
pipeline stages, between which a potentially large amount of data has
to be passed.

Extend __git_index_files()'s 'awk' script to remove repeated path
components by first creating and filling an associative array indexed
by all encountered path components (after the trailing path components
have been removed), and then iterating over this array and printing
the indices, i.e. the unique path components.  This way we can remove
the '|sort |uniq' pipeline stages, and eliminating their overhead makes
path completion faster.
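The associative-array technique described above can be sketched as follows. The function name unique_first_components is hypothetical, the printf input stands in for the output of 'git ls-files', and the real 'awk' script in git-completion.bash does more than this:

```shell
#!/bin/sh
# Hypothetical helper illustrating the technique; not the actual
# __git_index_files() 'awk' script.
unique_first_components () {
	awk -F / '
	{
		# Field 1 is everything before the first "/", i.e. the file
		# or directory name at the current level.  Using it as an
		# array index stores each name only once; repetitions just
		# overwrite the same element.
		seen[$1] = 1
	}
	END {
		# Print every distinct index once.  The iteration order of
		# "for (k in array)" is unspecified.
		for (name in seen)
			print name
	}'
}

# Example input standing in for 'git ls-files' output.
printf '%s\n' Makefile kernel/fork.c kernel/exit.c Documentation/git.txt |
	unique_first_components
```

Because duplicates only overwrite an existing array element, the input needs no sorting. The deduplication then happens inside the 'awk' process that already exists, without the two extra processes.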

Listing all tracked files (12) and directories (23) at the top of the
worktree in linux.git (over 62k files), i.e. what does all the hard
work behind 'git rm <TAB>':

  Before this patch, best of five, using GNU awk on Linux:

    real    0m0.069s
    user    0m0.089s
    sys     0m0.026s

  After:

    real    0m0.052s
    user    0m0.072s
    sys     0m0.014s

  Difference (real time): -24.6%

Note that this changes the order of elements in __git_index_files()'s
output.  This is not an issue, because this function was only ever
intended to feed paths into the COMPREPLY array, and Bash will sort
its elements (according to the user's locale) anyway.

Note also that using 'awk' to remove repeated path components is
beneficial for the performance of the next two patches:

  - The first will extend this 'awk' script to dequote quoted paths in
    the output of 'git ls-files' and 'git diff-index'.  With this
    patch it will only have to dequote the unique path components,
    not every repetition of them.

  - The second will, among other things, extend this 'awk' script to
    prepend prefix path components from the command line to the
    currently completed path component.  Consequently, each line in
    'awk's output will grow longer.  Without this patch, the '|sort
    |uniq' stages would have to exchange and process that much more
    data.
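This hypothetical sketch shows why deduplicating first helps the second patch. If the prefix is prepended only in the END block, it is added once per unique name rather than once per listed path. The function name unique_with_prefix and passing the prefix via awk's -v option are assumptions for illustration. They are not necessarily what the follow-up patch does:

```shell
#!/bin/sh
# Hypothetical illustration; not the follow-up patch's actual code.
# $1 is the prefix (e.g. "kernel/") to prepend to each unique name.
# Note that 'awk -v' interprets backslash escapes in the assigned value.
unique_with_prefix () {
	awk -F / -v prefix="$1" '
	{
		# Record each first path component only once.
		seen[$1] = 1
	}
	END {
		# The prefix is prepended once per unique name, so the
		# longer output lines are produced only for distinct entries.
		for (name in seen)
			print prefix name
	}'
}

# Input paths relative to a hypothetical "kernel/" directory.
printf '%s\n' fork.c exit.c sched/core.c sched/fair.c |
	unique_with_prefix kernel/
```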

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-17 12:49:36 +09:00
buildsystems
coccinelle Merge branch 'rs/cocci-strbuf-addf-to-addstr' 2018-02-15 14:55:44 -08:00
completion completion: remove repeated dirnames with 'awk' during path completion 2018-04-17 12:49:36 +09:00
contacts git-contacts: also recognise "Reported-by:" 2017-07-27 09:42:55 -07:00
credential Merge branch 'tz/fsf-address-update' into maint 2017-11-21 14:05:32 +09:00
diff-highlight diff-highlight: detect --graph by indent 2018-03-21 10:24:19 -07:00
emacs Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
examples Remove contrib/examples/* 2018-03-26 13:48:50 -07:00
fast-import Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
git-jump git-jump: give contact instructions in the README 2017-11-21 11:01:02 +09:00
git-shell-commands
hg-to-git Replace Free Software Foundation address in license notices 2017-11-09 13:21:21 +09:00
hooks hooks/pre-auto-gc-battery: allow gc to run on non-laptops 2018-02-28 14:24:46 -08:00
long-running-filter docs: warn about possible '=' in clean/smudge filter process values 2016-12-06 11:29:52 -08:00
mw-to-git Merge branch 'ab/mediawiki-namespace' 2017-11-15 12:14:32 +09:00
persistent-https docs/config: mention protocol implications of url.insteadOf 2017-06-01 10:07:10 +09:00
remote-helpers contrib: git-remote-{bzr,hg} placeholders don't need Python 2017-03-03 11:09:34 -08:00
stats
subtree Merge branch 'sg/subtree-signed-commits' 2018-03-08 12:36:25 -08:00
svn-fe
thunderbird-patch-inline contrib/thunderbird-patch-inline/appp.sh: use the $( ... ) construct for command substitution 2015-12-27 15:33:13 -08:00
update-unicode update_unicode.sh: remove the plane filter 2016-12-14 09:48:07 -08:00
workdir git-new-workdir: mark script as LF-only 2017-05-10 13:32:50 +09:00
convert-grafts-to-replace-refs.sh
git-resurrect.sh Merge branch 'jc/bs-t-is-not-a-tab-for-sed' 2017-04-16 23:29:29 -07:00
README
remotes2config.sh
rerere-train.sh contrib/rerere-train: optionally overwrite existing resolutions 2017-07-26 13:38:48 -07:00

Contributed Software

Although these pieces are available as part of the official git
source tree, they have a somewhat different status.  The
intention is to keep interesting tools around git here, maybe
even experimental ones, to give users easier access to them,
and to give tools wider exposure, so that they can be improved
faster.

I am not expecting to touch these myself that much.  As far as
my day-to-day operation is concerned, these subdirectories are
owned by their respective primary authors.  I am willing to help
if users of these components and the contrib/ subtree "owners"
have technical/design issues to resolve, but the initiative to
fix and/or enhance things _must_ be on the side of the subtree
owners.  IOW, I won't be actively looking for bugs and room for
enhancements in them as the git maintainer -- I may only do so
just as one of the users when I want to scratch my own itch.  If
you have patches to things in the contrib/ area, the patch should be
first sent to the primary author, and then the primary author
should ack and forward it to me (a git pull request is nicer).
This is the same way I have been treating gitk, and to a
lesser degree various foreign SCM interfaces, so you know the
drill.

I expect things that start their life in the contrib/ area
to graduate out of contrib/ once they mature, either by becoming
projects on their own, or moving to the toplevel directory.  On
the other hand, I expect I'll be proposing removal of disused
and inactive ones from time to time.

If you have new things to add to this area, please first propose
them on the git mailing list, and after a list discussion proves
there is some general interest (it does not have to be a
list-wide consensus for a tool targeted to a relatively narrow
audience -- for example I do not work with projects whose
upstream is svn, so I have no use for git-svn myself, but it is
of general interest for people who need to interoperate with SVN
repositories in a way git-svn works better than git-svnimport),
submit a patch to create a subdirectory of contrib/ and put your
stuff there.

-jc