During git-aware path completion, after all the trailing path components have been removed from the output of 'git ls-files' and 'git diff-index' (see previous patch), each directory name is repeated as many times as the number of listed paths it contains. This can be a lot of repetitions, especially when invoking path completion close to the root of a big worktree, which would cause a considerable overhead downstream of __git_index_files(), in particular in the shell loop that fills the COMPREPLY array. To reduce this overhead, __git_index_files() runs the classic '... |sort |uniq' pattern to remove those repetitions from the function's output.

While removing repeated directory names is effective in reducing the number of iterations in that shell loop, it still imposes the overhead of fork()+exec()ing two external processes, and two additional stages in the pipeline, where a potentially large amount of data can be passed between two subsequent pipeline stages.

Extend __git_index_files()'s 'awk' script to remove repeated path components by first creating and filling an associative array indexed by all encountered path components (after the trailing path components have been removed), and then iterating over this array and printing the indices, i.e. unique path components. This way we can remove the '|sort |uniq' pipeline stages, and their eliminated overhead results in faster path completion.

Listing all tracked files (12) and directories (23) at the top of the worktree in linux.git (over 62k files), i.e. what's doing all the hard work behind 'git rm <TAB>':

Before this patch, best of five, using GNU awk on Linux:

    real    0m0.069s
    user    0m0.089s
    sys     0m0.026s

After:

    real    0m0.052s
    user    0m0.072s
    sys     0m0.014s

Difference: -24.6%

Note that this changes the order of elements in __git_index_files()'s output. This is not an issue, because this function was only ever intended to feed paths into the COMPREPLY array, and Bash will sort its elements (according to the user's locale) anyway.

Note also that using 'awk' to remove repeated path components is also beneficial for the performance of the next two patches:

- The first will extend this 'awk' script to dequote quoted paths in the output of 'git ls-files' and 'git diff-index'. With this patch it will only have to dequote unique path components, not all.

- The second will, among other things, extend this 'awk' script to prepend prefix path components from the command line to the currently completed path component. Consequently, each line in 'awk's output will grow longer. Without this patch that '|sort |uniq' would have to exchange and process that much more data.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
||
---|---|---|
buildsystems | ||
coccinelle | ||
completion | ||
contacts | ||
credential | ||
diff-highlight | ||
emacs | ||
examples | ||
fast-import | ||
git-jump | ||
git-shell-commands | ||
hg-to-git | ||
hooks | ||
long-running-filter | ||
mw-to-git | ||
persistent-https | ||
remote-helpers | ||
stats | ||
subtree | ||
svn-fe | ||
thunderbird-patch-inline | ||
update-unicode | ||
workdir | ||
convert-grafts-to-replace-refs.sh | ||
git-resurrect.sh | ||
README | ||
remotes2config.sh | ||
rerere-train.sh |
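
The associative-array deduplication described in the commit message above can be sketched roughly as follows. This is only an illustration, not the actual body of __git_index_files(): the function name `unique_leading_components` and the sample paths are made up here, and the sketch assumes that "removing trailing path components" amounts to keeping the first '/'-separated field of each line.

```shell
# Hypothetical helper for illustration; NOT the real __git_index_files().
# Reads one path per line on stdin and prints each distinct leading path
# component exactly once, with no '|sort |uniq' stages.
unique_leading_components () {
	awk -F / '{
		# Index the array by the first path component; a component
		# seen again just overwrites the same element.
		seen[$1] = 1
	}
	END {
		# awk does not specify the iteration order of an array,
		# so the output order is arbitrary.
		for (component in seen)
			print component
	}'
}

# Made-up sample input standing in for "git ls-files" output.
printf '%s\n' \
	Makefile \
	contrib/README \
	contrib/completion/git-completion.bash \
	t/README \
	t/test-lib.sh |
unique_leading_components
```

The sample run prints Makefile, contrib and t once each, in whatever order awk happens to walk the array. As the commit message notes, that is acceptable for completion because Bash sorts the COMPREPLY elements anyway.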
Contributed Software

Although these pieces are available as part of the official git source tree, they are in a somewhat different status. The intention is to keep interesting tools around git here, maybe even experimental ones, to give users easier access to them, and to give tools wider exposure, so that they can be improved faster.

I am not expecting to touch these myself that much. As far as my day-to-day operation is concerned, these subdirectories are owned by their respective primary authors. I am willing to help if users of these components and the contrib/ subtree "owners" have technical/design issues to resolve, but the initiative to fix and/or enhance things _must_ be on the side of the subtree owners. IOW, I won't be actively looking for bugs and room for enhancement in them as the git maintainer -- I may only do so just as one of the users when I want to scratch my own itch. If you have patches to things in the contrib/ area, the patch should be first sent to the primary author, and then the primary author should ack and forward it to me (git pull request is nicer). This is the same way as how I have been treating gitk, and to a lesser degree various foreign SCM interfaces, so you know the drill.

I expect things that start their life in the contrib/ area to graduate out of contrib/ once they mature, either by becoming projects on their own, or moving to the toplevel directory. On the other hand, I expect I'll be proposing removal of disused and inactive ones from time to time.

If you have new things to add to this area, please first propose it on the git mailing list, and after a list discussion proves there is some general interest (it does not have to be a list-wide consensus for a tool targeted to a relatively narrow audience -- for example I do not work with projects whose upstream is svn, so I have no use for git-svn myself, but it is of general interest for people who need to interoperate with SVN repositories in a way git-svn works better than git-svnimport), submit a patch to create a subdirectory of contrib/ and put your stuff there.

-jc