The name-hash used for detecting paths that are different only in
cases (which matter on case insensitive filesystems) has been
optimized to take advantage of multi-threading when it makes sense.
* jh/memihash-opt:
name-hash: add test-lazy-init-name-hash to .gitignore
name-hash: add perf test for lazy_init_name_hash
name-hash: add test-lazy-init-name-hash
name-hash: perf improvement for lazy_init_name_hash
hashmap: document memihash_cont, hashmap_disallow_rehash api
hashmap: add disallow_rehash setting
hashmap: allow memihash computation to be continued
name-hash: specify initial size for istate.dir_hash table
Add t/helper/test-lazy-init-name-hash.c test code
to demonstrate performance times for lazy_init_name_hash()
using the original single-threaded and the new multi-threaded
code paths.
Includes a --dump option to dump the created hashmaps to
stdout. You can use this to run both code paths and
confirm that they generate the same hashmaps.
Includes a --analyze option to analyze performance of both
code paths over a range of index sizes to help you find a
lower bound for the LAZY_THREAD_COST in name-hash.c.
For example, passing "-a 4000" will set "istate.cache_nr"
to 4000 and then try the multi-threaded code -- probably
giving 2 threads with 2000 entries each. It will then
run both the single-threaded (1x4000) and the multi-threaded
(2x2000) and compare the times. It will then repeat the
test with 8000, 12000, and etc. so that you can see the
cross over.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
So far, we had no explicit tests of that function.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few codepaths had to rely on a global variable when sorting
elements of an array because sort(3) API does not allow extra data
to be passed to the comparison function. Use qsort_s() when
natively available, and a fallback implementation of it when not,
to eliminate the need, which is a prerequisite for making the
codepath reentrant.
* rs/qsort-s:
ref-filter: use QSORT_S in ref_array_sort()
string-list: use QSORT_S in string_list_sort()
perf: add basic sort performance test
add QSORT_S
compat: add qsort_s()
Add a sort command to test-string-list that reads lines from stdin,
stores them in a string_list and then sorts it. Use it in a simple
perf test script to measure the performance of string_list_sort().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Callers of the hold_locked_index() function pass 0 when they want to
prepare to write a new version of the index file without wishing to
die or emit an error message when the request fails (e.g. somebody
else already held the lock), and pass 1 when they want the call to
die upon failure.
This option is called LOCK_DIE_ON_ERROR by the underlying lockfile
API, and the hold_locked_index() function translates the paramter to
LOCK_DIE_ON_ERROR when calling the hold_lock_file_for_update().
Replace these hardcoded '1' with LOCK_DIE_ON_ERROR and stop
translating. Callers other than the ones that are replaced with
this change pass '0' to the function; no behaviour change is
intended with this patch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Among the callers of hold_locked_index() that passes 0:
- diff.c::refresh_index_quietly() at the end of "git diff" is an
opportunistic update; it leaks the lockfile structure but it is
just before the program exits and nobody should care.
- builtin/describe.c::cmd_describe(),
builtin/commit.c::cmd_status(),
sequencer.c::read_and_refresh_cache() are all opportunistic
updates and they are OK.
- builtin/update-index.c::cmd_update_index() takes a lock upfront
but we may end up not needing to update the index (i.e. the
entries may be fully up-to-date), in which case we do not need to
issue an error upon failure to acquire the lock. We do diagnose
and die if we indeed need to update, so it is OK.
- wt-status.c::require_clean_work_tree() IS BUGGY. It asks
silence, does not check the returned value. Compare with
callsites like cmd_describe() and cmd_status() to notice that it
is wrong to call update_index_if_able() unconditionally.
These test helper programs access the index, but do not ever
setup_git_directory(), meaning we just blindly looked in
".git/index". This happened to work for the purposes of our
tests (which do not run from subdirectories, nor in
non-repos), but it's a bad habit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We call "qsort(array, nelem, sizeof(array[0]), fn)", and most of
the time third parameter is redundant. A new QSORT() macro lets us
omit it.
* rs/qsort:
show-branch: use QSORT
use QSORT, part 2
coccicheck: use --all-includes by default
remove unnecessary check before QSORT
use QSORT
add QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT. The resulting code is
shorter and supports empty arrays with NULL pointers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The callbacks for iterating a sha1_array must have a void
return. This is unlike our usual for_each semantics, where
a callback may interrupt iteration and have its value
propagated. Let's switch it to the usual form, which will
enable its use in more places (e.g., where we are replacing
an existing iteration with a different data structure).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour. The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.
* jk/setup-sequence-update:
t1007: factor out repeated setup
init: reset cached config when entering new repo
init: expand comments explaining config trickery
config: only read .git/config from configured repos
test-config: setup git directory
t1302: use "git -C"
pager: handle early config
pager: use callbacks instead of configset
pager: make pager_program a file-local static
pager: stop loading git_default_config()
pager: remove obsolete comment
diff: always try to set up the repository
diff: handle --no-index prefixes consistently
diff: skip implicit no-index check when given --no-index
patch-id: use RUN_SETUP_GENTLY
hash-object: always try to set up the git repository
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:
@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash
@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code cleanup.
* rs/submodule-config-code-cleanup:
submodule-config: fix test binary crashing when no arguments given
submodule-config: combine early return code into one goto
submodule-config: passing name reference for .gitmodule blobs
submodule-config: use explicit empty string instead of strbuf in config_from()
Replace uses of strbuf_addf() for adding strings with more lightweight
strbuf_addstr() calls.
In http-push.c it becomes easier to see what's going on without having
to verfiy that the definition of PROPFIND_ALL_REQUEST doesn't contain
any format specifiers.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
There are certain house-keeping tasks that need to be performed at
the very beginning of any Git program, and programs that are not
built-in commands had to do them exactly the same way as "git"
potty does. It was easy to make mistakes in one-off standalone
programs (like test helpers). A common "main()" function that
calls cmd_main() of individual program has been introduced to
make it harder to make mistakes.
* jk/common-main:
mingw: declare main()'s argv as const
common-main: call git_setup_gettext()
common-main: call restore_sigpipe_to_default()
common-main: call sanitize_stdfds()
common-main: call git_extract_argv0_path()
add an extra level of indirection to main()
"git grep -i" has been taught to fold case in non-ascii locales
correctly.
* nd/icase:
grep.c: reuse "icase" variable
diffcore-pickaxe: support case insensitive match on non-ascii
diffcore-pickaxe: Add regcomp_or_die()
grep/pcre: support utf-8
gettext: add is_utf8_locale()
grep/pcre: prepare locale-dependent tables for icase matching
grep: rewrite an if/else condition to avoid duplicate expression
grep/icase: avoid kwsset when -F is specified
grep/icase: avoid kwsset on literal non-ascii strings
test-regex: expose full regcomp() to the command line
test-regex: isolate the bug test code
grep: break down an "if" stmt in preparation for next changes
The internal code used to show local timezone offset is not
prepared to handle timestamps beyond year 2100, and gave a
bogus offset value to the caller. Use a more benign looking
+0000 instead and let "git log" going in such a case, instead
of aborting.
* jk/tzoffset-fix:
local_tzoffset: detect errors from tm_to_time_t
t0006: test various date formats
t0006: rename test-date's "show" to "relative"
The internal code used to show local timezone offset is not
prepared to handle timestamps beyond year 2100, and gave a
bogus offset value to the caller. Use a more benign looking
+0000 instead and let "git log" going in such a case, instead
of aborting.
* jk/tzoffset-fix:
local_tzoffset: detect errors from tm_to_time_t
t0006: test various date formats
t0006: rename test-date's "show" to "relative"
"upload-pack" allows a custom "git pack-objects" replacement when
responding to "fetch/clone" via the uploadpack.packObjectsHook.
* jk/upload-pack-hook:
upload-pack: provide a hook for running pack-objects
t1308: do not get fooled by symbolic links to the source tree
config: add a notion of "scope"
config: return configset value for current_config_ functions
config: set up config_source for command-line config
git_config_parse_parameter: refactor cleanup code
git_config_with_options: drop "found" counting
Instead of taking advantage of a struct string_list that is
allocated with all NULs happens to be STRING_LIST_INIT_NODUP kind,
initialize them explicitly as such, to document their behaviour
better.
* jk/string-list-static-init:
use string_list initializer consistently
blame,shortlog: don't make local option variables static
interpret-trailers: don't duplicate option strings
parse_opt_string_list: stop allocating new strings
hashcpy with null_sha1 as the source is equivalent to hashclr. In
addition to being simpler, using hashclr may give the compiler a chance
to optimize better. Convert instances of hashcpy with the source
argument of null_sha1 to hashclr.
This transformation was implemented using the following semantic patch:
@@
expression E1;
@@
-hashcpy(E1, null_sha1);
+hashclr(E1);
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two types of string_lists: those that own the
string memory, and those that don't. You can tell the
difference by the strdup_strings flag, and one should use
either STRING_LIST_INIT_DUP, or STRING_LIST_INIT_NODUP as an
initializer.
Historically, the normal all-zeros initialization has
corresponded to the NODUP case. Many sites use no
initializer at all, and that works as a shorthand for that
case. But for a reader of the code, it can be hard to
remember which is which. Let's be more explicit and actually
have each site declare which type it means to use.
This is a fairly mechanical conversion; I assumed each site
was correct as-is, and just switched them all to NODUP.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A config callback passed to git_config() doesn't know very
much about the context in which it sees a variable. It can
ask whether the variable comes from a file, and get the file
name. But without analyzing the filename (which is hard to
do accurately), it cannot tell whether it is in system-level
config, user-level config, or repo-specific config.
Generally this doesn't matter; the point of not passing this
to the callback is that it should treat the config the same
no matter where it comes from. But some programs, like
upload-pack, are a special case: we should be able to run
them in an untrusted repository, which means we cannot use
any "dangerous" config from the repository config file (but
it is OK to use it from system or user config).
This patch teaches the config code to record the "scope" of
each variable, and make it available inside config
callbacks, similar to how we give access to the filename.
The scope is the starting source for a particular parsing
operation, and remains the same even if we include other
files (so a .git/config which includes another file will
remain CONFIG_SCOPE_REPO, as it would be similarly
untrusted).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When 473166b (config: add 'origin_type' to config_source
struct, 2016-02-19) added accessor functions for the origin
type and name, it taught them only to look at the "cf"
struct that is filled in while we are parsing the config.
This is sufficient to make it work with git-config, which
uses git_config_with_options() under the hood. That function
freshly parses the config files and triggers the callback
when it parses each key.
Most git programs, however, use git_config(). This interface
will populate a cache during the actual parse, and then
serve values from the cache. Calling current_config_filename()
in a callback here will find a NULL cf and produce an error.
There are no such callers right now, but let's prepare for
adding some by making this work.
We already record source information in a struct attached to
each value. We just need to make it globally available and
then consult it from the accessor functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
t0040 had too many unnecessary repetitions in its test data. Teach
test-parse-options program so that a caller can tell what it
expects in its output, so that these repetitions can be cleaned up.
* jc/test-parse-options-expect:
t0040: convert a few tests to use test-parse-options --expect
t0040: remove unused test helpers
test-parse-options: --expect=<string> option to simplify tests
test-parse-options: fix output when callback option fails
"git commit" learned to pay attention to "commit.verbose"
configuration variable and act as if "--verbose" option was
given from the command line.
* pb/commit-verbose-config:
commit: add a commit.verbose config variable
t7507-commit-verbose: improve test coverage by testing number of diffs
parse-options.c: make OPTION_COUNTUP respect "unspecified" values
t/t7507: improve test coverage
t0040-parse-options: improve test coverage
test-parse-options: print quiet as integer
t0040-test-parse-options.sh: fix style issues
Move from unsigned char[20] to struct object_id continues.
* bc/object-id:
match-trees: convert several leaf functions to use struct object_id
tree-walk: convert tree_entry_extract() to use struct object_id
struct name_entry: use struct object_id instead of unsigned char sha1[20]
match-trees: convert shift_tree() and shift_tree_by() to use object_id
test-match-trees: convert to use struct object_id
sha1-name: introduce a get_oid() function
This keeps top dir a bit less crowded. And because these programs are
for testing purposes, it makes sense that they stay somewhere in t/
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>