git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	6047b28eb7	Merge branch 'en/header-split-cleanup' Split key function and data structure definitions out of cache.h to new header files and adjust the users. * en/header-split-cleanup: csum-file.h: remove unnecessary inclusion of cache.h write-or-die.h: move declarations for write-or-die.c functions from cache.h treewide: remove cache.h inclusion due to setup.h changes setup.h: move declarations for setup.c functions from cache.h treewide: remove cache.h inclusion due to environment.h changes environment.h: move declarations for environment.c functions from cache.h treewide: remove unnecessary includes of cache.h wrapper.h: move declarations for wrapper.c functions from cache.h path.h: move function declarations for path.c functions from cache.h cache.h: remove expand_user_path() abspath.h: move absolute path functions from cache.h environment: move comment_line_char from cache.h treewide: remove unnecessary cache.h inclusion from several sources treewide: remove unnecessary inclusion of gettext.h treewide: be explicit about dependence on gettext.h treewide: remove unnecessary cache.h inclusion from a few headers	2023-04-06 13:38:31 -07:00
Junio C Hamano	72871b198f	Merge branch 'ab/remove-implicit-use-of-the-repository' Code clean-up around the use of the_repository. * ab/remove-implicit-use-of-the-repository: libs: use "struct repository " argument, not "the_repository" post-cocci: adjust comments for recent repo_ migration cocci: apply the "revision.h" part of "the_repository.pending" cocci: apply the "rerere.h" part of "the_repository.pending" cocci: apply the "refs.h" part of "the_repository.pending" cocci: apply the "promisor-remote.h" part of "the_repository.pending" cocci: apply the "packfile.h" part of "the_repository.pending" cocci: apply the "pretty.h" part of "the_repository.pending" cocci: apply the "object-store.h" part of "the_repository.pending" cocci: apply the "diff.h" part of "the_repository.pending" cocci: apply the "commit.h" part of "the_repository.pending" cocci: apply the "commit-reach.h" part of "the_repository.pending" cocci: apply the "cache.h" part of "the_repository.pending" cocci: add missing "the_repository" macros to "pending" cocci: sort "the_repository" rules by header cocci: fix incorrect & verbose "the_repository" rules cocci: remove dead rule from "the_repository.pending.cocci"	2023-04-06 13:38:30 -07:00
Junio C Hamano	62df03c277	Merge branch 'jk/blame-contents-with-arbitrary-commit' "git blame --contents=<file> <rev> -- <path>" used to be forbidden, but now it finds the origins of lines starting at <file> contents through the history that leads to <rev>. * jk/blame-contents-with-arbitrary-commit: blame: allow --contents to work with non-HEAD commit	2023-04-04 14:28:28 -07:00
Ævar Arnfjörð Bjarmason	bc726bd075	cocci: apply the "object-store.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "object-store.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Ævar Arnfjörð Bjarmason	ecb5091fd4	cocci: apply the "commit.h" part of "the_repository.pending" Apply the part of "the_repository.pending.cocci" pertaining to "commit.h". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-28 07:36:45 -07:00
Jacob Keller	1a3119ed06	blame: allow --contents to work with non-HEAD commit The --contents option can be used with git blame to blame the file as if it had the contents from the specified file. This is akin to copying the contents into the working tree and then running git blame. This option has been supported since `1cfe77333f` ("git-blame: no rev means start from the working tree file.") The --contents option always blames the file as if it was based on the current HEAD commit. If you try to pass a revision while using --contents, you get the following error: fatal: cannot use --contents with final commit object name This is because the blame process generates a fake working tree commit which always uses the HEAD object as its sole parent. Enhance fake_working_tree_commit to take the object ID to use for the parent instead of always using the HEAD object. Then, always generate a fake commit when we have contents provided, even if we have a final object. Remove the check to disallow --contents and a final revision. Note that the behavior of generating a fake working commit is still skipped when a revision is provided but --contents is not provided. Generating such a commit in that case would combine the currently checked out file contents with the provided revision, which breaks normal blame behavior and produces unexpected results. This enables use of --contents with an arbitrary revision, rather than forcing the use of the local HEAD commit. This makes the --contents option significantly more flexible, as it is no longer required to check out the working tree to the desired commit before using --contents. Reword the documentation so that its clear that --contents can be used with <rev>. Add tests for the --contents option to the annotate-tests.sh test script. Signed-off-by: Jacob Keller <jacob.keller@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-24 12:05:22 -07:00
Elijah Newren	e38da487cc	setup.h: move declarations for setup.c functions from cache.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:54 -07:00
Elijah Newren	f394e093df	treewide: be explicit about dependence on gettext.h Dozens of files made use of gettext functions, without explicitly including gettext.h. This made it more difficult to find which files could remove a dependence on cache.h. Make C files explicitly include gettext.h if they are using it. However, while compat/fsmonitor/fsm-ipc-darwin.c should also gain an include of gettext.h, it was left out to avoid conflicting with an in-flight topic. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-03-21 10:56:51 -07:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Junio C Hamano	4e0d160bbc	Merge branch 'rs/mergesort' Make our mergesort implementation type-safe. * rs/mergesort: mergesort: remove llist_mergesort() packfile: use DEFINE_LIST_SORT fetch-pack: use DEFINE_LIST_SORT commit: use DEFINE_LIST_SORT blame: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT test-mergesort: use DEFINE_LIST_SORT_DEBUG mergesort: add macros for typed sort of linked lists mergesort: tighten merge loop mergesort: unify ranks loops	2022-08-03 13:36:09 -07:00
René Scharfe	47c30f7daa	blame: use DEFINE_LIST_SORT Build a typed sort function for blame entries using DEFINE_LIST_SORT instead of calling llist_mergesort(). This gets rid of the next pointer accessor functions and their calling overhead at the cost of a slightly increased object text size. Before: __TEXT __DATA __OBJC others dec hex 24621 56 0 147515 172192 2a0a0 blame.o With this patch: __TEXT __DATA __OBJC others dec hex 25229 56 0 151702 176987 2b35b blame.o Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-17 15:20:38 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Junio C Hamano	362f869ff2	Merge branch 'ab/diff-free-more' Leakfixes. * ab/diff-free-more: diff.[ch]: have diff_free() free options->parseopts diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec)	2022-02-25 15:47:36 -08:00
Ævar Arnfjörð Bjarmason	244c27242f	diff.[ch]: have diff_free() call clear_pathspec(opts.pathspec) Have the diff_free() function call clear_pathspec(). Since the diff_flush() function calls this all its callers can be simplified to rely on it instead. When I added the diff_free() function in `e900d494dc` (diff: add an API for deferred freeing, 2021-02-11) I simply missed this, or wasn't interested in it. Let's consolidate this now. This means that any future callers (and I've got revision.c in mind) that embed a "struct diff_options" can simply call diff_free() instead of needing know that it has an embedded pathspec. This does fix a bunch of leaks, but I can't mark any test here as passing under the SANITIZE=leak testing mode because in `886e1084d7` (builtin/: add UNLEAKs, 2017-10-01) an UNLEAK(rev) was added, which plasters over the memory leak. E.g. "t4011-diff-symlink.sh" would report fewer leaks with this fix, but because of the UNLEAK() reports none. I'll eventually loop around to removing that UNLEAK(rev) annotation as I'll fix deeper issues with the revisions API leaking. This is one small step on the way there, a new freeing function in revisions.c will want to call this diff_free(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-16 13:50:13 -08:00
Jerry Zhang	9d505b7b49	git-rev-list: add --exclude-first-parent-only flag It is useful to know when a branch first diverged in history from some integration branch in order to be able to enumerate the user's local changes. However, these local changes can include arbitrary merges, so it is necessary to ignore this merge structure when finding the divergence point. In order to do this, teach the "rev-list" family to accept "--exclude-first-parent-only", which restricts the traversal of excluded commits to only follow first parent links. -A-----E-F-G--main \ / / B-C-D--topic In this example, the goal is to return the set {B, C, D} which represents a topic branch that has been merged into main branch. `git rev-list topic ^main` will end up returning no commits since excluding main will end up traversing the commits on topic as well. `git rev-list --exclude-first-parent-only topic ^main` however will return {B, C, D} as desired. Add docs for the new flag, and clarify the doc for --first-parent to indicate that it applies to traversing the set of included commits only. Signed-off-by: Jerry Zhang <jerry@skydio.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-01-12 11:08:42 -08:00
brian m. carlson	14228447c9	hash: provide per-algorithm null OIDs Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:39 +09:00
René Scharfe	ca56dadb4b	use CALLOC_ARRAY Add and apply a semantic patch for converting code that open-codes CALLOC_ARRAY to use it instead. It shortens the code and infers the element size automatically. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-13 16:00:09 -08:00
Junio C Hamano	bf0a430f70	Merge branch 'en/strmap' A specialization of hashmap that uses a string as key has been introduced. Hopefully it will see wider use over time. * en/strmap: shortlog: use strset from strmap.h Use new HASHMAP_INIT macro to simplify hashmap initialization strmap: take advantage of FLEXPTR_ALLOC_STR when relevant strmap: enable allocations to come from a mem_pool strmap: add a strset sub-type strmap: split create_entry() out of strmap_put() strmap: add functions facilitating use as a string->int map strmap: enable faster clearing and reusing of strmaps strmap: add more utility functions strmap: new utility functions hashmap: provide deallocation function names hashmap: introduce a new hashmap_partial_clear() hashmap: allow re-use after hashmap_free() hashmap: adjust spacing to fix argument alignment hashmap: add usage documentation explaining hashmap_free[_entries]()	2020-11-21 15:14:38 -08:00
Elijah Newren	6da1a25814	hashmap: provide deallocation function names hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Philippe Blain	3af31e8786	blame: simplify 'setup_blame_bloom_data' interface The penultimate commit moved the initialization of 'sb.path' in 'builtin/blame.c::cmd_blame' before the call to 'blame.c::setup_blame_bloom_data'. Since 'cmd_blame' is the only caller of 'setup_blame_bloom_data', it is now unnecessary for 'setup_blame_bloom_data' to receive 'path' as a separate argument, as 'sb.path' is already initialized. Remove this argument from setup_blame_bloom_data's interface and use the 'path' field of the 'sb' 'struct blame_scoreboard' instead. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-01 15:54:15 -08:00
Philippe Blain	88894aaeea	blame: simplify 'setup_scoreboard' interface The previous commit moved the initialization of 'sb.path' in 'builtin/blame.c::cmd_blame' before the call to 'blame.c::setup_scoreboard'. Since 'cmd_blame' is the only caller of 'setup_scoreboard', it is now unnecessary for 'setup_scoreboard' to receive 'path' as a separate argument, as 'sb.path' is already initialized. Remove this argument from setup_scoreboard's interface and use the 'path' field of the 'sb' 'struct blame_scoreboard' instead. Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-01 15:54:15 -08:00
René Scharfe	db7d07f610	blame: handle deref_tag() returning NULL Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-12 12:25:14 -07:00
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Junio C Hamano	e1dd499513	Merge branch 'ea/blame-use-oideq' Code cleanup. * ea/blame-use-oideq: blame.c: replace instance of !oidcmp for oideq	2020-09-18 17:58:01 -07:00
Taylor Blau	312cff5207	bloom: split 'get_bloom_filter()' in two 'get_bloom_filter' takes a flag to control whether it will compute a Bloom filter if the requested one is missing. In the next patch, we'll add yet another parameter to this method, which would force all but one caller to specify an extra 'NULL' parameter at the end. Instead of doing this, split 'get_bloom_filter' into two functions: 'get_bloom_filter' and 'get_or_compute_bloom_filter'. The former only looks up a Bloom filter (and does not compute one if it's missing, thus dropping the 'compute_if_not_present' flag). The latter does compute missing Bloom filters, with an additional parameter to store whether or not it needed to do so. This simplifies many call-sites, since the majority of existing callers to 'get_bloom_filter' do not want missing Bloom filters to be computed (so they can drop the parameter entirely and use the simpler version of the function). While we're at it, instrument the new 'get_or_compute_bloom_filter()' with counters in the 'write_commit_graph_context' struct which store the number of filters that we did and didn't compute, as well as filters that were truncated. It would be nice to drop the 'compute_if_not_present' flag entirely, since all remaining callers of 'get_or_compute_bloom_filter' pass it as '1', but this will change in a future patch and hence cannot be removed. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 09:31:25 -07:00
Taylor Blau	4f3644056a	commit-graph: introduce 'get_bloom_filter_settings()' Many places in the code often need a pointer to the commit-graph's 'struct bloom_filter_settings', in which case they often take the value from the top-most commit-graph. In the non-split case, this works as expected. In the split case, however, things get a little tricky. Not all layers in a chain of incremental commit-graphs are required to themselves have Bloom data, and so whether or not some part of the code uses Bloom filters depends entirely on whether or not the top-most level of the commit-graph chain has Bloom filters. This has been the behavior since Bloom filters were introduced, and has been codified into the tests since `a759bfa9ee` (t4216: add end to end tests for git log with Bloom filters, 2020-04-06). In fact, t4216.130 requires that Bloom filters are not used in exactly the case described earlier. There is no reason that this needs to be the case, since it is perfectly valid for commits in an earlier layer to have Bloom filters when commits in a newer layer do not. Since Bloom settings are guaranteed in practice to be the same for any layer in a chain that has Bloom data, it is sufficient to traverse the '->base_graph' pointer until either (1) a non-null 'struct bloom_filter_settings ' is found, or (2) until we are at the root of the commit-graph chain. Introduce a 'get_bloom_filter_settings()' function that does just this, and use it instead of purely dereferencing the top-most graph's '->bloom_filter_settings' pointer. While we're at it, add an additional test in t5324 to guard against code in the commit-graph writing machinery that doesn't correctly handle a NULL 'struct bloom_filter '. Co-authored-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-09 12:51:48 -07:00
Edmundo Carmona Antoranz	1302badd16	blame.c: replace instance of !oidcmp for oideq `0906ac2b` (blame: use changed-path Bloom filters, 2020-04-16) introduced a call to oidcmp() that should have been oideq(), which was introduced in `14438c44` (introduce hasheq() and oideq(), 2018-08-28). Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-08 15:57:26 -07:00
Jeff King	c2ebaa27d6	blame: only coalesce lines that are adjacent in result After blame has finished but before we produce any output, we coalesce groups of lines that were adjacent in the original suspect (which may have been split apart by lines in intermediate commits which went away). However, this can cause incorrect output if the lines are not also adjacent in the result. For instance, the case in t8003 has: ABC DEF which becomes ABC SPLIT DEF Blaming only lines 1 and 3 in the result yields two blame groups (one for each line) that were adjacent in the original. That's enough for us to coalesce them into a single group, but that loses information: our output routines assume they're adjacent in the result as well, and we output: <oid> 1) ABC <oid> 2) SPLIT This is nonsense for two reasons: - we were asked about line 3, not line 2; we should not output the SPLIT line at all - commit <oid> did not touch the SPLIT line at all! We found the correct blame for line 3, but the bug is actually in the output stage, which is showing the wrong line number and content from the final file. We can fix this by only coalescing when both the suspect and result lines are adjacent. That fixes this bug, but keeps coalescing in cases where want it (e.g., the existing test in t8003 where SPLIT goes away, and the lines really are adjacent in the result). Reported-by: Nuthan Munaiah <nm6061@rit.edu> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-13 10:09:38 -07:00
Abhishek Kumar	c49c82aa4c	commit: move members graph_pos, generation to a slab We remove members `graph_pos` and `generation` from the struct commit. The default assignments in init_commit_node() are no longer valid, which is fine as the slab helpers return appropriate default values and the assignments are removed. We will replace existing use of commit->generation and commit->graph_pos by commit_graph_data_slab helpers using `contrib/coccinelle/commit.cocci'. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:30 -07:00
Jeff King	fe88f9f91f	blame: drop unused parameter from maybe_changed_path We don't use the "parent" parameter at all (probably because the bloom filter for a commit is always defined against a single parent anyway). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-23 14:37:03 -07:00
Derrick Stolee	0906ac2b54	blame: use changed-path Bloom filters The changed-path Bloom filters help reduce the amount of tree parsing required during history queries. Before calculating a diff, we can ask the filter if a path changed between a commit and its first parent. If the filter says "no" then we can move on without parsing trees. If the filter says "maybe" then we parse trees to discover if the answer is actually "yes" or "no". When computing a blame, there is a section in find_origin() that computes a diff between a commit and one of its parents. When this is the first parent, we can check the Bloom filters before calling diff_tree_oid(). In order to make this work with the blame machinery, we need to initialize a struct bloom_key with the initial path. But also, we need to add more keys to a list if a rename is detected. We then check to see if _any_ of these keys answer "maybe" in the diff. During development, I purposefully left out this "add a new key when a rename is detected" to see if the test suite would catch my error. That is how I discovered the issues with GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS from the previous change. With that change, we can feel some confidence in the coverage of this change. If a user requests copy detection using "git blame -C", then there are more places where the set of "important" files can expand. I do not know enough about how this happens in the blame machinery. Thus, the Bloom filter integration is explicitly disabled in this mode. A later change could expand the bloom_key data with an appropriate call (or calls) to add_bloom_key(). If we did not disable this mode, then the following tests would fail: t8003-blame-corner-cases.sh t8011-blame-split-file.sh Generally, this is a performance enhancement and should not change the behavior of 'git blame' in any way. If a repo has a commit-graph file with computed changed-path Bloom filters, then they should notice improved performance for their 'git blame' commands. Here are some example timings that I found by blaming some paths in the Linux kernel repository: git blame arch/x86/kernel/topology.c >/dev/null Before: 0.83s After: 0.24s git blame kernel/time/time.c >/dev/null Before: 0.72s After: 0.24s git blame tools/perf/ui/stdio/hist.c >/dev/null Before: 0.27s After: 0.11s I specifically looked for "deep" paths that were also edited many times. As a counterpoint, the MAINTAINERS file was edited many times but is located in the root tree. This means that the cost of computing a diff relative to the pathspec is very small. Here are the timings for that command: git blame MAINTAINERS >/dev/null Before: 20.1s After: 18.0s These timings are the best of five. The worst-case runs were on the order of 2.5 minutes for both cases. Note that the MAINTAINERS file has 18,740 lines across 17,000+ commits. This happens to be one of the cases where this change provides the least improvement. The lack of improvement for the MAINTAINERS file and the relatively modest improvement for the other examples can be easily explained. The blame machinery needs to compute line-level diffs to determine which lines were changed by each commit. That makes up a large proportion of the computation time, and this change does not attempt to improve on that section of the algorithm. The MAINTAINERS file is large and changed often, so it takes time to determine which lines were updated by which commit. In contrast, the code files are much smaller, and it takes longer to comute the line-by-line diff for a single patch on the Linux mailing lists. Outside of the "-C" integration, I believe there is little more to gain from the changed-path Bloom filters for 'git blame' after this patch. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-16 15:38:06 -07:00
Junio C Hamano	5efabc7ed9	Merge branch 'ew/hashmap' Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use _entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry " hashmap: use _entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry " introduce container_of macro hashmap_put takes "struct hashmap_entry " hashmap_remove takes "const struct hashmap_entry " hashmap_get takes "const struct hashmap_entry " hashmap_add takes "struct hashmap_entry " hashmap_get_next takes "const struct hashmap_entry " hashmap_entry_init takes "struct hashmap_entry " packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent	2019-10-15 13:48:02 +09:00
Eric Wong	404ab78e39	hashmap: remove type arg from hashmap_{get,put,remove}_entry Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:12 +09:00
Eric Wong	23dee69f53	OFFSETOF_VAR macro to simplify hashmap iterators While we cannot rely on a `__typeof__' operator being portable to use with `offsetof'; we can calculate the pointer offset using an existing pointer and the address of a member using pointer arithmetic for compilers without `__typeof__'. This allows us to simplify usage of hashmap iterator macros by not having to specify a type when a pointer of that type is already given. In the future, list iterator macros (e.g. list_for_each_entry) may also be implemented using OFFSETOF_VAR to save hackers the trouble of using container_of/list_entry macros and without relying on non-portable `__typeof__'. v3: use `__typeof__' to avoid clang warnings Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	c8e424c9c9	hashmap: introduce hashmap_free_entries `hashmap_free_entries' behaves like `container_of' and passes the offset of the hashmap_entry struct to the internal `hashmap_free_' function, allowing the function to free any struct pointer regardless of where the hashmap_entry field is located. `hashmap_free' no longer takes any arguments aside from the hashmap itself. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	87571c3f71	hashmap: use *_entry APIs for iteration Inspired by list_for_each_entry in the Linux kernel. Once again, these are somewhat compromised usability-wise by compilers lacking __typeof__ support. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f23a465132	hashmap_get{,_from_hash} return "struct hashmap_entry *" Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	28ee794128	hashmap_remove takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b6c5241606	hashmap_get takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b94e5c1df6	hashmap_add takes "struct hashmap_entry " This is less error-prone than "void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	d22245a2e3	hashmap_entry_init takes "struct hashmap_entry " C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry " to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
brian m. carlson	fee49308a1	blame: remove needless comparison with GIT_SHA1_HEXSZ When faking a working tree commit, we read in lines from MERGE_HEAD into a strbuf. Because the strbuf is NUL-terminated and get_oid_hex will fail if it unexpectedly encounters a NUL, the check for the length of the line is unnecessary. There is no optimization benefit from this case, either, since on failure we call die. Remove this check, since it is no longer needed. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 15:04:57 -07:00
Junio C Hamano	1eb0a12ec3	Merge branch 'nd/tree-walk-with-repo' The tree-walk API learned to pass an in-core repository instance throughout more codepaths. * nd/tree-walk-with-repo: t7814: do not generate same commits in different repos Use the right 'struct repository' instead of the_repository match-trees.c: remove the_repo from shift_tree*() tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks() tree-walk.c: remove the_repo from get_tree_entry() tree-walk.c: remove the_repo from fill_tree_descriptor() sha1-file.c: remove the_repo from read_object_with_reference()	2019-07-19 11:30:21 -07:00
Junio C Hamano	209f075593	Merge branch 'br/blame-ignore' "git blame" learned to "ignore" commits in the history, whose effects (as well as their presence) get ignored. * br/blame-ignore: t8014: remove unnecessary braces blame: drop some unused function parameters blame: add a test to cover blame_coalesce() blame: use the fingerprint heuristic to match ignored lines blame: add a fingerprint heuristic to match ignored lines blame: optionally track line fingerprints during fill_blame_origin() blame: add config options for the output of ignored or unblamable lines blame: add the ability to ignore commits and their changes blame: use a helper function in blame_chunk() Move oidset_parse_file() to oidset.c fsck: rename and touch up init_skiplist()	2019-07-19 11:30:20 -07:00
Jeff King	07a54dc307	blame: drop some unused function parameters These unused parameters were introduced recently as part of the br/blame-ignore topic. I assume they are not indicative of bugs, but are just leftovers from the development process (they were introduced by the series but not used in any of its iterations). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-28 10:13:56 -07:00
Nguyễn Thái Ngọc Duy	50ddb089ff	tree-walk.c: remove the_repo from get_tree_entry() Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-27 12:45:17 -07:00
Barret Rhoden	a07a97760c	blame: use the fingerprint heuristic to match ignored lines This commit integrates the fuzzy fingerprint heuristic into guess_line_blames(). We actually make two passes. The first pass uses the fuzzy algorithm to find a match within the current diff chunk. If that fails, the second pass searches the entire parent file for the best match. For an example of scanning the entire parent for a match, consider: commit-a 30) #include <sys/header_a.h> commit-b 31) #include <header_b.h> commit-c 32) #include <header_c.h> Then commit X alphabetizes them: commit-X 30) #include <header_b.h> commit-X 31) #include <header_c.h> commit-X 32) #include <sys/header_a.h> If we just check the parent's chunk (i.e. the first pass), we'd get: commit-b 30) #include <header_b.h> commit-c 31) #include <header_c.h> commit-X 32) #include <sys/header_a.h> That's because commit X actually consists of two chunks: one chunk is removing sys/header_a.h, then some context, and the second chunk is adding sys/header_a.h. If we scan the entire parent file, we get: commit-b 30) #include <header_b.h> commit-c 31) #include <header_c.h> commit-a 32) #include <sys/header_a.h> Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 13:38:09 -07:00
Michael Platings	1d028dc682	blame: add a fingerprint heuristic to match ignored lines This algorithm will replace the heuristic used to identify lines from ignored commits with one that finds likely candidate lines in the parent's version of the file. The actual replacement occurs in an upcoming commit. The old heuristic simply assigned lines in the target to the same line number (plus offset) in the parent. The new function uses a fingerprinting algorithm to detect similarity between lines. The new heuristic is designed to accurately match changes made mechanically by formatting tools such as clang-format and clang-tidy. These tools make changes such as breaking up lines to fit within a character limit or changing identifiers to fit with a naming convention. The heuristic is not intended to match more extensive refactoring changes and may give misleading results in such cases. In most cases formatting tools preserve line ordering, so the heuristic is optimised for such cases. (Some types of changes do reorder lines e.g. sorting keep the line content identical, the git blame -M option can already be used to address this). The reason that it is advantageous to rely on ordering is due to source code repeating the same character sequences often e.g. declaring an identifier on one line and using that identifier on several subsequent lines. This means that lines can look very similar to each other which presents a problem when doing fuzzy matching. Relying on ordering gives us extra clues to point towards the true match. The heuristic operates on a single diff chunk change at a time. It creates a “fingerprint” for each line on each side of the change. Fingerprints are described in detail in the comment for `struct fingerprint`, but essentially are a multiset of the character pairs in a line. The heuristic first identifies the line in the target entry whose fingerprint is most clearly matched to a line fingerprint in the parent entry. Where fingerprints match identically, the position of the lines is used as a tie-break. The heuristic locks in the best match, and subtracts the fingerprint of the line in the target entry from the fingerprint of the line in the parent entry to prevent other lines being matched on the same parts of that line. It then repeats the process recursively on the section of the chunk before the match, and then the section of the chunk after the match. Here's an example of the difference the fingerprinting makes. Consider a file with two commits: commit-a 1) void func_1(void x, void y); commit-b 2) void func_2(void x, void y); After a commit 'X', we have: commit-X 1) void func_1(void x, commit-X 2) void y); commit-X 3) void func_2(void x, commit-X 4) void y); When we blame-ignored with the old algorithm, we get: commit-a 1) void func_1(void x, commit-b 2) void y); commit-X 3) void func_2(void x, commit-X 4) void y); Where commit-b is blamed for 2 instead of 3. With the fingerprint algorithm, we get: commit-a 1) void func_1(void x, commit-a 2) void y); commit-b 3) void func_2(void x, commit-b 4) void y); Note line 2 could be matched with either commit-a or commit-b as it is equally similar to both lines, but is matched with commit-a because its position as a fraction of the new line range is more similar to commit-a as a fraction of the old line range. Line 4 is also equally similar to both lines, but as it appears after line 3 which will be matched first it cannot be matched with an earlier line. For many more examples, see t/t8014-blame-ignore-fuzzy.sh which contains example parent and target files and the line numbers in the parent that must be matched. Signed-off-by: Michael Platings <michael@platin.gs> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 13:38:08 -07:00
Barret Rhoden	1fc73384ba	blame: optionally track line fingerprints during fill_blame_origin() fill_blame_origin() is a convenient place to store data that we will use throughout the lifetime of a blame_origin. Some heuristics for ignoring commits during a blame session can make use of this storage. In particular, we will calculate a fingerprint for each line of a file for blame_origins involved in an ignored commit. In this commit, we only calculate the line_starts, reusing the existing code from the scoreboard's line_starts. In an upcoming commit, we will actually compute the fingerprints. This feature will be used when we attempt to pass blame entries to parents when we "ignore" a commit. Most uses of fill_blame_origin() will not require this feature, hence the flag parameter. Multiple calls to fill_blame_origin() are idempotent, and any of them can request the creation of the fingerprints structure. Suggested-by: Michael Platings <michael@platin.gs> Signed-off-by: Barret Rhoden <brho@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-16 11:36:23 +09:00

1 2 3 4

163 Commits