Although the patience and histogram algorithms initialize the
environment they do not free it if there is an error. In contrast for
the Myers algorithm the environment is initalized in xdl_do_diff() and
it is freed if there is an error. Fix this by always initializing the
environment in xdl_do_diff() and freeing it there if there is an
error. Remove the comment in do_patience_diff() about the environment
being freed by xdl_diff() as it is not accurate because (a) xdl_diff()
does not do that if there is an error and (b) xdl_diff() is not the
only caller.
Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 663c5ad035 (diff histogram: intern strings, 2021-11-17), our
cmp_recs() does not call xdl_recmatch(), and thus no longer needs an
xpparam_t struct from which to get the flags. We can drop the unused
parameter from the function, as well as the macro which wraps it.
There's no functional change here; it's just simplifying things (and
making -Wunused-parameter happier).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This macro has been unused since 43ca7530df (xdiff/xhistogram: rely on
xdl_trim_ends(), 2011-08-01). The function that called it went away
there, and its use in the CMP() macro was inlined. It probably should
have been deleted then, but nobody noticed.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Histogram is the only diff algorithm not to call
xdl_classify_record(). xdl_classify_record() ensures that the hash
values of two strings that are not equal differ which means that it is
not necessary to use xdl_recmatch() when comparing lines, all that is
necessary is to compare the hash values. This gives a 7% reduction in
the runtime of "git log --patch" when using the histogram diff
algorithm.
Test HEAD^ HEAD
-----------------------------------------------------------------------------
4000.1: log -3000 (baseline) 0.18(0.14+0.04) 0.19(0.17+0.02) +5.6%
4000.2: log --raw -3000 (tree-only) 0.99(0.77+0.21) 0.98(0.78+0.20) -1.0%
4000.3: log -p -3000 (Myers) 4.84(4.31+0.51) 4.81(4.15+0.64) -0.6%
4000.4: log -p -3000 --histogram 6.34(5.86+0.46) 5.87(5.19+0.66) -7.4%
4000.5: log -p -3000 --patience 5.39(4.60+0.76) 5.35(4.60+0.73) -0.7%
Signed-off-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xpparam_t structures are usually zero-initialized before their specific
fields are assigned to, but there are three locations in the tree where
that does not happen. Add the missing memset() calls in order to make
initialization of xpparam_t structures consistent tree-wide and to
prevent stack garbage from being used as field values.
Signed-off-by: Michał Kępień <michal@isc.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
8c912eea94 ("teach --histogram to diff", 2011-07-12) included them, but
were already part of xinclude.h
Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running the same reproduction script as the previous patch,
it turns out the stack is too small, which can be easily avoided.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes a memory issue when recursing a lot, which can be reproduced as
seq 1 100000 >one
seq 1 4 100000 >two
git diff --no-index --histogram one two
Before this patch, histogram_diff would call itself recursively before
calling free_index, which would mean a lot of memory is allocated during
the recursion and only freed afterwards. By moving the memory allocation
(and its free call) into find_lcs, the memory is free'd before we recurse,
such that memory is reused in the next step of the recursion instead of
using new memory.
This addresses only the memory pressure, not the run time complexity,
that is also awful for the corner case outlined above.
Helpful in understanding the code (in addition to the sparse history of
this file), was https://stackoverflow.com/a/32367597 which reproduces
most of the code comments of the JGit implementation.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This will be useful in the next patch as we'll introduce multiple
callers.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By passing the 'xpp' and 'env' argument directly to the function
'fall_back_to_classic_diff', we eliminate an occurrence of the 'index'
in histogram_diff, which will prove useful in a bit.
While at it, move it up in the file. This will make the diff of
one of the next patches more legible.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of these were found using Lucas De Marchi's codespell tool.
Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because the default Myers, patience and histogram algorithms cannot be in
effect at the same time, XDL_PATIENCE_DIFF and XDL_HISTOGRAM_DIFF are not
independent bits. Instead of wasting one bit per algorithm, define a few
macros to access the few bits they occupy and update the code that access
them.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Having an additional variable (ptr) instead of changing line(1|2) and
count(1|2) was for debugging purposes.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do away with reduce_common_start_end() and use xdf->dstart and xdf->dend
set by xdl_trim_ends() that similarly tells us where the first unmatched
line from the start and end occurs.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously we were over-complicating matters by trying to combine the
recursed results. Now, terminate immediately if a recursive call failed
and return its result.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Port JGit's HistogramDiff algorithm over to C. Rough numbers (TODO) show
that it is faster than its --patience cousin, as well as the default
Meyers algorithm.
The implementation has been reworked to use structs and pointers,
instead of bitmasks, thus doing away with JGit's 2^28 line limit.
We also use xdiff's default hash table implementation (xdl_hash_bits()
with XDL_HASHLONG()) for convenience.
Signed-off-by: Tay Ray Chuan <rctay89@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>