builtin: patch-id: add --verbatim as a command mode

There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Jerry Zhang 2022-10-24 20:07:43 +00:00 committed by Junio C Hamano
parent 93105aba6c
commit 2871f4d447
3 changed files with 121 additions and 36 deletions

View File

@ -8,18 +8,18 @@ git-patch-id - Compute unique ID for a patch
SYNOPSIS SYNOPSIS
-------- --------
[verse] [verse]
'git patch-id' [--stable | --unstable] 'git patch-id' [--stable | --unstable | --verbatim]
DESCRIPTION DESCRIPTION
----------- -----------
Read a patch from the standard input and compute the patch ID for it. Read a patch from the standard input and compute the patch ID for it.
A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a A "patch ID" is nothing but a sum of SHA-1 of the file diffs associated with a
patch, with whitespace and line numbers ignored. As such, it's "reasonably patch, with line numbers ignored. As such, it's "reasonably stable", but at
stable", but at the same time also reasonably unique, i.e., two patches that the same time also reasonably unique, i.e., two patches that have the same
have the same "patch ID" are almost guaranteed to be the same thing. "patch ID" are almost guaranteed to be the same thing.
IOW, you can use this thing to look for likely duplicate commits. The main usecase for this command is to look for likely duplicate commits.
When dealing with 'git diff-tree' output, it takes advantage of When dealing with 'git diff-tree' output, it takes advantage of
the fact that the patch is prefixed with the object name of the the fact that the patch is prefixed with the object name of the
@ -30,6 +30,12 @@ This can be used to make a mapping from patch ID to commit ID.
OPTIONS OPTIONS
------- -------
--verbatim::
Calculate the patch-id of the input as it is given, do not strip
any whitespace.
This is the default if patchid.verbatim is true.
--stable:: --stable::
Use a "stable" sum of hashes as the patch ID. With this option: Use a "stable" sum of hashes as the patch ID. With this option:
- Reordering file diffs that make up a patch does not affect the ID. - Reordering file diffs that make up a patch does not affect the ID.
@ -45,14 +51,16 @@ OPTIONS
of "-O<orderfile>", thereby making existing databases storing such of "-O<orderfile>", thereby making existing databases storing such
"unstable" or historical patch-ids unusable. "unstable" or historical patch-ids unusable.
- All whitespace within the patch is ignored and does not affect the id.
This is the default if patchid.stable is set to true. This is the default if patchid.stable is set to true.
--unstable:: --unstable::
Use an "unstable" hash as the patch ID. With this option, Use an "unstable" hash as the patch ID. With this option,
the result produced is compatible with the patch-id value produced the result produced is compatible with the patch-id value produced
by git 1.9 and older. Users with pre-existing databases storing by git 1.9 and older and whitespace is ignored. Users with pre-existing
patch-ids produced by git 1.9 and older (who do not deal with reordered databases storing patch-ids produced by git 1.9 and older (who do not deal
patches) may want to use this option. with reordered patches) may want to use this option.
This is the default. This is the default.

View File

@ -2,6 +2,7 @@
#include "builtin.h" #include "builtin.h"
#include "config.h" #include "config.h"
#include "diff.h" #include "diff.h"
#include "parse-options.h"
static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result) static void flush_current_id(int patchlen, struct object_id *id, struct object_id *result)
{ {
@ -57,7 +58,7 @@ static int scan_hunk_header(const char *p, int *p_before, int *p_after)
} }
static int get_one_patchid(struct object_id *next_oid, struct object_id *result, static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
struct strbuf *line_buf, int stable) struct strbuf *line_buf, int stable, int verbatim)
{ {
int patchlen = 0, found_next = 0; int patchlen = 0, found_next = 0;
int before = -1, after = -1; int before = -1, after = -1;
@ -76,8 +77,11 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
if (!skip_prefix(line, "diff-tree ", &p) && if (!skip_prefix(line, "diff-tree ", &p) &&
!skip_prefix(line, "commit ", &p) && !skip_prefix(line, "commit ", &p) &&
!skip_prefix(line, "From ", &p) && !skip_prefix(line, "From ", &p) &&
starts_with(line, "\\ ") && 12 < strlen(line)) starts_with(line, "\\ ") && 12 < strlen(line)) {
if (verbatim)
the_hash_algo->update_fn(&ctx, line, strlen(line));
continue; continue;
}
if (!get_oid_hex(p, next_oid)) { if (!get_oid_hex(p, next_oid)) {
found_next = 1; found_next = 1;
@ -152,8 +156,8 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
if (line[0] == '+' || line[0] == ' ') if (line[0] == '+' || line[0] == ' ')
after--; after--;
/* Compute the sha without whitespace */ /* Add line to hash algo (possibly removing whitespace) */
len = remove_space(line); len = verbatim ? strlen(line) : remove_space(line);
patchlen += len; patchlen += len;
the_hash_algo->update_fn(&ctx, line, len); the_hash_algo->update_fn(&ctx, line, len);
} }
@ -166,7 +170,7 @@ static int get_one_patchid(struct object_id *next_oid, struct object_id *result,
return patchlen; return patchlen;
} }
static void generate_id_list(int stable) static void generate_id_list(int stable, int verbatim)
{ {
struct object_id oid, n, result; struct object_id oid, n, result;
int patchlen; int patchlen;
@ -174,21 +178,32 @@ static void generate_id_list(int stable)
oidclr(&oid); oidclr(&oid);
while (!feof(stdin)) { while (!feof(stdin)) {
patchlen = get_one_patchid(&n, &result, &line_buf, stable); patchlen = get_one_patchid(&n, &result, &line_buf, stable, verbatim);
flush_current_id(patchlen, &oid, &result); flush_current_id(patchlen, &oid, &result);
oidcpy(&oid, &n); oidcpy(&oid, &n);
} }
strbuf_release(&line_buf); strbuf_release(&line_buf);
} }
static const char patch_id_usage[] = "git patch-id [--stable | --unstable]"; static const char *const patch_id_usage[] = {
N_("git patch-id [--stable | --unstable | --verbatim]"), NULL
};
struct patch_id_opts {
int stable;
int verbatim;
};
static int git_patch_id_config(const char *var, const char *value, void *cb) static int git_patch_id_config(const char *var, const char *value, void *cb)
{ {
int *stable = cb; struct patch_id_opts *opts = cb;
if (!strcmp(var, "patchid.stable")) { if (!strcmp(var, "patchid.stable")) {
*stable = git_config_bool(var, value); opts->stable = git_config_bool(var, value);
return 0;
}
if (!strcmp(var, "patchid.verbatim")) {
opts->verbatim = git_config_bool(var, value);
return 0; return 0;
} }
@ -197,21 +212,29 @@ static int git_patch_id_config(const char *var, const char *value, void *cb)
int cmd_patch_id(int argc, const char **argv, const char *prefix) int cmd_patch_id(int argc, const char **argv, const char *prefix)
{ {
int stable = -1; /* if nothing is set, default to unstable */
struct patch_id_opts config = {0, 0};
int opts = 0;
struct option builtin_patch_id_options[] = {
OPT_CMDMODE(0, "unstable", &opts,
N_("use the unstable patch-id algorithm"), 1),
OPT_CMDMODE(0, "stable", &opts,
N_("use the stable patch-id algorithm"), 2),
OPT_CMDMODE(0, "verbatim", &opts,
N_("don't strip whitespace from the patch"), 3),
OPT_END()
};
git_config(git_patch_id_config, &stable); git_config(git_patch_id_config, &config);
/* If nothing is set, default to unstable. */ /* verbatim implies stable */
if (stable < 0) if (config.verbatim)
stable = 0; config.stable = 1;
if (argc == 2 && !strcmp(argv[1], "--stable")) argc = parse_options(argc, argv, prefix, builtin_patch_id_options,
stable = 1; patch_id_usage, 0);
else if (argc == 2 && !strcmp(argv[1], "--unstable"))
stable = 0;
else if (argc != 1)
usage(patch_id_usage);
generate_id_list(stable); generate_id_list(opts ? opts > 1 : config.stable,
opts ? opts == 3 : config.verbatim);
return 0; return 0;
} }

View File

@ -8,13 +8,13 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
. ./test-lib.sh . ./test-lib.sh
test_expect_success 'setup' ' test_expect_success 'setup' '
as="a a a a a a a a" && # eight a str="ab cd ef gh ij kl mn op" &&
test_write_lines $as >foo && test_write_lines $str >foo &&
test_write_lines $as >bar && test_write_lines $str >bar &&
git add foo bar && git add foo bar &&
git commit -a -m initial && git commit -a -m initial &&
test_write_lines $as b >foo && test_write_lines $str b >foo &&
test_write_lines $as b >bar && test_write_lines $str b >bar &&
git commit -a -m first && git commit -a -m first &&
git checkout -b same main && git checkout -b same main &&
git commit --amend -m same-msg && git commit --amend -m same-msg &&
@ -22,8 +22,23 @@ test_expect_success 'setup' '
echo c >foo && echo c >foo &&
echo c >bar && echo c >bar &&
git commit --amend -a -m notsame-msg && git commit --amend -a -m notsame-msg &&
git checkout -b with_space main~ &&
cat >foo <<-\EOF &&
a b
c d
e f
g h
i j
k l
m n
op
EOF
cp foo bar &&
git add foo bar &&
git commit --amend -m "with spaces" &&
test_write_lines bar foo >bar-then-foo && test_write_lines bar foo >bar-then-foo &&
test_write_lines foo bar >foo-then-bar test_write_lines foo bar >foo-then-bar
' '
test_expect_success 'patch-id output is well-formed' ' test_expect_success 'patch-id output is well-formed' '
@ -128,9 +143,21 @@ test_patch_id_file_order () {
git format-patch -1 --stdout -O foo-then-bar >format-patch.output && git format-patch -1 --stdout -O foo-then-bar >format-patch.output &&
calc_patch_id <format-patch.output "ordered-$name" "$@" && calc_patch_id <format-patch.output "ordered-$name" "$@" &&
cmp_patch_id $relevant "$name" "ordered-$name" cmp_patch_id $relevant "$name" "ordered-$name"
} }
test_patch_id_whitespace () {
relevant="$1"
shift
name="ws-${1}-$relevant"
shift
get_top_diff "main~" >top-diff.output &&
calc_patch_id <top-diff.output "$name" "$@" &&
get_top_diff "with_space" >top-diff.output &&
calc_patch_id <top-diff.output "ws-$name" "$@" &&
cmp_patch_id $relevant "$name" "ws-$name"
}
# combined test for options: add more tests here to make them # combined test for options: add more tests here to make them
# run with all options # run with all options
test_patch_id () { test_patch_id () {
@ -146,6 +173,14 @@ test_expect_success 'file order is relevant with --unstable' '
test_patch_id_file_order relevant --unstable --unstable test_patch_id_file_order relevant --unstable --unstable
' '
test_expect_success 'whitespace is relevant with --verbatim' '
test_patch_id_whitespace relevant --verbatim --verbatim
'
test_expect_success 'whitespace is irrelevant without --verbatim' '
test_patch_id_whitespace irrelevant --stable --stable
'
#Now test various option combinations. #Now test various option combinations.
test_expect_success 'default is unstable' ' test_expect_success 'default is unstable' '
test_patch_id relevant default test_patch_id relevant default
@ -161,6 +196,17 @@ test_expect_success 'patchid.stable = false is unstable' '
test_patch_id relevant patchid.stable=false test_patch_id relevant patchid.stable=false
' '
test_expect_success 'patchid.verbatim = true is correct and stable' '
test_config patchid.verbatim true &&
test_patch_id_whitespace relevant patchid.verbatim=true &&
test_patch_id irrelevant patchid.verbatim=true
'
test_expect_success 'patchid.verbatim = false is unstable' '
test_config patchid.verbatim false &&
test_patch_id relevant patchid.verbatim=false
'
test_expect_success '--unstable overrides patchid.stable = true' ' test_expect_success '--unstable overrides patchid.stable = true' '
test_config patchid.stable true && test_config patchid.stable true &&
test_patch_id relevant patchid.stable=true--unstable --unstable test_patch_id relevant patchid.stable=true--unstable --unstable
@ -171,6 +217,11 @@ test_expect_success '--stable overrides patchid.stable = false' '
test_patch_id irrelevant patchid.stable=false--stable --stable test_patch_id irrelevant patchid.stable=false--stable --stable
' '
test_expect_success '--verbatim overrides patchid.stable = false' '
test_config patchid.stable false &&
test_patch_id_whitespace relevant stable=false--verbatim --verbatim
'
test_expect_success 'patch-id supports git-format-patch MIME output' ' test_expect_success 'patch-id supports git-format-patch MIME output' '
get_patch_id main && get_patch_id main &&
git checkout same && git checkout same &&
@ -225,7 +276,10 @@ test_expect_success 'patch-id handles no-nl-at-eof markers' '
EOF EOF
calc_patch_id nonl <nonl && calc_patch_id nonl <nonl &&
calc_patch_id withnl <withnl && calc_patch_id withnl <withnl &&
test_cmp patch-id_nonl patch-id_withnl test_cmp patch-id_nonl patch-id_withnl &&
calc_patch_id nonl-inc-ws --verbatim <nonl &&
calc_patch_id withnl-inc-ws --verbatim <withnl &&
! test_cmp patch-id_nonl-inc-ws patch-id_withnl-inc-ws
' '
test_expect_success 'patch-id handles diffs with one line of before/after' ' test_expect_success 'patch-id handles diffs with one line of before/after' '