Commit Graph

19 Commits

Author SHA1 Message Date
Jerry Zhang
2871f4d447 builtin: patch-id: add --verbatim as a command mode
There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:20 -07:00
Jerry Zhang
0df19eb9d9 builtin: patch-id: fix patch-id with binary diffs
"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:19 -07:00
Jerry Zhang
757e75c81e patch-id: fix scan_hunk_header on diffs with 1 line of before/after
Normally diffs will contain a hunk header of the format
"@@ -2,2 +2,15 @@ code". However when there is only 1 line of
change, the unified diff format allows for the second comma
separated value to be omitted in either before or after
line counts.

This can produce hunk headers that look like
"@@ -2 +2,18 @@ code" or "@@ -2,2 +2 @@ code".
As a result, scan_hunk_header mistakenly returns the line
number as line count, which then results in unpredictable
parsing errors with the rest of the patch, including giving
multiple lines of output for a single commit.

Fix by explicitly setting line count to 1 when there is
no comma, and add a test.

apply.c contains this same logic except it is correct. A
worthwhile future project might be to unify these two diff
parsers so they both benefit from fixes.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 11:24:23 -08:00
Jerry Zhang
56fa5ac39a patch-id: fix antipatterns in tests
Clean up the tests for patch-id by moving file preparation
tasks inside the test body and redirecting files directly into
stdin instead of using 'cat'.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 11:24:15 -08:00
Junio C Hamano
6b2edaa378 t4204 is not sanitizer clean at all
Earlier we marked that this patch-id test is leak-sanitizer clean,
but if we read the test script carefully, it is merely because we
have too many invocations of commands in the "git log" family on the
upstream side of the pipe, hiding breakages from them.

Split the pipeline so that breakages from these commands can be
caught (not limited to aborts due to leak-sanitizer) and unmark
the script as not passing the test with leak-sanitizer in effect.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-16 15:18:31 -08:00
Ævar Arnfjörð Bjarmason
9081a421a6 checkout: fix "branch info" memory leaks
The "checkout" command is one of the main sources of leaks in the test
suite, let's fix the common ones by not leaking from the "struct
branch_info".

Doing this is rather straightforward, albeit verbose, we need to
xstrdup() constant strings going into the struct, and free() the ones
we clobber as we go along.

This also means that we can delete previous partial leak fixes in this
area, i.e. the "path_to_free" accounting added by 96ec7b1e70 (Convert
resolve_ref+xstrdup to new resolve_refdup function, 2011-12-13).

There was some discussion about whether "we should retain the "const
char *" here and cast at free() time, or have it be a "char *". Since
this is not a public API with any sort of API boundary let's use
"char *", as is already being done for the "refname" member of the
same struct.

The tests to mark as passing were found with:

    rm .prove; GIT_SKIP_TESTS=t0027 prove -j8 --state=save t[0-9]*.sh :: --immediate
    # apply & compile this change
    prove -j8 --state=failed :: --immediate

I.e. the ones that were newly passing when the --state=failed command
was run. I left out "t3040-subprojects-basic.sh" and
"t4131-apply-fake-ancestor.sh" to to optimization-level related
differences similar to the ones noted in[1], except that these would
be something the current 'linux-leaks' job would run into.

1. https://lore.kernel.org/git/cover-v3-0.6-00000000000-20211022T175227Z-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-11-18 14:32:26 -08:00
Johannes Schindelin
8f37854b18 t4*: adjust the references to the default branch name "main"
Carefully excluding t4013 and t4015, which see independent development
elsewhere at the time of writing, we use `main` as the default branch
name in t4*. This trick was performed via

	$ (cd t &&
	   sed -i -e 's/master/main/g' -e 's/MASTER/MAIN/g' \
		-e 's/Master/Main/g' -- t4*.sh t4211/*.export &&
	   git checkout HEAD -- t4013\*)

This allows us to define `GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main`
for those tests.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 15:44:18 -08:00
Johannes Schindelin
334afbc76f tests: mark tests relying on the current default for init.defaultBranch
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master by default.

To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in

- all test scripts that contain the keyword `master`,

- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
  initialize the default branch,

- t5560 because it sources `t/t556x_common` which uses `master`,

- t8002 and t8012 because both source `t/annotate-tests.sh` which also
  uses `master`)

This trick was performed by this command:

	$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' $(git grep -l master t/t[0-9]*.sh) \
	t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh

After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specificially, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:

	$ git checkout HEAD -- \
		t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
		t/t1011-read-tree-sparse-checkout.sh \
		t/t1305-config-include.sh t/t1309-early-config.sh \
		t/t1402-check-ref-format.sh t/t1450-fsck.sh \
		t/t2024-checkout-dwim.sh \
		t/t2106-update-index-assume-unchanged.sh \
		t/t3040-subprojects-basic.sh t/t3301-notes.sh \
		t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
		t/t3436-rebase-more-options.sh \
		t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
		t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
		t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
		t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
		t/t5548-push-porcelain.sh \
		t/t5552-skipping-fetch-negotiator.sh \
		t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
		t/t5614-clone-submodules-shallow.sh \
		t/t7508-status.sh t/t7606-merge-custom.sh \
		t/t9302-fast-import-unpack-limit.sh

We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; They were
modified thusly:

	$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' t/t980[0167]*.sh t/t9811*.sh

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 15:44:17 -08:00
brian m. carlson
7a1bcb251b t4204: make hash size independent
Use $OID_REGEX instead of a hard-coded regular expression.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-15 14:06:19 -08:00
Junio C Hamano
d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Jeff King
4a73aaaf18 patch-id: use RUN_SETUP_GENTLY
Patch-id does not require a repository because it is just
processing the incoming diff on stdin, but it may look at
git config for keys like patchid.stable.

Even though we do not setup_git_directory(), this works from
the top-level of a repository because we blindly look at
".git/config" in this case. But as the included test
demonstrates, it does not work from a subdirectory.

We can fix it by using RUN_SETUP_GENTLY. We do not take any
filenames from the user on the command line, so there's no
need to adjust them via prefix_filename().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano
5c63920190 t4204: do not let $name variable clobbered
test_patch_id_file_order shell function uses $name variable to hold
one filename, and calls another shell function calc_patch_id as a
downstream of one pipeline.  The called function, however, also uses
the same $name variable.  With a shell implementation that runs the
callee in the current shell environment, the caller's $name would
be clobbered by the callee's use of the same variable.

This hasn't been an issue with dash and bash.  ksh93 reveals the
breakage in the test script.

Fix it by using a distinct variable name in the callee.

Reported-by: Armin Kunaschik <megabreit@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-24 15:49:02 -07:00
Junio C Hamano
79e539404c Merge branch 'mt/patch-id-stable' (early part)
* 'mt/patch-id-stable' (early part):
  patch-id-test: test stable and unstable behaviour
  patch-id: make it stable against hunk reordering
  test doc: test_write_lines does not split its arguments
  test: add test_write_lines helper
2014-06-16 10:05:38 -07:00
Michael S. Tsirkin
8f2514e95f patch-id-test: test stable and unstable behaviour
Verify that patch ID supports an algorithm
that is stable against diff split and reordering.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 13:09:39 -07:00
Elia Pinto
ce21ccfae0 t4204-patch-id.sh: use the $( ... ) construct for command substitution
The Git CodingGuidelines prefer the $(...) construct for command
substitution instead of using the backquotes `...`.

The backquoted form is the traditional method for command
substitution, and is supported by POSIX.  However, all but the
simplest uses become complicated quickly.  In particular, embedded
command substitutions and/or the use of double quotes require
careful escaping with the backslash character.

The patch was generated by:

for _f in $(find . -name "*.sh")
do
   sed -i 's@`\(.*\)`@$(\1)@g' ${_f}
done

and then carefully proof-read.

Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-30 11:08:09 -07:00
Michael J Gruber
2485eab55c git-patch-id: do not trip over "no newline" markers
Currently, patch-id trips over our very own diff extension for marking
the absence of newline at EOF.

Fix it. (Ignore it, it's whitespace.)

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-17 11:56:50 -08:00
Michael J Gruber
f2b5e7af10 git-patch-id: test for "no newline" markers
Currently, patch-id trips over our very own output that marks the absence
of newline at EOF.

Expose this in a test.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-17 11:54:57 -08:00
Paolo Bonzini
580fb25b7a patch-id: Add support for mbox format
I have an alias that takes two arguments and compares their patch IDs.
I would like to use to make sure I've tested exactly what I submit
(patch by patch), like

   git patch-cmp origin/master.. file-being-sent

However, I cannot do that because git patch-id is fooled by the "-- "
trailer that git format-patch puts, or likely by the MIME boundary.

This patch adds hunk parsing logic to git patch-id in order to detect an
out of place "-" line and split the patch when it comes.  In addition,
commit ids in the "From " lines are considered and printed in the output.

Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-19 13:01:49 -07:00
Johannes Sixt
7f733de04e test-suite: Make test script numbers unique
In order to selectively skip tests, the environment variable GIT_SKIP_TESTS
can be set like this:

  $ GIT_SKIP_TESTS='t1301 t4150.18' make test

That is, its value can contain only the test script numbers, but not the
full script name. Therefore, it is important that the test scripts are
uniquely numbered. This makes it so.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-14 12:44:20 -07:00