Commit Graph

20210 Commits

Author SHA1 Message Date
Ævar Arnfjörð Bjarmason
e84a26e32f unpack-file: fix ancient leak in create_temp_file()
Fix a leak that's been with us since 3407bb4940 (Add "unpack-file"
helper that unpacks a sha1 blob into a tmpfile., 2005-04-18). See
00c8fd493a (cat-file: use streaming API to print blobs, 2012-03-07)
for prior art which shows the same API pattern, i.e. free()-ing the
result of read_object_file() after it's used.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
b6046abc0c built-ins & libs & helpers: add/move destructors, fix leaks
Fix various leaks in built-ins, libraries and a test helper here we
were missing a call to strbuf_release(), string_list_clear() etc, or
were calling them after a potential "return".

Comments on individual changes:

- builtin/checkout.c: Fix a memory leak that was introduced in [1]. A
  sibling leak introduced in [2] was recently fixed in [3]. As with [3]
  we should be using the wt_status_state_free_buffers() API introduced
  in [4].

- builtin/repack.c: Fix a leak that's been here since this use of
  "strbuf_release()" was added in a1bbc6c017 (repack: rewrite the shell
  script in C, 2013-09-15). We don't use the variable for anything
  except this loop, so we can instead free it right afterwards.

- builtin/rev-parse: Fix a leak that's been here since this code was
  added in 21d4783538 (Add a parseopt mode to git-rev-parse to bring
  parse-options to shell scripts., 2007-11-04).

- builtin/stash.c: Fix a couple of leaks that have been here since
  this code was added in d4788af875 (stash: convert create to builtin,
  2019-02-25), we strbuf_release()'d only some of the "struct strbuf" we
  allocated earlier in the function, let's release all of them.

- ref-filter.c: Fix a leak in 482c119186 (gpg-interface: improve
  interface for parsing tags, 2021-02-11), we don't use the "payload"
  variable that we ask parse_signature() to populate for us, so let's
  free it.

- t/helper/test-fake-ssh.c: Fix a leak that's been here since this
  code was added in 3064d5a38c (mingw: fix t5601-clone.sh,
  2016-01-27). Let's free the "struct strbuf" as soon as we don't need
  it anymore.

1. c45f0f525d (switch: reject if some operation is in progress,
   2019-03-29)
2. 2708ce62d2 (branch: sort detached HEAD based on a flag,
   2021-01-07)
3. abcac2e19f (ref-filter.c: fix a leak in get_head_description,
   2022-09-25)
4. 962dd7ebc3 (wt-status: introduce wt_status_state_free_buffers(),
   2020-09-27).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
b5fcb1c006 read-cache.c: clear and free "sparse_checkout_patterns"
The "sparse_checkout_patterns" member was added to the "struct
index_state" in 836e25c51b (sparse-checkout: hold pattern list in
index, 2021-03-30), but wasn't added to discard_index(). Let's do
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
03267e8656 commit: discard partial cache before (re-)reading it
The read_cache() in prepare_to_commit() would end up clobbering the
pointer we had for a previously populated "the_index.cache_tree" in
the very common case of "git commit" stressed by e.g. the tests being
changed here.

We'd populate "the_index.cache_tree" by calling
"update_main_cache_tree" in prepare_index(), but would not end up with
a "fully prepared" index. What constitutes an existing index is
clearly overly fuzzy, here we'll check "active_nr" (aka
"the_index.cache_nr"), but our "the_index.cache_tree" might have been
malloc()'d already.

Thus the code added in 11c8a74a64 (commit: write cache-tree data when
writing index anyway, 2011-12-06) would end up allocating the
"cache_tree", and would interact here with code added in
7168624c35 (Do not generate full commit log message if it is not
going to be used, 2007-11-28). The result was a very common memory
leak.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Ævar Arnfjörð Bjarmason
e5e37517dd tests: mark tests as passing with SANITIZE=leak
This marks tests that have been leak-free since various recent
commits, but which were not marked us such when the memory leak was
fixed. These were mostly discovered with the "check" mode added in
faececa53f (test-lib: have the "check" mode for SANITIZE=leak
consider leak logs, 2022-07-28).

Commits that fixed the last memory leak in these tests. Per narrowing
down when they started to pass under SANITIZE=leak with "bisect":

- t1022-read-tree-partial-clone.sh:
  7e2619d8ff (list_objects_filter_options: plug leak of filter_spec
  strings, 2022-09-08)

- t4053-diff-no-index.sh: 07a6f94a6d (diff-no-index: release prefixed
  filenames, 2022-09-07)

- t6415-merge-dir-to-symlink.sh: bac92b1f39 (Merge branch
  'js/ort-clean-up-after-failed-merge', 2022-08-08).

- t5554-noop-fetch-negotiator.sh:
  66eede4a37 (prepare_repo_settings(): plug leak of config values,
  2022-09-08)

- t2012-checkout-last.sh, t7504-commit-msg-hook.sh,
  t91{15,46,60}-git-svn-*.sh: The in-flight "pw/rebase-no-reflog-action"
  series, upon which this is based:
  https://lore.kernel.org/git/pull.1405.git.1667575142.gitgitgadget@gmail.com/

Let's mark all of these as passing with
"TEST_PASSES_SANITIZE_LEAK=true", to have it regression tested,
including as part of the "linux-leaks" CI job.

Additionally, let's remove the "!SANITIZE_LEAK" prerequisite from
tests that now pass, these were marked as failing in:

- 77e56d55ba (diff.c: fix a double-free regression in a18d66cefb,
  2022-03-17)
- c4d1d52631 (tests: change some 'test $(git) = "x"' to test_cmp,
  2022-03-07)

These were not spotted with the new "check" mode, but manually, it
doesn't cover these sort of prerequisites. There's few enough that we
shouldn't bother to automate it. They'll be going away sooner than
later.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-21 12:32:48 +09:00
Johannes Schindelin
db8016b43f t5516/t5601: be less strict about the number of credential warnings
It is unclear as to _why_, but under certain circumstances the warning
about credentials being passed as part of the URL seems to be swallowed
by the `git remote-https` helper in the Windows jobs of Git's CI builds.

Since it is not actually important how many times Git prints the
warning/error message, as long as it prints it at least once, let's just
make the test a bit more lenient and test for the latter instead of the
former, which works around these CI issues.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-01 16:35:05 -04:00
Jeff King
762521e8a5 t5516: move plaintext-password tests from t5601 and t5516
Commit 6dcbdc0d66 (remote: create fetch.credentialsInUrl config,
2022-06-06) added tests for our handling of passwords in URLs. Since the
obvious URL to be affected is git-over-http, the tests use http. However
they don't set up a test server; they just try to access
https://localhost, assuming it will fail (because the nothing is
listening there).

This causes some possible problems:

  - There might be a web server running on localhost, and we do not
    actually want to connect to that.

  - The DNS resolver, or the local firewall, might take a substantial
    amount of time (or forever, whichever comes first) to fail to
    connect, slowing down the tests cases unnecessarily.

  - Since there's no server, our tests for "allow" and "warn" still
    expect the clone/fetch/push operations to fail, even though in the
    real world we'd expect these to succeed. We scrape stderr to see
    what happened, but it's not as robust as a more realistic test.

Let's instead move these to t5551, which is all about testing http and
where we have a real server. That eliminates any issues with contacting
a strange URL, and lets the "allow" and "warn" tests confirm that the
operation actually succeeds.

It's not quite a verbatim move for a few reasons:

  - we can drop the LIBCURL dependency; it's already part of
    lib-httpd.sh

  - we'll use HTTPD_URL_USER_PASS, etc, instead of our fake URL. To
    avoid repetition, we'll add a few extra variables.

  - the "https://username:@localhost" test uses a funny URL that
    lib-httpd.sh doesn't provide. We'll similarly construct it in a
    variable. Note that we're hard-coding the lib-httpd username here,
    but t5551 already does that everywhere.

  - for the "domain:port" test, the URL provided by lib-httpd is fine,
    since our test server will always be on an exotic port. But we'll
    confirm in the test that this is so.

  - since our message-matching is done via grep, I simplified it to use
    a regex, rather than trying to massage lib-httpd's variables.
    Arguably this makes it more readable, too, while retaining the bits
    we care about: the fatal/warning distinction, the "uses plaintext"
    message, and the fact that the password was redacted.

  - we'll use the /auth/ path for the repo, which shows that we are
    indeed making use of the auth information when needed.

  - we'll also use /smart/; most of these tests could be done via /dumb/
    in t5550, but setting up pushes there requires extra effort and
    dependencies. The smart protocol is what most everyone is using
    these days anyway.

This patch is my own, but I stole the analysis and a few bits of the
commit message from a patch by Johannes Schindelin.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-01 16:35:05 -04:00
Taylor Blau
b1e3dd68ee Merge branch 'en/merge-tree-sequence'
"git merge-tree --stdin" is a new way to request a series of merges
and report the merge results.

* en/merge-tree-sequence:
  merge-tree: support multiple batched merges with --stdin
  merge-tree: update documentation for differences in -z output
2022-10-30 21:04:44 -04:00
Taylor Blau
d32dd8add5 Merge branch 'ds/bundle-uri-3'
Define the logical elements of a "bundle list", data structure to
store them in-core, format to transfer them, and code to parse
them.

* ds/bundle-uri-3:
  bundle-uri: suppress stderr from remote-https
  bundle-uri: quiet failed unbundlings
  bundle: add flags to verify_bundle()
  bundle-uri: fetch a list of bundles
  bundle: properly clear all revision flags
  bundle-uri: limit recursion depth for bundle lists
  bundle-uri: parse bundle list in config format
  bundle-uri: unit test "key=value" parsing
  bundle-uri: create "key=value" line parsing
  bundle-uri: create base key-value pair parsing
  bundle-uri: create bundle_list struct and helpers
  bundle-uri: use plain string in find_temp_filename()
2022-10-30 21:04:44 -04:00
Taylor Blau
bf0d9d0d34 Merge branch 'rj/branch-do-not-exit-with-minus-one-status'
"git branch --edit-description" can exit with status -1 which is
not a good practice; it learned to use 1 as everybody else instead.

* rj/branch-do-not-exit-with-minus-one-status:
  branch: error code with --edit-description
2022-10-30 21:04:43 -04:00
Taylor Blau
0c025612d4 Merge branch 'rj/branch-copy-rename-error-codepath-cleanup'
Code simplification.

* rj/branch-copy-rename-error-codepath-cleanup:
  branch: error copying or renaming a detached HEAD
2022-10-30 21:04:43 -04:00
Taylor Blau
c41ec63ef5 Merge branch 'tb/cap-patch-at-1gb'
"git apply" limits its input to a bit less than 1 GiB.

* tb/cap-patch-at-1gb:
  apply: reject patches larger than ~1 GiB
2022-10-30 21:04:43 -04:00
Taylor Blau
969230b64f Merge branch 'en/ort-dir-rename-and-symlink-fix'
Merging a branch with directory renames into a branch that changes
the directory to a symlink was mishandled by the ort merge
strategy, which has been corrected.

* en/ort-dir-rename-and-symlink-fix:
  merge-ort: fix bug with dir rename vs change dir to symlink
2022-10-30 21:04:43 -04:00
Taylor Blau
a23e0b69e2 Merge branch 'pb/subtree-split-and-merge-after-squashing-tag-fix'
A bugfix to "git subtree" in its split and merge features.

* pb/subtree-split-and-merge-after-squashing-tag-fix:
  subtree: fix split after annotated tag was squashed merged
  subtree: fix squash merging after annotated tag was squashed merged
  subtree: process 'git-subtree-split' trailer in separate function
  subtree: use named variables instead of "$@" in cmd_pull
  subtree: define a variable before its first use in 'find_latest_squash'
  subtree: prefix die messages with 'fatal'
  subtree: add 'die_incompatible_opt' function to reduce duplication
  subtree: use 'git rev-parse --verify [--quiet]' for better error messages
  test-lib-functions: mark 'test_commit' variables as 'local'
2022-10-30 21:04:43 -04:00
Taylor Blau
8851c4b065 Merge branch 'pw/rebase-reflog-fixes'
Fix some bugs in the reflog messages when rebasing and changes the
reflog messages of "rebase --apply" to match "rebase --merge" with
the aim of making the reflog easier to parse.

* pw/rebase-reflog-fixes:
  rebase: cleanup action handling
  rebase --abort: improve reflog message
  rebase --apply: make reflog messages match rebase --merge
  rebase --apply: respect GIT_REFLOG_ACTION
  rebase --merge: fix reflog message after skipping
  rebase --merge: fix reflog when continuing
  t3406: rework rebase reflog tests
  rebase --apply: remove duplicated code
2022-10-30 21:04:43 -04:00
Taylor Blau
003f815dd9 Merge branch 'pw/rebase-keep-base-fixes'
"git rebase --keep-base" used to discard the commits that are
already cherry-picked to the upstream, even when "keep-base" meant
that the base, on top of which the history is being rebuilt, does
not yet include these cherry-picked commits.  The --keep-base
option now implies --reapply-cherry-picks and --no-fork-point
options.

* pw/rebase-keep-base-fixes:
  rebase --keep-base: imply --no-fork-point
  rebase --keep-base: imply --reapply-cherry-picks
  rebase: factor out branch_base calculation
  rebase: rename merge_base to branch_base
  rebase: store orig_head as a commit
  rebase: be stricter when reading state files containing oids
  t3416: set $EDITOR in subshell
  t3416: tighten two tests
2022-10-30 21:04:42 -04:00
Taylor Blau
e5be3c632a Merge branch 'jh/trace2-timers-and-counters'
Two new facilities, "timer" and "counter", are introduced to the
trace2 API.

* jh/trace2-timers-and-counters:
  trace2: add global counter mechanism
  trace2: add stopwatch timers
  trace2: convert ctx.thread_name from strbuf to pointer
  trace2: improve thread-name documentation in the thread-context
  trace2: rename the thread_name argument to trace2_thread_start
  api-trace2.txt: elminate section describing the public trace2 API
  tr2tls: clarify TLS terminology
  trace2: use size_t alloc,nr_open_regions in tr2tls_thread_ctx
2022-10-30 21:04:42 -04:00
Taylor Blau
c112d8d9c2 Merge branch 'tb/shortlog-group'
"git shortlog" learned to group by the "format" string.

* tb/shortlog-group:
  shortlog: implement `--group=committer` in terms of `--group=<format>`
  shortlog: implement `--group=author` in terms of `--group=<format>`
  shortlog: extract `shortlog_finish_setup()`
  shortlog: support arbitrary commit format `--group`s
  shortlog: extract `--group` fragment for translation
  shortlog: make trailer insertion a noop when appropriate
  shortlog: accept `--date`-related options
2022-10-30 21:04:42 -04:00
Taylor Blau
c88895e67b Merge branch 'jk/repack-tempfile-cleanup'
The way "git repack" creared temporary files when it received a
signal was prone to deadlocking, which has been corrected.

* jk/repack-tempfile-cleanup:
  t7700: annotate cruft-pack failure with ok=sigpipe
  repack: drop remove_temporary_files()
  repack: use tempfiles for signal cleanup
  repack: expand error message for missing pack files
  repack: populate extension bits incrementally
  repack: convert "names" util bitfield to array
2022-10-30 21:04:42 -04:00
Taylor Blau
160314e625 Merge branch 'jz/patch-id'
A new "--include-whitespace" option is added to "git patch-id", and
existing bugs in the internal patch-id logic that did not match
what "git patch-id" produces have been corrected.

* jz/patch-id:
  builtin: patch-id: remove unused diff-tree prefix
  builtin: patch-id: add --verbatim as a command mode
  patch-id: fix patch-id for mode changes
  builtin: patch-id: fix patch-id with binary diffs
  patch-id: use stable patch-id for rebases
  patch-id: fix stable patch id for binary / header-only
2022-10-30 21:04:41 -04:00
Junio C Hamano
7d5a4d86a6 Merge branch 'tb/diffstat-with-utf8-strwidth'
"git diff --stat" etc. were invented back when everything was ASCII
and strlen() was a way to measure the display width of a string;
adjust them to compute the display width assuming UTF-8 pathnames.

* tb/diffstat-with-utf8-strwidth:
  diff: leave NEEDWORK notes in show_stats() function
  diff.c: use utf8_strwidth() to count display width
2022-10-28 11:26:55 -07:00
Junio C Hamano
330135ac81 Merge branch 'mm/git-pm-try-catch-syntax-fix'
Fix a longstanding syntax error in Git.pm error codepath.

* mm/git-pm-try-catch-syntax-fix:
  Git.pm: trust rev-parse to find bare repositories
  Git.pm: add semicolon after catch statement
2022-10-28 11:26:54 -07:00
Junio C Hamano
c5dd7773e1 Merge branch 'tb/remove-unused-pack-bitmap'
When creating a multi-pack bitmap, remove per-pack bitmap files
unconditionally as they will never be consulted.

* tb/remove-unused-pack-bitmap:
  builtin/repack.c: remove redundant pack-based bitmaps
2022-10-28 11:26:54 -07:00
Junio C Hamano
7b9b634ca5 Merge branch 'ab/doc-synopsis-and-cmd-usage'
The short-help text shown by "git cmd -h" and the synopsis text
shown at the beginning of "git help cmd" have been made more
consistent.

* ab/doc-synopsis-and-cmd-usage: (34 commits)
  tests: assert consistent whitespace in -h output
  tests: start asserting that *.txt SYNOPSIS matches -h output
  doc txt & -h consistency: make "worktree" consistent
  worktree: define subcommand -h in terms of command -h
  reflog doc: list real subcommands up-front
  doc txt & -h consistency: make "commit" consistent
  doc txt & -h consistency: make "diff-tree" consistent
  doc txt & -h consistency: use "[<label>...]" for "zero or more"
  doc txt & -h consistency: make "annotate" consistent
  doc txt & -h consistency: make "stash" consistent
  doc txt & -h consistency: add missing options
  doc txt & -h consistency: use "git foo" form, not "git-foo"
  doc txt & -h consistency: make "bundle" consistent
  doc txt & -h consistency: make "read-tree" consistent
  doc txt & -h consistency: make "rerere" consistent
  doc txt & -h consistency: add missing options and labels
  doc txt & -h consistency: make output order consistent
  doc txt & -h consistency: add or fix optional "--" syntax
  doc txt & -h consistency: fix mismatching labels
  doc SYNOPSIS & -h: use "-" to separate words in labels, not "_"
  ...
2022-10-28 11:26:54 -07:00
Junio C Hamano
246eedf2bc Merge branch 'js/cmake-updates'
Update to build procedure with VS using CMake/CTest.

* js/cmake-updates:
  cmake: increase time-out for a long-running test
  cmake: avoid editing t/test-lib.sh
  add -p: avoid ambiguous signed/unsigned comparison
  cmake: copy the merge tools for testing
  cmake: make it easier to diagnose regressions in CTest runs
2022-10-27 14:51:53 -07:00
Junio C Hamano
702bb4baea Merge branch 'nw/t1002-cleanup'
Code clean-up in test.

* nw/t1002-cleanup:
  t1002: modernize outdated conditional
2022-10-27 14:51:53 -07:00
Junio C Hamano
6ae1a6eaf2 Merge branch 'ab/run-hook-api-cleanup'
Move a global variable added as a hack during regression fixes to
its proper place in the API.

* ab/run-hook-api-cleanup:
  run-command.c: remove "max_processes", add "const" to signal() handler
  run-command.c: pass "opts" further down, and use "opts->processes"
  run-command.c: use "opts->processes", not "pp->max_processes"
  run-command.c: don't copy "data" to "struct parallel_processes"
  run-command.c: don't copy "ungroup" to "struct parallel_processes"
  run-command.c: don't copy *_fn to "struct parallel_processes"
  run-command.c: make "struct parallel_processes" const if possible
  run-command API: move *_tr2() users to "run_processes_parallel()"
  run-command API: have run_process_parallel() take an "opts" struct
  run-command.c: use designated init for pp_init(), add "const"
  run-command API: don't fall back on online_cpus()
  run-command API: make "n" parameter a "size_t"
  run-command tests: use "return", not "exit"
  run-command API: have "run_processes_parallel{,_tr2}()" return void
  run-command test helper: use "else if" pattern
2022-10-27 14:51:53 -07:00
Junio C Hamano
f62c546455 Merge branch 'tb/save-keep-pack-during-geometric-repack'
When geometric repacking feature is in use together with the
--pack-kept-objects option, we lost packs marked with .keep files.

* tb/save-keep-pack-during-geometric-repack:
  repack: don't remove .keep packs with `--pack-kept-objects`
2022-10-27 14:51:53 -07:00
Junio C Hamano
220604042c Merge branch 'jk/unused-anno-more'
More UNUSED annotation to help using -Wunused option with the
compiler.

* jk/unused-anno-more:
  ll-merge: mark unused parameters in callbacks
  diffcore-pickaxe: mark unused parameters in pickaxe functions
  convert: mark unused parameter in null stream filter
  apply: mark unused parameters in noop error/warning routine
  apply: mark unused parameters in handlers
  date: mark unused parameters in handler functions
  string-list: mark unused callback parameters
  object-file: mark unused parameters in hash_unknown functions
  mark unused parameters in trivial compat functions
  update-index: drop unused argc from do_reupdate()
  submodule--helper: drop unused argc from module_list_compute()
  diffstat_consume(): assert non-zero length
2022-10-27 14:51:52 -07:00
Junio C Hamano
99bb1a0bea Merge branch 'tb/midx-bitmap-selection-fix'
A bugfix with tracing support in midx codepath

* tb/midx-bitmap-selection-fix:
  pack-bitmap-write.c: instrument number of reused bitmaps
  midx.c: instrument MIDX and bitmap generation with trace2 regions
  midx.c: consider annotated tags during bitmap selection
  midx.c: fix whitespace typo
2022-10-27 14:51:52 -07:00
Rubén Justo
8f24115165 branch: error code with --edit-description
Since c2d17ba3db (branch --edit-description: protect against mistyped
branch name, 2012-02-05) we return -1 on error editing the branch
description.

Let's change to 1, which follows the established convention and it is
better for portability reasons.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26 10:52:37 -07:00
Rubén Justo
77e7267e47 branch: error copying or renaming a detached HEAD
In c847f53712 (Detached HEAD (experimental), 2007-01-01) an error
condition was introduced in rename_branch() to prevent renaming, later
also copying, a detached HEAD.

The condition used was checking for NULL in oldname, the source branch
to rename/copy.  That condition cannot be satisfied because if no source
branch is specified, HEAD is going to be used in the call.

The error issued instead is:

	fatal: Invalid branch name: 'HEAD'

Let's remove the condition in copy_or_rename_branch() (the current
function name) and check for HEAD before calling it, dying with the
original intended error if we're in a detached HEAD.

Signed-off-by: Rubén Justo <rjusto@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-26 10:52:24 -07:00
Junio C Hamano
777f548b5a Merge branch 'gc/bare-repo-discovery'
Allow configuration files in "protected" scopes to include other
configuration files.

* gc/bare-repo-discovery:
  config: respect includes in protected config
2022-10-25 17:11:44 -07:00
Junio C Hamano
b988427918 Merge branch 'rs/diff-caret-bang-with-parents'
"git diff rev^!" did not show combined diff to go to the rev from
its parents.

* rs/diff-caret-bang-with-parents:
  diff: support ^! for merges
  revisions.txt: unspecify order of resolved parts of ^!
  revision: use strtol_i() for exclude_parent
2022-10-25 17:11:43 -07:00
Taylor Blau
f1c0e3946e apply: reject patches larger than ~1 GiB
The apply code is not prepared to handle extremely large files. It uses
"int" in some places, and "unsigned long" in others.

This combination leads to unfortunate problems when switching between
the two types. Using "int" prevents us from handling large files, since
large offsets will wrap around and spill into small negative values,
which can result in wrong behavior (like accessing the patch buffer with
a negative offset).

Converting from "unsigned long" to "int" also has truncation problems
even on LLP64 platforms where "long" is the same size as "int", since
the former is unsigned but the latter is not.

To avoid potential overflow and truncation issues in `git apply`, apply
similar treatment as in dcd1742e56 (xdiff: reject files larger than
~1GB, 2015-09-24), where the xdiff code was taught to reject large
files for similar reasons.

The maximum size was chosen somewhat arbitrarily, but picking a value
just shy of a gigabyte allows us to double it without overflowing 2^31-1
(after which point our value would wrap around to a negative number).
To give ourselves a bit of extra margin, the maximum patch size is a MiB
smaller than a full GiB, which gives us some slop in case we allocate
"(records + 1) * sizeof(int)" or similar.

Luckily, the security implications of these conversion issues are
relatively uninteresting, because a victim needs to be convinced to
apply a malicious patch.

Reported-by: 정재우 <thebound7@gmail.com>
Suggested-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-25 15:21:17 -07:00
Jerry Zhang
2871f4d447 builtin: patch-id: add --verbatim as a command mode
There are situations where the user might not want the default
setting where patch-id strips all whitespace. They might be working
in a language where white space is syntactically important, or they
might have CI testing that enforces strict whitespace linting. In
these cases, a whitespace change would result in the patch
fundamentally changing, and thus deserving of a different id.

Add a new mode that is exclusive of --stable and --unstable called
--verbatim. It also corresponds to the config
patchid.verbatim = true. In this mode, the stable algorithm is
used and whitespace is not stripped from the patch text.

Users of --unstable mainly care about compatibility with old git
versions, which unstripping the whitespace would break. Thus there
isn't a usecase for the combination of --verbatim and --unstable,
and we don't expose this so as to not add maintainence burden.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
fixes https://github.com/Skydio/revup/issues/2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:20 -07:00
Jerry Zhang
93105aba6c patch-id: fix patch-id for mode changes
Currently patch-id as used in rebase and cherry-pick does not account
for file modes if the file is modified. One consequence of this is
that if you have a local patch that changes modes, but upstream
has applied an outdated version of the patch that doesn't include
that mode change, "git rebase" will drop your local version of the
patch along with your mode changes. It also means that internal
patch-id doesn't produce the same output as the builtin, which does
account for mode changes due to them being part of diff output.

Fix by adding mode to the patch-id if it has changed, in the same
format that would be produced by diff, so that it is compatible
with builtin patch-id.

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:20 -07:00
Jerry Zhang
0df19eb9d9 builtin: patch-id: fix patch-id with binary diffs
"git patch-id" currently doesn't produce correct output if the
incoming diff has any binary files. Add logic to get_one_patchid
to handle the different possible styles of binary diff. This
attempts to keep resulting patch-ids identical to what would be
produced by the counterpart logic in diff.c, that is it produces
the id by hashing the a and b oids in succession.

In general we handle binary diffs by first caching the object ids from
the "index" line and using those if we then find an indication
that the diff is binary.

The input could contain patches generated with "git diff --binary". This
currently breaks the parse logic and results in multiple patch-ids
output for a single commit. Here we have to skip the contents of the
patch itself since those do not go into the patch id. --binary
implies --full-index so the object ids are always available.

When the diff is generated with --full-index there is no patch content
to skip over.

When a diff is generated without --full-index or --binary, it will
contain abbreviated object ids. This will still result in a sufficiently
unique patch-id when hashed, but does not match internal patch id
output. We'll call this ok for now as we already need specialized
arguments to diff in order to match internal patch id (namely -U3).

Signed-off-by: Jerry Zhang <Jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:19 -07:00
Jerry Zhang
0570be79ea patch-id: fix stable patch id for binary / header-only
Patch-ids for binary patches are found by hashing the object
ids of the before and after objects in succession. However in
the --stable case, there is a bug where hunks are not flushed
for binary and header-only patch ids, which would always result
in a patch-id of 0000. The --unstable case is currently correct.

Reorder the logic to branch into 3 cases for populating the
patch body: header-only which populates nothing, binary which
populates the object ids, and normal which populates the text
diff. All branches will end up flushing the hunk.

Don't populate the ---a/ and +++b/ lines for binary diffs, to correspond
to those lines not being present in the "git diff" text output.
This is necessary because we advertise that the patch-id calculated
internally and used in format-patch is the same that what the
builtin "git patch-id" would produce when piped from a diff.

Update the test to run on both binary and normal files.

Signed-off-by: Jerry Zhang <jerry@skydio.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 15:44:19 -07:00
Taylor Blau
3dc95e09e1 shortlog: support arbitrary commit format --groups
In addition to generating a shortlog based on committer, author, or the
identity in one or more specified trailers, it can be useful to generate
a shortlog based on an arbitrary commit format.

This can be used, for example, to generate a distribution of commit
activity over time, like so:

    $ git shortlog --group='%cd' --date='format:%Y-%m' -s v2.37.0..
       117  2022-06
       274  2022-07
       324  2022-08
       263  2022-09
         7  2022-10

Arbitrary commit formats can be used. In fact, `git shortlog`'s default
behavior (to count by commit authors) can be emulated as follows:

    $ git shortlog --group='%aN <%aE>' ...

and future patches will make the default behavior (as well as
`--committer`, and `--group=trailer:<trailer>`) special cases of the
more flexible `--group` option.

Note also that the SHORTLOG_GROUP_FORMAT enum value is used only to
designate that `--group:<format>` is in use when in stdin mode to
declare that the combination is invalid.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 14:48:05 -07:00
Jeff King
251554c269 shortlog: accept --date-related options
Prepare for a future patch which will introduce arbitrary pretty formats
via the `--group` argument.

To allow additional customizability (for example, to support something
like `git shortlog -s --group='%aD' --date='format:%Y-%m' ...` (which
groups commits by the datestring 'YYYY-mm' according to author date), we
must store off the `--date` parsed from calling `parse_revision_opt()`.

Note that this also affects custom output `--format` strings in `git
shortlog`. Though this is a behavior change, this is arguably fixing a
long-standing bug (ie., that `--format` strings are not affected by
`--date` specifiers as they should be).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 14:48:05 -07:00
Jeff Hostetler
81071626ba trace2: add global counter mechanism
Add global counters mechanism to Trace2.

The Trace2 counters mechanism adds the ability to create a set of
global counter variables and an API to increment them efficiently.
Counters can optionally report per-thread usage in addition to the sum
across all threads.

Counter events are emitted to the Trace2 logs when a thread exits and
at process exit.

Counters are an alternative to `data` and `data_json` events.

Counters are useful when you want to measure something across the life
of the process, when you don't want per-measurement events for
performance reasons, when the data does not fit conveniently within a
region, or when your control flow does not easily let you write the
final total.  For example, you might use this to report the number of
calls to unzip() or the number of de-delta steps during a checkout.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 12:45:26 -07:00
Jeff Hostetler
8ad575646c trace2: add stopwatch timers
Add stopwatch timer mechanism to Trace2.

Timers are an alternative to Trace2 Regions.  Regions are useful for
measuring the time spent in various computation phases, such as the
time to read the index, time to scan for unstaged files, time to scan
for untracked files, and etc.

However, regions are not appropriate in all places.  For example,
during a checkout, it would be very inefficient to use regions to
measure the total time spent inflating objects from the ODB from
across the entire lifetime of the process; a per-unzip() region would
flood the output and significantly slow the command; and some form of
post-processing would be requried to compute the time spent in unzip().

Timers can be used to measure a series of timer intervals and emit
a single summary event (at thread and/or process exit).

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-24 12:45:26 -07:00
Jeff King
9b3fadfd06 t7700: annotate cruft-pack failure with ok=sigpipe
One of our tests intentionally causes the cruft-pack generation phase of
repack to fail, in order to stimulate an exit from repack at the desired
moment. It does so by feeding a bogus option argument to pack-objects.
This is a simple and reliable way to get pack-objects to fail, but it
has one downside: pack-objects will die before reading its stdin, which
means the caller repack may racily get SIGPIPE writing to it.

For the purposes of this test, that's OK. We are checking whether repack
cleans up already-created .tmp files, and it will do so whether it exits
or dies by signal (because the tempfile API hooks both).

But we have to tell test_must_fail that either outcome is OK, or it
complains about the signal. Arguably this is a workaround (compared to
fixing repack), as repack dying to SIGPIPE means that it loses the
opportunity to give a more detailed message. But we don't actually write
such a message anyway; we rely on pack-objects to have written something
useful to stderr, and it does. In either case (signal or exit), that is
the main thing the user will see.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-23 11:08:45 -07:00
Elijah Newren
ec1edbcb56 merge-tree: support multiple batched merges with --stdin
Add an option, --stdin, to merge-tree which will accept lines of input
with two branches to merge per line, and which will perform all the
merges and give output for each in turn.  This option implies -z, and
modifies the output to also include a merge status since the exit code
of the program can no longer convey that information now that multiple
merges are involved.

This could be useful, for example, by Git hosting providers.  When one
branch is updated, one may want to check whether all code reviews
targetting that branch can still cleanly merge.  Avoiding the overhead
of starting up a separate process for each of those code reviews might
provide significant savings in a repository with many code reviews.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 22:21:26 -07:00
Jeff King
20da61f25f Git.pm: trust rev-parse to find bare repositories
When initializing a repository object, we run "git rev-parse --git-dir"
to let the C version of Git find the correct directory. But curiously,
if this fails we don't automatically say "not a git repository".
Instead, we do our own pure-perl check to see if we're in a bare
repository.

This makes little sense, as rev-parse will report both bare and non-bare
directories. This logic comes from d5c7721d58 (Git.pm: Add support for
subdirectories inside of working copies, 2006-06-24), but I don't see
any reason given why we can't just rely on rev-parse. Worse, because we
treat any non-error response from rev-parse as a non-bare repository,
we'll erroneously set the object's WorkingCopy, even in a bare
repository.

But it gets worse. Since 8959555cee (setup_git_directory(): add an owner
check for the top-level directory, 2022-03-02), it's actively wrong (and
dangerous). The perl code doesn't implement the same ownership checks.
And worse, after "finding" the bare repository, it sets GIT_DIR in the
environment, which tells any subsequent Git commands that we've
confirmed the directory is OK, and to trust us. I.e., it re-opens the
vulnerability plugged by 8959555cee when using Git.pm's repository
discovery code.

We can fix this by just relying on rev-parse to tell us when we're not
in a repository, which fixes the vulnerability. Furthermore, we'll ask
its --is-bare-repository function to tell us if we're bare or not, and
rely on that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 16:39:48 -07:00
Elijah Newren
2b86c10084 merge-ort: fix bug with dir rename vs change dir to symlink
When changing a directory to a symlink on one side of history, and
renaming the parent of that directory to a different directory name
on the other side, e.g. with this kind of setup:

    Base commit: Has a file named dir/subdir/file
    Side1:       Rename dir/ -> renamed-dir/
    Side2:       delete dir/subdir/file, add dir/subdir as symlink

Then merge-ort was running into an assertion failure:

    git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed

merge-recursive did not have as obvious an issue handling this case,
likely because we never fixed it to handle the case from commit
902c521a35 ("t6423: more involved directory rename test", 2020-10-15)
where we need to be careful about nested renames when a directory rename
occurs (dir/ -> renamed-dir/ implies dir/subdir/ ->
renamed-dir/subdir/).  However, merge-recursive does have multiple
problems with this testcase:

  * Incorrect stages for the file: merge-recursive omits the stage in
    the index corresponding to the base stage, making `git status`
    report "added by us" for renamed-dir/subdir/file instead of the
    expected "deleted by them".

  * Poor directory/file conflict handling: For the renamed-dir/subdir
    symlink, instead of reporting a file/directory conflict as
    expected, it reports "Error: Refusing to lose untracked file at
    renamed-dir/subdir".  This is a lie because there is no untracked
    file at that location.  It then does the normal suboptimal
    merge-recursive thing of having the symlink be tracked in the index
    at a location where it can't be written due to D/F conflicts
    (namely, renamed-dir/subdir), but writes it to the working tree at
    a different location as a new untracked file (namely,
    renamed-dir/subdir~B^0)

Technically, these problems don't prevent the user from resolving the
merge if they can figure out to ignore the confusion, but because both
pieces of output are quite confusing I don't want to modify the test
to claim the recursive also passes it even if it doesn't have the bug
that ort did.

So, fix the bug in ort by splitting the conflict_info for "dir/subdir"
into two, one for the directory part, one for the file (i.e. symlink)
part, since the symlink is being renamed by directory rename detection.
The directory part is needed for proper nesting, since there are still
conflict_info fields for files underneath it (though those are marked
as is_null, they are still present until the entries are processed,
and the entry processing wants every non-toplevel entry to have a
parent directory).

Reported-by: Stefano Rivera <stefano@rivera.za.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 16:10:33 -07:00
Jeff King
193430717a repack: drop remove_temporary_files()
After we've successfully finished the repack, we call
remove_temporary_files(), which looks for and removes any files matching
".tmp-$$-pack-*", where $$ is the pid of the current process. But this
is pointless. If we make it this far in the process, we've already
renamed these tempfiles into place, and there is nothing left to delete.

Nor is there a point in trying to call it to clean up when we _aren't_
successful. It's not safe for using in a signal handler, and the
previous commit already handed that job over to the tempfile API.

It might seem like it would be useful to clean up stray .tmp files left
by other invocations of git-repack. But it won't clean those files; it
only matches ones with its pid, and leaves the rest. Fortunately, those
are cleaned up naturally by successive calls to git-repack; we'll
consider .tmp-*.pack the same as normal packfiles, so "repack -ad", etc,
will roll up their contents and eventually delete them.

The one case that could matter is if pack-objects generates an extension
we don't know about, like ".tmp-pack-$$-$hash.some-new-ext". The current
code will quietly delete such a file, while after this patch we'd leave
it in place. In practice this doesn't happen, and would be indicative of
a bug. Leaving the file as cruft is arguably a better behavior, as it
means somebody is more likely to eventually notice and fix the bug.  If
we really wanted to be paranoid, we could scan for and warn about such
files, but that seems like overkill.

There's nothing to test with regard to the removal of this function. It
was doing nothing, so the behavior should be the same.  However, we can
verify (and protect) our assumption that "repack -ad" will eventually
remove stray files by adding a test for that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21 18:03:52 -07:00
Jeff King
9cf10d8786 repack: use tempfiles for signal cleanup
When git-repack exits due to a signal, it tries to clean up by calling
its remove_temporary_files() function, which walks through the packs dir
looking for ".tmp-$$-pack-*" files to delete (where "$$" is the pid of
the current process).

The biggest problem here is that remove_temporary_files() is not safe to
call in a signal handler. It uses opendir(), which isn't on the POSIX
async-signal-safe list. The details will be platform-specific, but a
likely issue is that it needs to allocate memory; if we receive a signal
while inside malloc(), etc, we'll conflict on the allocator lock and
deadlock with ourselves.

We can fix this by just cleaning up the files directly, without walking
the directory. We already know the complete list of .tmp-* files that
were generated, because we recorded them via populate_pack_exts(). When
we find files there, we can use register_tempfile() to record the
filenames. If we receive a signal, then the tempfile API will clean them
up for us, and it's async-safe and pretty battle-tested.

Note that this is slightly racier than the existing scheme. We don't
record the filenames until pack-objects tells us the hash over stdout.
So during the period between it generating the file and reporting the
hash, we'd fail to clean up. However, that period is very small. During
most of the pack generation process pack-objects is using its own
internal tempfiles. It's only at the very end that it moves them into
the names git-repack expects, and then it immediately reports the name
to us. Given that cleanup like this is best effort (after all, we may
get SIGKILL), this level of race is acceptable.

When we register the tempfiles, we'll record them locally and use the
result to call rename_tempfile(), rather than renaming by hand.  This
isn't strictly necessary, as once we've renamed the files they're gone,
and the tempfile API's cleanup unlink() would simply become a pointless
noop. But managing the lifetimes of the tempfile objects is the cleanest
thing to do, and the tempfile pointers naturally fill the same role as
the old booleans.

This patch also fixes another small problem. We only hook signals, and
don't set up an atexit handler. So if we see an error that causes us to
die(), we'll leave the .tmp-* files in place. But since the tempfile API
handles this for us, this is now fixed for free. The new test covers
this by stimulating a failure of pack-objects when generating a cruft
pack. Before this patch, the .tmp-* file for the main pack would have
been left, but now we correctly clean it up.

Two small subtleties on the implementation:

  - in the renaming loop, we can stop re-constructing fname_old; we only
    use it when we have a tempfile to rename, so we can just ask the
    tempfile for its path (which, barring bugs, should be identical)

  - when renaming fails, our error message mentions fname_old. But since
    a failed rename_tempfile() invalidates the tempfile struct, we'll
    lose access to that string. Instead, let's mention the destination
    filename, which is what most other callers do.

Reported-by: Jan Pokorný <poki@fnusa.cz>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21 18:03:52 -07:00
Philippe Blain
455f0adf57 test-lib-functions: mark 'test_commit' variables as 'local'
Some variables in 'test_commit' have names that are common enough that
it is very likely that test authors might use them in a test. If they do
so and use 'test_commit' between setting such a variable and using it,
the variable value from 'test_commit' will leak back into the test and
most likely break it.

Prevent that by marking all variables in 'test_commit' as 'local'. This
allow a subsequent commit to use a 'tag' variable.

Signed-off-by: Philippe Blain <levraiphilippeblain@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-21 13:51:05 -07:00