Commit Graph

48865 Commits

Author SHA1 Message Date
Junio C Hamano
b438722c06 Merge branch 'ah/doc-empty-string-is-false' into maint
Doc update.

* ah/doc-empty-string-is-false:
  doc: clarify "config --bool" behaviour with empty string
2017-09-10 17:03:03 +09:00
Junio C Hamano
afa6608b93 Merge branch 'rs/merge-microcleanup' into maint
Code clean-up.

* rs/merge-microcleanup:
  merge: use skip_prefix()
2017-09-10 17:03:02 +09:00
Junio C Hamano
c580ce194f Merge branch 'rs/find-pack-entry-bisection' into maint
Code clean-up.

* rs/find-pack-entry-bisection:
  sha1_file: avoid comparison if no packed hash matches the first byte
2017-09-10 17:03:02 +09:00
Junio C Hamano
c7759cd60a Merge branch 'rs/apply-lose-prefix-length' into maint
Code clean-up.

* rs/apply-lose-prefix-length:
  apply: remove prefix_length member from apply_state
2017-09-10 17:03:01 +09:00
Junio C Hamano
70def2c47f Merge branch 'rj/add-chmod-error-message' into maint
Message fix.

* rj/add-chmod-error-message:
  builtin/add: add detail to a 'cannot chmod' error message
2017-09-10 17:03:00 +09:00
Junio C Hamano
822a4d4178 Merge branch 'jk/hashcmp-memcmp' into maint
Code clean-up.

* jk/hashcmp-memcmp:
  hashcmp: use memcmp instead of open-coded loop
2017-09-10 17:02:59 +09:00
Junio C Hamano
f35a1d75b5 Merge branch 'rs/t3700-clean-leftover' into maint
A test fix.

* rs/t3700-clean-leftover:
  t3700: fix broken test under !POSIXPERM
2017-09-10 17:02:58 +09:00
Junio C Hamano
8f3d48e14e Merge branch 'jc/perl-git-comment-typofix' into maint
A comment fix.

* jc/perl-git-comment-typofix:
  perl/Git.pm: typofix in a comment
2017-09-10 17:02:57 +09:00
Junio C Hamano
036e1274a2 Merge branch 'mf/no-dashed-subcommands' into maint
Code clean-up.

* mf/no-dashed-subcommands:
  scripts: use "git foo" not "git-foo"
2017-09-10 17:02:56 +09:00
Junio C Hamano
1eb539a9b3 Merge branch 'ab/ref-filter-no-contains' into maint
A test fix.

* ab/ref-filter-no-contains:
  tests: don't give unportable ">" to "test" built-in, use -gt
2017-09-10 17:02:56 +09:00
Junio C Hamano
ea8bf00095 Merge branch 'rs/archive-excluded-directory' into maint
"git archive" did not work well with pathspecs and the
export-ignore attribute.

We may want to resurrect the "we don't archive an empty directory"
bonus patch, but I do not mind merging the above early to 'next'
and leave it as a separate follow-up enhancement.
cf. <20170820090629.tumvqwzkromcykjf@sigill.intra.peff.net>

* rs/archive-excluded-directory:
  archive: don't queue excluded directories
  archive: factor out helper functions for handling attributes
  t5001: add tests for export-ignore attributes and exclude pathspecs
2017-09-10 17:02:55 +09:00
Junio C Hamano
78ad09403c Merge branch 'mg/killed-merge' into maint
Killing "git merge --edit" before the editor returns control left
the repository in a state with MERGE_MSG but without MERGE_HEAD,
which incorrectly tells the subsequent "git commit" that there was
a squash merge in progress.  This has been fixed.

* mg/killed-merge:
  merge: save merge state earlier
  merge: split write_merge_state in two
  merge: clarify call chain
  Documentation/git-merge: explain --continue
2017-09-10 17:02:55 +09:00
Junio C Hamano
648a50a08a Merge branch 'tb/apply-with-crlf' into maint
"git apply" that is used as a better "patch -p1" failed to apply a
taken from a file with CRLF line endings to a file with CRLF line
endings.  The root cause was because it misused convert_to_git()
that tried to do "safe-crlf" processing by looking at the index
entry at the same path, which is a nonsense---in that mode, "apply"
is not working on the data in (or derived from) the index at all.
This has been fixed.

* tb/apply-with-crlf:
  apply: file commited with CRLF should roundtrip diff and apply
  convert: add SAFE_CRLF_KEEP_CRLF
2017-09-10 17:02:55 +09:00
Junio C Hamano
27015b4f95 Merge branch 'cc/subprocess-handshake-missing-capabilities' into maint
When handshake with a subprocess filter notices that the process
asked for an unknown capability, Git did not report what program
the offending subprocess was running.  This has been corrected.

We may want a follow-up fix to tighten the error checking, though.

* cc/subprocess-handshake-missing-capabilities:
  sub-process: print the cmd when a capability is unsupported
2017-09-10 17:02:55 +09:00
Junio C Hamano
f1b64e8e64 Merge branch 'as/grep-quiet-no-match-exit-code-fix' into maint
"git grep -L" and "git grep --quiet -L" reported different exit
codes; this has been corrected.

* as/grep-quiet-no-match-exit-code-fix:
  git-grep: correct exit code with --quiet and -L
2017-09-10 17:02:55 +09:00
Junio C Hamano
8388f986b6 Merge branch 'kd/stash-with-bash-4.4' into maint
bash 4.4 or newer gave a warning on NUL byte in command
substitution done in "git stash"; this has been squelched.

* kd/stash-with-bash-4.4:
  stash: prevent warning about null bytes in input
2017-09-10 17:02:54 +09:00
Junio C Hamano
fbded00b0d Merge branch 'rs/win32-syslog-leakfix' into maint
Memory leak in an error codepath has been plugged.

* rs/win32-syslog-leakfix:
  win32: plug memory leak on realloc() failure in syslog()
2017-09-10 17:02:54 +09:00
Junio C Hamano
438776e3d4 Merge branch 'rs/unpack-entry-leakfix' into maint
Memory leak in an error codepath has been plugged.

* rs/unpack-entry-leakfix:
  sha1_file: release delta_stack on error in unpack_entry()
2017-09-10 17:02:53 +09:00
Junio C Hamano
c3b931e162 Merge branch 'rs/fsck-obj-leakfix' into maint
Memory leak in an error codepath has been plugged.

* rs/fsck-obj-leakfix:
  fsck: free buffers on error in fsck_obj()
2017-09-10 17:02:53 +09:00
Junio C Hamano
e0d52ec4ab Merge branch 'ur/svn-local-zone' into maint
"git svn" used with "--localtime" option did not compute the tz
offset for the timestamp in question and instead always used the
current time, which has been corrected.

* ur/svn-local-zone:
  git svn fetch: Create correct commit timestamp when using --localtime
2017-09-10 17:02:52 +09:00
Junio C Hamano
00fd0afefd Merge branch 'pw/am-signoff' into maint
"git am -s" has been taught that some input may end with a trailer
block that is not Signed-off-by: and it should refrain from adding
an extra blank line before adding a new sign-off in such a case.

* pw/am-signoff:
  am: fix signoff when other trailers are present
2017-09-10 17:02:51 +09:00
Junio C Hamano
0f80fb185e Merge branch 'rs/in-obsd-basename-dirname-take-const' into maint
Portability fix.

* rs/in-obsd-basename-dirname-take-const:
  test-path-utils: handle const parameter of basename and dirname
2017-09-10 17:02:51 +09:00
Junio C Hamano
b3a19e060c Merge branch 'rs/t4062-obsd' into maint
Test portability fix.

* rs/t4062-obsd:
  t4062: use less than 256 repetitions in regex
2017-09-10 17:02:51 +09:00
Junio C Hamano
c2e19411a7 Merge branch 'rs/obsd-getcwd-workaround' into maint
Test portability fix for BSDs.

* rs/obsd-getcwd-workaround:
  t0001: skip test with restrictive permissions if getpwd(3) respects them
2017-09-10 17:02:50 +09:00
Junio C Hamano
277194a280 Merge branch 'bw/clone-recursive-quiet' into maint
"git clone --recurse-submodules --quiet" did not pass the quiet
option down to submodules.

* bw/clone-recursive-quiet:
  clone: teach recursive clones to respect -q
2017-09-10 17:02:49 +09:00
Junio C Hamano
86c726f0d1 Merge branch 'pw/sequence-rerere-autoupdate' into maint
Commands like "git rebase" accepted the --rerere-autoupdate option
from the command line, but did not always use it.  This has been
fixed.

* pw/sequence-rerere-autoupdate:
  cherry-pick/revert: reject --rerere-autoupdate when continuing
  cherry-pick/revert: remember --rerere-autoupdate
  t3504: use test_commit
  rebase -i: honor --rerere-autoupdate
  rebase: honor --rerere-autoupdate
  am: remember --rerere-autoupdate setting
2017-09-10 17:02:49 +09:00
Junio C Hamano
eba2a68f25 Merge branch 'bw/push-options-recursively-to-submodules' into maint
"git push --recurse-submodules $there HEAD:$target" was not
propagated down to the submodules, but now it is.

* bw/push-options-recursively-to-submodules:
  submodule--helper: teach push-check to handle HEAD
2017-09-10 17:02:49 +09:00
Junio C Hamano
702239d049 Merge branch 'ma/pager-per-subcommand-action' into maint
The "tag.pager" configuration variable was useless for those who
actually create tag objects, as it interfered with the use of an
editor.  A new mechanism has been introduced for commands to enable
pager depending on what operation is being carried out to fix this,
and then "git tag -l" is made to run pager by default.

If this works out OK, I think there are low-hanging fruits in
other commands like "git branch" that outputs long list in one mode
while taking input in another.

* ma/pager-per-subcommand-action:
  git.c: ignore pager.* when launching builtin as dashed external
  tag: change default of `pager.tag` to "on"
  tag: respect `pager.tag` in list-mode only
  t7006: add tests for how git tag paginates
  git.c: provide setup_auto_pager()
  git.c: let builtins opt for handling `pager.foo` themselves
  builtin.h: take over documentation from api-builtin.txt
2017-09-10 17:02:48 +09:00
Junio C Hamano
c2a3bb47f0 Merge branch 'jk/rev-list-empty-input' into maint
"git log --tag=no-such-tag" showed log starting from HEAD, which
has been fixed---it now shows nothing.

* jk/rev-list-empty-input:
  revision: do not fallback to default when rev_input_given is set
  rev-list: don't show usage when we see empty ref patterns
  revision: add rev_input_given flag
  t6018: flesh out empty input/output rev-list tests
2017-09-10 17:02:48 +09:00
Junio C Hamano
638eb4e701 Merge branch 'st/lib-gpg-kill-stray-agent' into maint
Some versions of GnuPG fails to kill gpg-agent it auto-spawned
and such a left-over agent can interfere with a test.  Work it
around by attempting to kill one before starting a new test.

* st/lib-gpg-kill-stray-agent:
  t: lib-gpg: flush gpg agent on startup
2017-09-10 17:02:48 +09:00
Rene Scharfe
b6ec307177 wt-status: release strbuf after use in wt_longstatus_print_tracking()
If format_tracking_info() returns 0, then it didn't touch its strbuf
parameter, so it's OK to exit early in that case.  Clean up sb in the
other case.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:38:57 +09:00
Martin Ågren
276d0e35c0 refs/files-backend: add refname, not "HEAD", to list
An earlier patch rewrote `split_symref_update()` to add a copy of a
string to a string list instead of adding the original string. That was
so that the original string could be freed in a later patch, but it is
also conceptually cleaner, since now all calls to `string_list_insert()`
and `string_list_append()` add `update->refname`. --- Except a literal
"HEAD" is added in `split_head_update()`.

Restructure `split_head_update()` in the same way as the earlier patch
did for `split_symref_update()`. This does not correct any practical
problem, but makes things conceptually cleaner. The downside is a call
to `string_list_has_string()`, which should be relatively cheap.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:36:58 +09:00
Martin Ågren
3f5ef95b5e refs/files-backend: correct return value in lock_ref_for_update
In one code path we return a literal -1 and not a symbolic constant. The
value -1 would be interpreted as TRANSACTION_NAME_CONFLICT, which is
wrong. Use TRANSACTION_GENERIC_ERROR instead (that is the only other
return value we have to choose from).

Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:36:58 +09:00
Martin Ågren
851e1fbd01 refs/files-backend: fix memory leak in lock_ref_for_update
After the previous patch, none of the functions we call hold on to
`referent.buf`, so we can safely release the string buffer before
returning.

Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:36:58 +09:00
Martin Ågren
c299468bd7 refs/files-backend: add longer-scoped copy of string to list
split_symref_update() receives a string-pointer `referent` and adds it
to the list of `affected_refnames`. The list simply holds on to the
pointers it is given, it does not copy the strings and it does not ever
free them. The `referent` string in split_symref_update() belongs to a
string buffer in the caller. After we return, the string will be leaked.

In the next patch, we want to properly release the string buffer in the
caller, but we can't safely do so until we've made sure that
`affected_refnames` will not be holding on to a pointer to the string.
We could configure the list to handle its own resources, but it would
mean some alloc/free-churning. The list is already handling other
strings (through other code paths) which we do not need to worry about,
and we'd be memory-churning those strings too, completely unnecessary.

Observe that split_symref_update() creates a `new_update`-object through
ref_transaction_add_update(), after which `new_update->refname` is a
copy of `referent`. The difference is, this copy will be freed, and it
will be freed *after* `affected_refnames` has been cleared.

Rearrange the handling of `referent`, so that we don't add it directly
to `affected_refnames`. Instead, first just check whether `referent`
exists in the string list, and later add `new_update->refname`.

Helped-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:36:58 +09:00
Ross Kabus
c818e74332 commit-tree: do not complete line in -F input
"git commit-tree -F <file>", unlike "cat <file> | git
commit-tree" (i.e. feeding the same contents from the standard
input), added a missing final newline when the input ended in an
incomplete line.

Correct this inconsistency by leaving the incomplete line as-is,
as erring on the side of not touching the input is preferrable
and expected for a plumbing command like "commit-tree".

Signed-off-by: Ross Kabus <rkabus@aerotech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-10 16:29:53 +09:00
Michael Haggerty
5e00a6c873 files_transaction_finish(): delete reflogs before references
If the deletion steps unexpectedly fail, it is less bad to leave a
reference without its reflog than it is to leave a reflog without its
reference, since the latter is an invalid repository state.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
9939b33d6a packed-backend: rip out some now-unused code
Now the outside world interacts with the packed ref store only via the
generic refs API plus a few lock-related functions. This allows us to
delete some functions that are no longer used, thereby completing the
encapsulation of the packed ref store.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
dc39e09942 files_ref_store: use a transaction to update packed refs
When processing a `files_ref_store` transaction, it is sometimes
necessary to delete some references from the "packed-refs" file. Do
that using a reference transaction conducted against the
`packed_ref_store`.

This change further decouples `files_ref_store` from
`packed_ref_store`. It also fixes multiple problems, including the two
revealed by test cases added in the previous commit.

First, the old code didn't obtain the `packed-refs` lock until
`files_transaction_finish()`. This means that a failure to acquire the
`packed-refs` lock (e.g., due to contention with another process)
wasn't detected until it was too late (problems like this are supposed
to be detected in the "prepare" phase). The new code acquires the
`packed-refs` lock in `files_transaction_prepare()`, the same stage of
the processing when the loose reference locks are being acquired,
removing another reason why the "prepare" phase might succeed and the
"finish" phase might nevertheless fail.

Second, the old code deleted the loose version of a reference before
deleting any packed version of the same reference. This left a moment
when another process might think that the packed version of the
reference is current, which is incorrect. (Even worse, the packed
version of the reference can be arbitrarily old, and might even point
at an object that has since been garbage-collected.)

Third, if a reference deletion fails to acquire the `packed-refs` lock
altogether, then the old code might leave the repository in the
incorrect state (possibly corrupt) described in the previous
paragraph.

Now we activate the new "packed-refs" file (sans any references that
are being deleted) *before* deleting the corresponding loose
references. But we hold the "packed-refs" lock until after the loose
references have been finalized, thus preventing a simultaneous
"pack-refs" process from packing the loose version of the reference in
the time gap, which would otherwise defeat our attempt to delete it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
6a2a7736d8 t1404: demonstrate two problems with reference transactions
Currently, a loose reference is deleted even before locking the
`packed-refs` file, let alone deleting any packed version of the
reference. This leads to two problems, demonstrated by two new tests:

* While a reference is being deleted, other processes might see the
  old, packed value of the reference for a moment before the packed
  version is deleted. Normally this would be hard to observe, but we
  can prolong the window by locking the `packed-refs` file externally
  before running `update-ref`, then unlocking it before `update-ref`'s
  attempt to acquire the lock times out.

* If the `packed-refs` file is locked so long that `update-ref` fails
  to lock it, then the reference can be left permanently in the
  incorrect state described in the previous point.

In a moment, both problems will be fixed.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
1444bfe027 files_initial_transaction_commit(): use a transaction for packed refs
Use a `packed_ref_store` transaction in the implementation of
`files_initial_transaction_commit()` rather than using internal
features of the packed ref store. This further decouples
`files_ref_store` from `packed_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
22b09cdfad prune_refs(): also free the linked list
At least since v1.7, the elements of the `refs_to_prune` linked list
have been leaked. Fix the leak by teaching `prune_refs()` to free the
list elements as it processes them.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
27d03d04d5 files_pack_refs(): use a reference transaction to write packed refs
Now that the packed reference store supports transactions, we can use
a transaction to write the packed versions of references that we want
to pack. This decreases the coupling between `files_ref_store` and
`packed_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
2fb330ca72 packed_delete_refs(): implement method
Implement `packed_delete_refs()` using a reference transaction. This
means that `files_delete_refs()` can use `refs_delete_refs()` instead
of `repack_without_refs()` to delete any packed references, decreasing
the coupling between the classes.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:04 +09:00
Michael Haggerty
2775d8724d packed_ref_store: implement reference transactions
Implement the methods needed to support reference transactions for
the packed-refs backend. The new methods are not yet used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:03 +09:00
Michael Haggerty
3bf4f56134 struct ref_transaction: add a place for backends to store data
`packed_ref_store` is going to want to store some transaction-wide
data, so make a place for it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:03 +09:00
Michael Haggerty
39c8df0cfe packed-backend: don't adjust the reference count on lock/unlock
The old code incremented the packed ref cache reference count when
acquiring the packed-refs lock, and decremented the count when
releasing the lock. This is unnecessary because:

* Another process cannot change the packed-refs file because it is
  locked.

* When we ourselves change the packed-refs file, we do so by first
  modifying the packed ref-cache, and then writing the data from the
  ref-cache to disk. So the packed ref-cache remains fresh because any
  changes that we plan to make to the file are made in the cache first
  anyway.

So there is no reason for the cache to become stale.

Moreover, the extra reference count causes a problem if we
intentionally clear the packed refs cache, as we sometimes need to do
if we change the cache in anticipation of writing a change to disk,
but then the write to disk fails. In that case, `packed_refs_unlock()`
would have no easy way to find the cache whose reference count it
needs to decrement.

This whole issue will soon become moot due to upcoming changes that
avoid changing the in-memory cache as part of updating the packed-refs
on disk, but this change makes that transition easier.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:18:03 +09:00
Michael Haggerty
3964281524 load_subtree(): check that prefix_len is in the expected range
This value, which is stashed in the last byte of an object_id hash,
gets handed around a lot. So add a sanity check before using it in
`load_subtree()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 03:16:13 +09:00
Jeff King
1ab03a57e1 shortlog: skip format/parse roundtrip for internal traversal
The original git-shortlog command parsed the output of
git-log, and the logic went something like this:

  1. Read stdin looking for "author" lines.

  2. Parse the identity into its name/email bits.

  3. Apply mailmap to the name/email.

  4. Reformat the identity into a single buffer that is our
     "key" for grouping entries (either a name by default,
     or "name <email>" if --email was given).

The first part happens in read_from_stdin(), and the other
three steps are part of insert_one_record().

When we do an internal traversal, we just swap out the stdin
read in step 1 for reading the commit objects ourselves.
Prior to 2db6b83d18 (shortlog: replace hand-parsing of
author with pretty-printer, 2016-01-18), that made sense; we
still had to parse the ident in the commit message.

But after that commit, we use pretty.c's "%an <%ae>" to get
the author ident (for simplicity). Which means that the
pretty printer is doing a parse/format under the hood, and
then we parse the result, apply the mailmap, and format the
result again.

Instead, we can just ask pretty.c to do all of those steps
for us (including the mailmap via "%aN <%aE>", and not
formatting the address when --email is missing).

And then we can push steps 2-4 into read_from_stdin(). This
speeds up "git shortlog -ns" on linux.git by about 3%, and
eliminates a leak in insert_one_record() of the namemailbuf
strbuf.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-09 01:57:03 +09:00
Jeff King
0e5bba53af add UNLEAK annotation for reducing leak false positives
It's a common pattern in git commands to allocate some
memory that should last for the lifetime of the program and
then not bother to free it, relying on the OS to throw it
away.

This keeps the code simple, and it's fast (we don't waste
time traversing structures or calling free at the end of the
program). But it also triggers warnings from memory-leak
checkers like valgrind or LSAN. They know that the memory
was still allocated at program exit, but they don't know
_when_ the leaked memory stopped being useful. If it was
early in the program, then it's probably a real and
important leak. But if it was used right up until program
exit, it's not an interesting leak and we'd like to suppress
it so that we can see the real leaks.

This patch introduces an UNLEAK() macro that lets us do so.
To understand its design, let's first look at some of the
alternatives.

Unfortunately the suppression systems offered by
leak-checking tools don't quite do what we want. A
leak-checker basically knows two things:

  1. Which blocks were allocated via malloc, and the
     callstack during the allocation.

  2. Which blocks were left un-freed at the end of the
     program (and which are unreachable, but more on that
     later).

Their suppressions work by mentioning the function or
callstack of a particular allocation, and marking it as OK
to leak.  So imagine you have code like this:

  int cmd_foo(...)
  {
	/* this allocates some memory */
	char *p = some_function();
	printf("%s", p);
	return 0;
  }

You can say "ignore allocations from some_function(),
they're not leaks". But that's not right. That function may
be called elsewhere, too, and we would potentially want to
know about those leaks.

So you can say "ignore the callstack when main calls
some_function".  That works, but your annotations are
brittle. In this case it's only two functions, but you can
imagine that the actual allocation is much deeper. If any of
the intermediate code changes, you have to update the
suppression.

What we _really_ want to say is that "the value assigned to
p at the end of the function is not a real leak". But
leak-checkers can't understand that; they don't know about
"p" in the first place.

However, we can do something a little bit tricky if we make
some assumptions about how leak-checkers work. They
generally don't just report all un-freed blocks. That would
report even globals which are still accessible when the
leak-check is run.  Instead they take some set of memory
(like BSS) as a root and mark it as "reachable". Then they
scan the reachable blocks for anything that looks like a
pointer to a malloc'd block, and consider that block
reachable. And then they scan those blocks, and so on,
transitively marking anything reachable from a global as
"not leaked" (or at least leaked in a different category).

So we can mark the value of "p" as reachable by putting it
into a variable with program lifetime. One way to do that is
to just mark "p" as static. But that actually affects the
run-time behavior if the function is called twice (you
aren't likely to call main() twice, but some of our cmd_*()
functions are called from other commands).

Instead, we can trick the leak-checker by putting the value
into _any_ reachable bytes. This patch keeps a global
linked-list of bytes copied from "unleaked" variables. That
list is reachable even at program exit, which confers
recursive reachability on whatever values we unleak.

In other words, you can do:

  int cmd_foo(...)
  {
	char *p = some_function();
	printf("%s", p);
	UNLEAK(p);
	return 0;
  }

to annotate "p" and suppress the leak report.

But wait, couldn't we just say "free(p)"? In this toy
example, yes. But UNLEAK()'s byte-copying strategy has
several advantages over actually freeing the memory:

  1. It's recursive across structures. In many cases our "p"
     is not just a pointer, but a complex struct whose
     fields may have been allocated by a sub-function. And
     in some cases (e.g., dir_struct) we don't even have a
     function which knows how to free all of the struct
     members.

     By marking the struct itself as reachable, that confers
     reachability on any pointers it contains (including those
     found in embedded structs, or reachable by walking
     heap blocks recursively.

  2. It works on cases where we're not sure if the value is
     allocated or not. For example:

       char *p = argc > 1 ? argv[1] : some_function();

     It's safe to use UNLEAK(p) here, because it's not
     freeing any memory. In the case that we're pointing to
     argv here, the reachability checker will just ignore
     our bytes.

  3. Likewise, it works even if the variable has _already_
     been freed. We're just copying the pointer bytes. If
     the block has been freed, the leak-checker will skip
     over those bytes as uninteresting.

  4. Because it's not actually freeing memory, you can
     UNLEAK() before we are finished accessing the variable.
     This is helpful in cases like this:

       char *p = some_function();
       return another_function(p);

     Writing this with free() requires:

       int ret;
       char *p = some_function();
       ret = another_function(p);
       free(p);
       return ret;

     But with unleak we can just write:

       char *p = some_function();
       UNLEAK(p);
       return another_function(p);

This patch adds the UNLEAK() macro and enables it
automatically when Git is compiled with SANITIZE=leak.  In
normal builds it's a noop, so we pay no runtime cost.

It also adds some UNLEAK() annotations to show off how the
feature works. On top of other recent leak fixes, these are
enough to get t0000 and t0001 to pass when compiled with
LSAN.

Note the case in commit.c which actually converts a
strbuf_release() into an UNLEAK. This code was already
non-leaky, but the free didn't do anything useful, since
we're exiting. Converting it to an annotation means that
non-leak-checking builds pay no runtime cost. The cost is
minimal enough that it's probably not worth going on a
crusade to convert these kinds of frees to UNLEAKS. I did it
here for consistency with the "sb" leak (though it would
have been equally correct to go the other way, and turn them
both into strbuf_release() calls).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-08 15:43:17 +09:00