Push the submodule version of collision-detecting SHA-1 hash
implementation a bit harder on builders.
* ab/sha1dc-build:
sha1dc_git.h: re-arrange an ifdef chain for a subsequent change
Makefile: under "make dist", include the sha1collisiondetection submodule
Makefile: don't error out under DC_SHA1_EXTERNAL if DC_SHA1_SUBMODULE=auto
Retire mru API as it does not give enough abstraction over
underlying list API to be worth it.
* gs/retire-mru:
mru: Replace mru.[ch] with list.h implementation
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.
* jh/fsck-promisors:
gc: do not repack promisor packfiles
rev-list: support termination at promisor objects
sha1_file: support lazily fetching missing objects
introduce fetch-object: fetch one promisor object
index-pack: refactor writing of .keep files
fsck: support promisor objects as CLI argument
fsck: support referenced promisor objects
fsck: support refs pointing to promisor objects
fsck: introduce partialclone extension
extension.partialclone: introduce partial clone extension
The build procedure for perl/ part has been greatly simplified by
weaning ourselves off of MakeMaker.
* ab/simplify-perl-makefile:
perl: treat PERLLIB_EXTRA as an extra path again
perl: avoid *.pmc and fix Error.pm further
Makefile: replace perl/Makefile.PL with simple make rules
Replace the custom calls to mru.[ch] with calls to list.h. This patch is
the final step in removing the mru API completely and inlining the logic.
This patch leads to significant code reduction and the mru API hence, is
not a useful abstraction anymore.
Signed-off-by: Gargi Sharma <gs051095@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tying loose ends for the recent integration work of
collision-detecting SHA-1 implementation.
* ab/dc-sha1-loose-ends:
Makefile: NO_OPENSSL=1 should no longer imply BLK_SHA1=1
PERLLIB_EXTRA was introduced in v1.9-rc0~88^2 (2013-11-15) as a way
for packagers to add additional directories such as the location of
Subversion's perl bindings to Git's perl path. Since 20d2a30f
(Makefile: replace perl/Makefile.PL with simple make rules,
2012-12-10) setting that variable breaks perl-based commands instead:
$ PATH=$HOME/opt/git/bin:$PATH
$ make install prefix=$HOME/opt/git PERLLIB_EXTRA=anextralibdir
[...]
$ head -2 $HOME/opt/git/libexec/git-core/git-add--interactive
#!/usr/bin/perl
use lib (split(/:/, $ENV{GITPERLLIB} || ":helloiamanextrainstlibdir" || "/usr/local/google/home/jrn/opt/git/share/perl5"));
$ git add -p
Empty compile time value given to use lib at /home/jrn/opt/git/libexec/git-core/git-add--interactive line 2.
Removing the spurious ":" at the beginning of ":$PERLLIB_EXTRA" avoids
the "Empty compile time value" error but with that tweak the problem
still remains: PERLLIB_EXTRA ends up replacing instead of
supplementing the perllibdir that would be passed to 'use lib' if
PERLLIB_EXTRA were not set.
The intent was to simplify, as the commit message to 20d2a30f
explains:
| The scripts themselves will 'use lib' the target directory, but if
| INSTLIBDIR is set it overrides it. It doesn't have to be this way,
| it could be set in addition to INSTLIBDIR, but my reading of
| [v1.9-rc0~88^2] is that this is the desired behavior.
Restore the previous code structure to make PERLLIB_EXTRA work again.
Reproducing this problem requires an invocation of "make install"
instead of running bin-wrappers/git in place, since the latter sets
the GITPERLLIB environment variable, avoiding trouble.
Reported-by: Jonathan Koren <jdkoren@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git version --build-options" learned to report the host CPU and
the exact commit object name the binary was built from.
* js/enhanced-version-info:
version --build-options: report commit, too, if possible
version --build-options: also report host CPU
Use the collision detecting SHA-1 implementation by default even when
NO_OPENSSL is set.
Setting NO_OPENSSL=UnfortunatelyYes has implied BLK_SHA1=1 ever since
the former was introduced in dd53c7ab29 (Support for NO_OPENSSL,
2005-07-29). That implication should have been removed when the
default SHA-1 implementation changed from OpenSSL to DC_SHA1 in
e6b07da278 (Makefile: make DC_SHA1 the default, 2017-03-17). Finish
what that commit started by removing the BLK_SHA1 fallback setting so
the default DC_SHA1 implementation will be used.
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous round tried to use *.pmc files but it confused RPM
dependency analysis on some distros. Install them as plain
vanilla *.pm files instead.
Also "local @_" construct did not properly work when goto &sub
is used until recent versions of Perl. Avoid it (and we do not
need to localize it here anyway).
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A few structures and variables that are implementation details of
the decorate API have been renamed and then the API got documented
better.
* jt/decorate-api:
decorate: clean up and document API
In preparation for implementing narrow/partial clone, the object
walking machinery has been taught a way to tell it to "filter" some
objects from enumeration.
* jh/object-filtering:
rev-list: support --no-filter argument
list-objects-filter-options: support --no-filter
list-objects-filter-options: fix 'keword' typo in comment
pack-objects: add list-objects filtering
rev-list: add list-objects filtering support
list-objects: filter objects in traverse_commit_list
oidset: add iterator methods to oidset
oidmap: add oidmap iterator methods
dir: allow exclusions from blob in addition to file
The way "git worktree add" determines what branch to create from
where and checkout in the new worktree has been updated a bit.
* tg/worktree-create-tracking:
add worktree.guessRemote config option
worktree: add --guess-remote flag to add subcommand
worktree: make add <path> <branch> dwim
worktree: add --[no-]track option to the add subcommand
worktree: add can be created from any commit-ish
checkout: factor out functions to new lib file
In particular when local tags are used (or tags that are pushed to some
fork) to build Git, it is very hard to figure out from which particular
revision a particular Git executable was built. It gets worse when those
tags are deleted, or even updated.
Let's just report an exact, unabbreviated commit name in our build
options.
We need to be careful, though, to report when the current commit cannot
be determined, e.g. when building from a tarball without any associated
Git repository. This could be the case also when extracting Git's source
code into an unrelated Git worktree.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be helpful for bug reports to include information about the
environment in which the bug occurs. "git version --build-options" can
help to supplement this information. In addition to the size of 'long'
already reported by --build-options, also report the host's CPU type.
Example output:
$ git version --build-options
git version 2.9.3.windows.2.826.g06c0f2f
cpu: x86_64
sizeof-long: 4
New Makefile variable HOST_CPU supports cross-compiling.
Suggested-by: Adric Norris <landstander668@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the perl/Makefile.PL and the fallback perl/Makefile used under
NO_PERL_MAKEMAKER=NoThanks with a much simpler implementation heavily
inspired by how the i18n infrastructure's build process works[1].
The reason for having the Makefile.PL in the first place is that it
was initially[2] building a perl C binding to interface with libgit,
this functionality, that was removed[3] before Git.pm ever made it to
the master branch.
We've since since started maintaining a fallback perl/Makefile, as
MakeMaker wouldn't work on some platforms[4]. That's just the tip of
the iceberg. We have the PM.stamp hack in the top-level Makefile[5] to
detect whether we need to regenerate the perl/perl.mak, which I fixed
just recently to deal with issues like the perl version changing from
under us[6].
There is absolutely no reason for why this needs to be so complex
anymore. All we're getting out of this elaborate Rube Goldberg machine
was copying perl/* to perl/blib/* as we do a string-replacement on
the *.pm files to hardcode @@LOCALEDIR@@ in the source, as well as
pod2man-ing Git.pm & friends.
So replace the whole thing with something that's pretty much a copy of
how we generate po/build/**.mo from po/*.po, just with a small sed(1)
command instead of msgfmt. As that's being done rename the files
from *.pm to *.pmc just to indicate that they're generated (see
"perldoc -f require").
While I'm at it, change the fallback for Error.pm from being something
where we'll ship our own Error.pm if one doesn't exist at build time
to one where we just use a Git::Error wrapper that'll always prefer
the system-wide Error.pm, only falling back to our own copy if it
really doesn't exist at runtime. It's now shipped as
Git::FromCPAN::Error, making it easy to add other modules to
Git::FromCPAN::* in the future if that's needed.
Functional changes:
* This will not always install into perl's idea of its global
"installsitelib". This only potentially matters for packagers that
need to expose Git.pm for non-git use, and as explained in the
INSTALL file there's a trivial workaround.
* The scripts themselves will 'use lib' the target directory, but if
INSTLIBDIR is set it overrides it. It doesn't have to be this way,
it could be set in addition to INSTLIBDIR, but my reading of [7] is
that this is the desired behavior.
* We don't build man pages for all of the perl modules as we used to,
only Git(3pm). As discussed on-list[8] that we were building
installed manpages for purely internal APIs like Git::I18N or
private-Error.pm was always a bug anyway, and all the Git::SVN::*
ones say they're internal APIs.
There are apparently external users of Git.pm, but I don't expect
there to be any of the others.
As a side-effect of these general changes the perl documentation
now only installed by install-{doc,man}, not a mere "install" as
before.
1. 5e9637c629 ("i18n: add infrastructure for translating Git with
gettext", 2011-11-18)
2. b1edc53d06 ("Introduce Git.pm (v4)", 2006-06-24)
3. 18b0fc1ce1 ("Git.pm: Kill Git.xs for now", 2006-09-23)
4. f848718a69 ("Make perl/ build procedure ActiveState friendly.",
2006-12-04)
5. ee9be06770 ("perl: detect new files in MakeMaker builds",
2012-07-27)
6. c59c4939c2 ("perl: regenerate perl.mak if perl -V changes",
2017-03-29)
7. 0386dd37b1 ("Makefile: add PERLLIB_EXTRA variable that adds to
default perl path", 2013-11-15)
8. 87bmjjv1pu.fsf@evledraar.booking.com ("Re: [PATCH] Makefile:
replace perl/Makefile.PL with simple make rules"
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Include the sha1collisiondetection submodule when running "make
dist". Even though we've been shipping the sha1collisiondetection
submodule[1] and using it by default if it's checked out[2] anyone
downloading git as a tarball would just get an empty
sha1collisiondetection/ directory.
Doing this automatically is a feature that's missing from git-archive,
but in the meantime let's bundle this up into the tarball we
ship. This ensures that the DC_SHA1_SUBMODULE flag does what's
intended even in an unpacked tarball, and more importantly means we're
building the exact same code from the same paths from git.git and from
the tarball.
I am not including all the files in the submodule, only the ones git
actually needs (and the licenses), only including some files like this
would be a useful feature if git-archive ever adds the ability to
bundle up submodules.
1. commit 86cfd61e6b ("sha1dc: optionally use sha1collisiondetection
as a submodule", 2017-07-01)
2. cac87dc01d ("sha1collisiondetection: automatically enable when
submodule is populated", 2017-07-01)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fix a logic error in the initial introduction of DC_SHA1_EXTERNAL. If
git.git has a sha1collisiondetection submodule checked out the logic
to set DC_SHA1_SUBMODULE=auto would interact badly with the check for
whether DC_SHA1_SUBMODULE was set.
It would error out, meaning that there's no way to build git with
DC_SHA1_EXTERNAL=YesPlease without deinit-ing the submodule.
Instead, adjust the logic to only fire if the variable is to something
else than "auto" which would mean it's a mistake on the part of
whoever's building git, not just the Makefile tripping over its own
logic.
1. 3964cbbb5c ("sha1dc: allow building with the external sha1dc
library", 2017-08-15)
2. cac87dc01d ("sha1collisiondetection: automatically enable when
submodule is populated", 2017-07-01)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Improve the names of the identifiers in decorate.h, document them, and
add an example of how to use these functions.
The example is compiled and run as part of the test suite.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
You may want to run the test suite with a different shell
than you use to build Git. For instance, you may build with
SHELL_PATH=/bin/sh (because it's faster, or it's what you
expect to exist on systems where the build will be used) but
want to run the test suite with bash (e.g., since that
allows using "-x" reliably across the whole test suite).
There's currently no good way to do this.
You might think that doing two separate make invocations,
like:
make &&
make -C t SHELL_PATH=/bin/bash
would work. And it _almost_ does. The second make will see
our bash SHELL_PATH, and we'll use that to run the
individual test scripts (or tell prove to use it to do so).
So far so good.
But this breaks down when "--tee" or "--verbose-log" is
used. Those options cause the test script to actually
re-exec itself using $SHELL_PATH. But wait, wouldn't our
second make invocation have set SHELL_PATH correctly in the
environment?
Yes, but test-lib.sh sources GIT-BUILD-OPTIONS, which we
built during the first "make". And that overrides the
environment, giving us the original SHELL_PATH again.
Let's introduce a new variable that lets you specify a
specific shell to be run for the test scripts. Note that we
have to touch both the main and t/ Makefiles, since we have
to record it in GIT-BUILD-OPTIONS in one, and use it in the
latter.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new mechanism to upgrade the wire protocol in place is proposed
and demonstrated that it works with the older versions of Git
without harming them.
* bw/protocol-v1:
Documentation: document Extra Parameters
ssh: introduce a 'simple' ssh variant
i5700: add interop test for protocol transition
http: tell server that the client understands v1
connect: tell server that the client understands v1
connect: teach client to recognize v1 server response
upload-pack, receive-pack: introduce protocol version 1
daemon: recognize hidden request arguments
protocol: introduce protocol extension mechanisms
pkt-line: add packet_write function
connect: in ref advertisement, shallows are last
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.
This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).
Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.
This will be tested in a subsequent commit.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Factor the functions out, so they can be re-used from other places. In
particular these functions will be re-used in builtin/worktree.c to make
git worktree add dwim more.
While there add some docs to the function.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted from the
traversal.
Update traverse_commit_list() to be a wrapper for the above
with a null filter to minimize the number of callers that
needed to be changed.
Object filtering will be used in a future commit by rev-list
and pack-objects for partial clone and fetch to omit unwanted
objects from the result.
traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used. It
should be possible to extend such support in the future (at
least to simple filters that do not require object pathnames),
but that is beyond the scope of this patch series.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We learned to talk to watchman to speed up "git status" and other
operations that need to see which paths have been modified.
* bp/fsmonitor:
fsmonitor: preserve utf8 filenames in fsmonitor-watchman log
fsmonitor: read entirety of watchman output
fsmonitor: MINGW support for watchman integration
fsmonitor: add a performance test
fsmonitor: add a sample integration script for Watchman
fsmonitor: add test cases for fsmonitor extension
split-index: disable the fsmonitor extension when running the split index test
fsmonitor: add a test tool to dump the index extension
update-index: add fsmonitor support to update-index
ls-files: Add support in ls-files to display the fsmonitor valid bit
fsmonitor: add documentation for the fsmonitor extension.
fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.
update-index: add a new --force-write-index option
preload-index: add override to enable testing preload-index
bswap: add 64 bit endianness helper get_be64
Create protocol.{c,h} and provide functions which future servers and
clients can use to determine which protocol to use or is being used.
Also introduce the 'GIT_PROTOCOL' environment variable which will be
used to communicate a colon separated list of keys with optional values
to a server. Unknown keys and values must be tolerated. This mechanism
is used to communicate which version of the wire protocol a client would
like to use with a server.
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Operations that do not touch (majority of) packed refs have been
optimized by making accesses to packed-refs file lazy; we no longer
pre-parse everything, and an access to a single ref in the
packed-refs does not touch majority of irrelevant refs, either.
* mh/mmap-packed-refs: (21 commits)
packed-backend.c: rename a bunch of things and update comments
mmapped_ref_iterator: inline into `packed_ref_iterator`
ref_cache: remove support for storing peeled values
packed_ref_store: get rid of the `ref_cache` entirely
ref_store: implement `refs_peel_ref()` generically
packed_read_raw_ref(): read the reference from the mmapped buffer
packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
read_packed_refs(): ensure that references are ordered when read
packed_ref_cache: keep the `packed-refs` file mmapped if possible
packed-backend.c: reorder some definitions
mmapped_ref_iterator_advance(): no peeled value for broken refs
mmapped_ref_iterator: add iterator over a packed-refs file
packed_ref_cache: remember the file-wide peeling state
read_packed_refs(): read references with minimal copying
read_packed_refs(): make parsing of the header line more robust
read_packed_refs(): only check for a header at the top of the file
read_packed_refs(): use mmap to read the `packed-refs` file
die_unterminated_line(), die_invalid_line(): new functions
packed_ref_cache: add a backlink to the associated `packed_ref_store`
prefix_ref_iterator: break when we leave the prefix
...
Add a test utility (test-drop-caches) that flushes all changes to disk
then drops file system cache on Windows, Linux, and OSX.
Add a perf test (p7519-fsmonitor.sh) for fsmonitor.
By default, the performance test will utilize the Watchman file system
monitor if it is installed. If Watchman is not installed, it will use a
dummy integration script that does not report any new or modified files.
The dummy script has very little overhead which provides optimistic results.
The performance test will also use the untracked cache feature if it is
available as fsmonitor uses it to speed up scanning for untracked files.
There are 4 environment variables that can be used to alter the default
behavior of the performance test:
GIT_PERF_7519_UNTRACKED_CACHE: used to configure core.untrackedCache
GIT_PERF_7519_SPLIT_INDEX: used to configure core.splitIndex
GIT_PERF_7519_FSMONITOR: used to configure core.fsmonitor
GIT_PERF_7519_DROP_CACHE: if set, the OS caches are dropped between tests
The big win for using fsmonitor is the elimination of the need to scan the
working directory looking for changed and untracked files. If the file
information is all cached in RAM, the benefits are reduced.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a test utility (test-dump-fsmonitor) that will dump the fsmonitor
index extension.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.
We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.
In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.
To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.
Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.
Signed-off-by: Ben Peart <benpeart@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is similar to using the hashmap in hashmap.c, but with an
easier-to-use API. In particular, custom entry comparisons no longer
need to be written, and lookups can be done without constructing a
temporary entry structure.
This is implemented as a thin wrapper over the hashmap API. In
particular, this means that there is an additional 4-byte overhead due
to the fact that the first 4 bytes of the hash is redundantly stored.
For now, I'm taking the simpler approach, but if need be, we can
reimplement oidmap without affecting the callers significantly.
oidset has been updated to use oidmap.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Keep a copy of the `packed-refs` file contents in memory for as long
as a `packed_ref_cache` object is in use:
* If the system allows it, keep the `packed-refs` file mmapped.
* If not (either because the system doesn't support `mmap()` at all,
or because a file that is currently mmapped cannot be replaced via
`rename()`), then make a copy of the file's contents in
heap-allocated space, and keep that around instead.
We base the choice of behavior on a new build-time switch,
`MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows
variants.
After this commit, `MMAP_NONE` and `MMAP_TEMPORARY` are still handled
identically. But the next commit will introduce a difference.
This whole change is still pointless, because we only read the
`packed-refs` file contents immediately after instantiating the
`packed_ref_cache`. But that will soon change.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of our programs consider that it is OK to release dynamic
storage that is used throughout the life of the program by simply
exiting, but this makes it harder to leak detection tools to avoid
reporting false positives. Plug many existing leaks and introduce
a mechanism for developers to mark that the region of memory
pointed by a pointer is not lost/leaking to help these tools.
* jk/leak-checkers:
add UNLEAK annotation for reducing leak false positives
set_git_dir: handle feeding gitdir to itself
repository: free fields before overwriting them
reset: free allocated tree buffers
reset: make tree counting less confusing
config: plug user_config leak
update-index: fix cache entry leak in add_one_file()
add: free leaked pathspec after add_files_to_cache()
test-lib: set LSAN_OPTIONS to abort by default
test-lib: --valgrind should not override --verbose-log
Platforms that ship with a separate sha1 with collision detection
library can link to it instead of using the copy we ship as part of
our source tree.
* ti/external-sha1dc:
sha1dc: allow building with the external sha1dc library
sha1dc: build git plumbing code more explicitly
It's a common pattern in git commands to allocate some
memory that should last for the lifetime of the program and
then not bother to free it, relying on the OS to throw it
away.
This keeps the code simple, and it's fast (we don't waste
time traversing structures or calling free at the end of the
program). But it also triggers warnings from memory-leak
checkers like valgrind or LSAN. They know that the memory
was still allocated at program exit, but they don't know
_when_ the leaked memory stopped being useful. If it was
early in the program, then it's probably a real and
important leak. But if it was used right up until program
exit, it's not an interesting leak and we'd like to suppress
it so that we can see the real leaks.
This patch introduces an UNLEAK() macro that lets us do so.
To understand its design, let's first look at some of the
alternatives.
Unfortunately the suppression systems offered by
leak-checking tools don't quite do what we want. A
leak-checker basically knows two things:
1. Which blocks were allocated via malloc, and the
callstack during the allocation.
2. Which blocks were left un-freed at the end of the
program (and which are unreachable, but more on that
later).
Their suppressions work by mentioning the function or
callstack of a particular allocation, and marking it as OK
to leak. So imagine you have code like this:
int cmd_foo(...)
{
/* this allocates some memory */
char *p = some_function();
printf("%s", p);
return 0;
}
You can say "ignore allocations from some_function(),
they're not leaks". But that's not right. That function may
be called elsewhere, too, and we would potentially want to
know about those leaks.
So you can say "ignore the callstack when main calls
some_function". That works, but your annotations are
brittle. In this case it's only two functions, but you can
imagine that the actual allocation is much deeper. If any of
the intermediate code changes, you have to update the
suppression.
What we _really_ want to say is that "the value assigned to
p at the end of the function is not a real leak". But
leak-checkers can't understand that; they don't know about
"p" in the first place.
However, we can do something a little bit tricky if we make
some assumptions about how leak-checkers work. They
generally don't just report all un-freed blocks. That would
report even globals which are still accessible when the
leak-check is run. Instead they take some set of memory
(like BSS) as a root and mark it as "reachable". Then they
scan the reachable blocks for anything that looks like a
pointer to a malloc'd block, and consider that block
reachable. And then they scan those blocks, and so on,
transitively marking anything reachable from a global as
"not leaked" (or at least leaked in a different category).
So we can mark the value of "p" as reachable by putting it
into a variable with program lifetime. One way to do that is
to just mark "p" as static. But that actually affects the
run-time behavior if the function is called twice (you
aren't likely to call main() twice, but some of our cmd_*()
functions are called from other commands).
Instead, we can trick the leak-checker by putting the value
into _any_ reachable bytes. This patch keeps a global
linked-list of bytes copied from "unleaked" variables. That
list is reachable even at program exit, which confers
recursive reachability on whatever values we unleak.
In other words, you can do:
int cmd_foo(...)
{
char *p = some_function();
printf("%s", p);
UNLEAK(p);
return 0;
}
to annotate "p" and suppress the leak report.
But wait, couldn't we just say "free(p)"? In this toy
example, yes. But UNLEAK()'s byte-copying strategy has
several advantages over actually freeing the memory:
1. It's recursive across structures. In many cases our "p"
is not just a pointer, but a complex struct whose
fields may have been allocated by a sub-function. And
in some cases (e.g., dir_struct) we don't even have a
function which knows how to free all of the struct
members.
By marking the struct itself as reachable, that confers
reachability on any pointers it contains (including those
found in embedded structs, or reachable by walking
heap blocks recursively.
2. It works on cases where we're not sure if the value is
allocated or not. For example:
char *p = argc > 1 ? argv[1] : some_function();
It's safe to use UNLEAK(p) here, because it's not
freeing any memory. In the case that we're pointing to
argv here, the reachability checker will just ignore
our bytes.
3. Likewise, it works even if the variable has _already_
been freed. We're just copying the pointer bytes. If
the block has been freed, the leak-checker will skip
over those bytes as uninteresting.
4. Because it's not actually freeing memory, you can
UNLEAK() before we are finished accessing the variable.
This is helpful in cases like this:
char *p = some_function();
return another_function(p);
Writing this with free() requires:
int ret;
char *p = some_function();
ret = another_function(p);
free(p);
return ret;
But with unleak we can just write:
char *p = some_function();
UNLEAK(p);
return another_function(p);
This patch adds the UNLEAK() macro and enables it
automatically when Git is compiled with SANITIZE=leak. In
normal builds it's a noop, so we pay no runtime cost.
It also adds some UNLEAK() annotations to show off how the
feature works. On top of other recent leak fixes, these are
enough to get t0000 and t0001 to pass when compiled with
LSAN.
Note the case in commit.c which actually converts a
strbuf_release() into an UNLEAK. This code was already
non-leaky, but the free didn't do anything useful, since
we're exiting. Converting it to an annotation means that
non-leak-checking builds pay no runtime cost. The cost is
minimal enough that it's probably not worth going on a
crusade to convert these kinds of frees to UNLEAKS. I did it
here for consistency with the "sb" leak (though it would
have been equally correct to go the other way, and turn them
both into strbuf_release() calls).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to spend more than necessary cycles allocating and freeing
piece of memory while writing each index entry out. This has been
optimized.
* kw/write-index-reduce-alloc:
read-cache: avoid allocating every ondisk entry when writing
read-cache: fix memory leak in do_write_index
perf: add test for writing the index
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.
Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.
In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These used to be for manipulating the in-memory repo_tree structure,
but nowadays they are convenience wrappers to handle a few git-vs-svn
mismatches:
1. Git does not track empty directories but Subversion does. When
looking up a path in git that Subversion thinks exists and finding
nothing, we can safely assume that the path represents a
directory. This is needed when a later Subversion revision
modifies that directory.
2. Subversion allows deleting a file by copying. In Git fast-import
we have to handle that more explicitly as a deletion.
These are details of the tool's interaction with git fast-import.
Move them to fast_export.c, where other such details are handled.
This way the function names do not start with a repo_ prefix that
would clash with the repository object introduced in
v2.14.0-rc0~38^2~16 (repository: introduce the repository object,
2017-06-22) or an svn_ prefix that would clash with libsvn (in case
someone wants to link this code with libsvn some day).
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "ref-store" code reorganization continues.
* mh/packed-ref-store: (32 commits)
files-backend: cheapen refname_available check when locking refs
packed_ref_store: handle a packed-refs file that is a symlink
read_packed_refs(): die if `packed-refs` contains bogus data
t3210: add some tests of bogus packed-refs file contents
repack_without_refs(): don't lock or unlock the packed refs
commit_packed_refs(): remove call to `packed_refs_unlock()`
clear_packed_ref_cache(): don't protest if the lock is held
packed_refs_unlock(), packed_refs_is_locked(): new functions
packed_refs_lock(): report errors via a `struct strbuf *err`
packed_refs_lock(): function renamed from lock_packed_refs()
commit_packed_refs(): use a staging file separate from the lockfile
commit_packed_refs(): report errors rather than dying
packed_ref_store: make class into a subclass of `ref_store`
packed-backend: new module for handling packed references
packed_read_raw_ref(): new function, replacing `resolve_packed_ref()`
packed_ref_store: support iteration
packed_peel_ref(): new function, extracted from `files_peel_ref()`
repack_without_refs(): take a `packed_ref_store *` parameter
get_packed_ref(): take a `packed_ref_store *` parameter
rollback_packed_refs(): take a `packed_ref_store *` parameter
...
A performance test for writing the index to be able to
determine if changes to allocating ondisk structure help.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some distros provide SHA1 collision-detect code as a shared library.
It's the same code as we have in git tree (but may be with a different
init default for hash), and git can link with it as well; at least, it
may make maintenance easier, according to our security guys.
This patch allows user to build git linking with the external sha1dc
library instead of the built-in code. User needs to define
DC_SHA1_EXTERNAL explicitly. As default without it, the built-in
sha1dc code is used like before.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>