The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.
* jk/pack-idx-corruption-safety:
sha1_file.c: mark strings for translation
use_pack: handle signed off_t overflow
nth_packed_object_offset: bounds-check extended offset
t5313: test bounds-checks of corrupted/malicious pack/idx files
The "v(iew)" subcommand of the interactive "git am -i" command was
broken in 2.6.0 timeframe when the command was rewritten in C.
* jc/am-i-v-fix:
am -i: fix "v"iew
pager: factor out a helper to prepare a child process to run the pager
pager: lose a separate argv[]
Many codepaths forget to check return value from git_config_set();
the function is made to die() to make sure we do not proceed when
setting a configuration variable failed.
* ps/config-error:
config: rename git_config_set_or_die to git_config_set
config: rename git_config_set to git_config_set_gently
compat: die when unable to set core.precomposeunicode
sequencer: die on config error when saving replay opts
init-db: die on config errors when initializing empty repo
clone: die on config error in cmd_clone
remote: die on config error when manipulating remotes
remote: die on config error when setting/adding branches
remote: die on config error when setting URL
submodule--helper: die on config error when cloning module
submodule: die on config error when linking modules
branch: die on config error when editing branch description
branch: die on config error when unsetting upstream
branch: report errors in tracking branch setup
config: introduce set_or_die wrappers
If a pack .idx file has a corrupted offset for an object, we
may try to access an offset in the .idx or .pack file that
is larger than the file's size. For the .pack case, we have
use_pack() to protect us, which realizes the access is out
of bounds. But if the corrupted value asks us to look in the
.idx file's secondary 64-bit offset table, we blindly add it
to the mmap'd index data and access arbitrary memory.
We can fix this with a simple bounds-check compared to the
size we found when we opened the .idx file.
Note that there's similar code in index-pack that is
triggered only during "index-pack --verify". To support
both, we pull the bounds-check into a separate function,
which dies when it sees a corrupted file.
It would be nice if we could return an error, so that the
pack code could try to find a good copy of the object
elsewhere. Currently nth_packed_object_offset doesn't have
any way to return an error, but it could probably use "0" as
a sentinel value (since no object can start there). This is
the minimal fix, and we can improve the resilience later on
top.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Paths that have been told the index about with "add -N" are not
quite yet in the index, but a few commands behaved as if they
already are in a harmful way.
* nd/ita-cleanup:
grep: make it clear i-t-a entries are ignored
add and use a convenience macro ce_intent_to_add()
blame: remove obsolete comment
Rename git_config_set_or_die functions to git_config_set, leading
to the new default behavior of dying whenever a configuration
error occurs.
By now all callers that shall die on error have been transitioned
to the _or_die variants, thus making this patch a simple rename
of the functions.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The desired default behavior for `git_config_set` is to die
whenever an error occurs. Dying is the default for a lot of
internal functions when failures occur and is in this case the
right thing to do for most callers as otherwise we might run into
inconsistent repositories without noticing.
As some code may rely on the actual return values for
`git_config_set` we still require the ability to invoke these
functions without aborting. Rename the existing `git_config_set`
functions to `git_config_set_gently` to keep them available for
those callers.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running a pager, we need to run the program git_pager() gave
us, but we need to make sure we spawn it via the shell (i.e. it is
valid to say PAGER='less -S', for example) and give default values
to $LESS and $LV environment variables. Factor out these details
to a separate helper function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A lot of call-sites for the existing family of `git_config_set`
functions do not check for errors that may occur, e.g. when the
configuration file is locked. In many cases we simply want to die
when such a situation arises.
Introduce wrappers that will cause the program to die in those
cases. These wrappers are temporary only to ease the transition
to let `git_config_set` die by default. They will be removed
later on when `git_config_set` itself has been replaced by
`git_config_set_gently`.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The underlying machinery used by "ls-files -o" and other commands
have been taught not to create empty submodule ref cache for a
directory that is not a submodule. This removes a ton of wasted
CPU cycles.
* jk/ref-cache-non-repository-optim:
resolve_gitlink_ref: ignore non-repository paths
clean: make is_git_repository a public function
A few unportable C construct have been spotted by clang compiler
and have been fixed.
* jk/clang-pedantic:
bswap: add NO_UNALIGNED_LOADS define
avoid shifting signed integers 31 bits
We have always had is_git_directory(), for looking at a
specific directory to see if it contains a git repo. In
0179ca7 (clean: improve performance when removing lots of
directories, 2015-06-15), we added is_git_repository() which
checks for a non-bare repository by looking at its ".git"
entry.
However, the fix in 0179ca7 needs to be applied other
places, too. Let's make this new helper globally available.
We need to give it a better name, though, to avoid confusion
with is_git_directory(). This patch does that, documents
both functions with a comment to reduce confusion, and
removes the clean-specific references in the comments.
Based-on-a-patch-by: Andreas Krey <a.krey@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We sometimes use 32-bit unsigned integers as bit-fields.
It's fine to access the MSB, because it's unsigned. However,
doing so as "1 << 31" is wrong, because the constant "1" is
a signed int, and we shift into the sign bit, causing
undefined behavior.
We can fix this by using "1U" as the constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"format-patch" has learned a new option to zero-out the commit
object name on the mbox "From " line.
* bc/format-patch-null-from-line:
format-patch: check that header line has expected format
format-patch: add an option to suppress commit hash
sha1_file.c: introduce a null_oid constant
null_oid is the struct object_id equivalent to null_sha1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
More transition from "unsigned char[40]" to "struct object_id".
This needed a few merge fixups, but is mostly disentangled from other
topics.
* bc/object-id:
remote: convert functions to struct object_id
Remove get_object_hash.
Convert struct object to object_id
Add several uses of get_object_hash.
object: introduce get_object_hash macro.
ref_newer: convert to use struct object_id
push_refs_with_export: convert to struct object_id
get_remote_heads: convert to struct object_id
parse_fetch: convert to use struct object_id
add_sought_entry_mem: convert to struct object_id
Convert struct ref to use object_id.
sha1_file: introduce has_object_file helper.
Code preparation for pluggable ref backends.
* dt/refs-backend-pre-vtable:
refs: break out ref conflict checks
files_log_ref_write: new function
initdb: make safe_create_dir public
refs: split filesystem-based refs code into a new file
refs/refs-internal.h: new header file
refname_is_safe(): improve docstring
pack_if_possible_fn(): use ref_type() instead of is_per_worktree_ref()
copy_msg(): rename to copy_reflog_msg()
verify_refname_available(): new function
verify_refname_available(): rename function
Apple's common crypto implementation of SHA1_Update() does not take
more than 4GB at a time, and we now have a compile-time workaround
for it.
* ad/sha1-update-chunked:
sha1: allow limiting the size of the data passed to SHA1_Update()
sha1: provide another level of indirection for the SHA-1 functions
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.
We may want to do the same for .bitmap (and notice but not prune
.keep) without corresponding .pack, but that can be a separate
topic.
* dk/gc-idx-wo-pack:
gc: remove garbage .idx files from pack dir
t5304: test cleaning pack garbage
prepare_packed_git(): refactor garbage reporting in pack directory
Apple's common crypto implementation of SHA1_Update() does not take
more than 4GB at a time, and we now have a compile-time workaround
for it.
* ad/sha1-update-chunked:
sha1: allow limiting the size of the data passed to SHA1_Update()
sha1: provide another level of indirection for the SHA-1 functions
Add has_object_file, which is a wrapper around has_sha1_file, but for
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.
* dk/gc-idx-wo-pack:
gc: remove garbage .idx files from pack dir
t5304: test cleaning pack garbage
prepare_packed_git(): refactor garbage reporting in pack directory
Soon we will want to create initdb functions for ref backends, and
code from initdb that calls this function needs to move into the files
backend. So this function needs to be public.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Using the previous commit's inredirection mechanism for SHA1,
support a chunked implementation of SHA1_Update() that limits the
amount of data in the chunk passed to SHA1_Update().
This is enabled by using the Makefile variable SHA1_MAX_BLOCK_SIZE
to specify chunk size. When using Apple's CommonCrypto library this
is set to 1GiB (the implementation cannot handle more 4GiB).
Signed-off-by: Atousa Pahlevan Duprat <apahlevan@ieee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git source uses git_SHA1_Update() and friends to call into the
code that computes the hashes. Traditionally, we used to map these
directly to underlying implementation of the SHA-1 hash (e.g.
SHA1_Update() from OpenSSL or blk_SHA1_Update() from block-sha1/).
This arrangement however makes it hard to tweak behaviour of the
underlying implementation without fully replacing. If we want to
introduce a tweaked_SHA1_Update() wrapper to implement the "Update"
in a slightly different way, for example, the implementation of the
wrapper still would want to call into the underlying implementation,
but tweaked_SHA1_Update() cannot call git_SHA1_Update() to get to
the underlying implementation (often but not always SHA1_Update()).
Add another level of indirection that maps platform_SHA1_Update()
and friends to their underlying implementations, and by default make
git_SHA1_Update() and friends map to platform_SHA1_* functions.
Doing it this way will later allow us to map git_SHA1_Update() to
tweaked_SHA1_Update(), and the latter can use platform_SHA1_Update()
in its implementation.
Signed-off-by: Atousa Pahlevan Duprat <apahlevan@ieee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
The submodule code has been taught to work better with separate
work trees created via "git worktree add".
* mk/submodule-gitdir-path:
path: implement common_dir handling in git_pathdup_submodule()
submodule refactor: use strbuf_git_path_submodule() in add_submodule_odb()
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.
* js/clone-dissociate:
clone --dissociate: avoid locking pack files
sha1_file.c: add a function to release all packs
sha1_file: consolidate code to close a pack's file descriptor
t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.
* jk/repository-extension:
introduce "preciousObjects" repository extension
introduce "extensions" form of core.repositoryformatversion
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.
* jk/repository-extension:
introduce "preciousObjects" repository extension
introduce "extensions" form of core.repositoryformatversion
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.
During merges, we free entries that we no longer need in the
destination index. But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.
To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry. This entails some refactoring of code
that expects the cache_entry.
Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.
Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.
* js/clone-dissociate:
clone --dissociate: avoid locking pack files
sha1_file.c: add a function to release all packs
sha1_file: consolidate code to close a pack's file descriptor
t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
The submodule code has been taught to work better with separate
work trees created via "git worktree add".
* mk/submodule-gitdir-path:
path: implement common_dir handling in git_pathdup_submodule()
submodule refactor: use strbuf_git_path_submodule() in add_submodule_odb()
On Windows, files that are in use cannot be removed or renamed. That
means that we have to release pack files when we are about to, say,
repack them. Let's introduce a convenient function to close all the
pack files and their idx files.
While at it, we consolidate the close windows/close fd/close index
stanza in `free_pack_by_name()` into the `close_pack()` function that
is used by the new `close_all_packs()` function to avoid repeated code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --date=local" used to only show the normal (default)
format in the local timezone. The command learned to take 'local'
as an instruction to use the local timezone with other formats,
e.g. "git show --date=rfc-local".
* jk/date-local:
t6300: add tests for "-local" date formats
t6300: make UTC and local dates different
date: make "local" orthogonal to date format
date: check for "local" before anything else
t6300: add test for "raw" date format
t6300: introduce test_date() helper
fast-import: switch crash-report date to iso8601
Documentation/rev-list: don't list date formats
Documentation/git-for-each-ref: don't list date formats
Documentation/config: don't list date formats
Documentation/blame-options: don't list date formats
We have the path "foo.idx", and we create a buffer big
enough to hold "foo.pack" and "foo.keep", and then strcpy
straight into it. This isn't a bug (we have enough space),
but it's very hard to tell from the strcpy that this is so.
Let's instead use strip_suffix to take off the ".idx",
record the size of our allocation, and use xsnprintf to make
sure we don't violate our assumptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sha1_to_hex and find_unique_abbrev functions always
write into reusable static buffers. There are a few problems
with this:
- future calls overwrite our result. This is especially
annoying with find_unique_abbrev, which does not have a
ring of buffers, so you cannot even printf() a result
that has two abbreviated sha1s.
- if you want to put the result into another buffer, we
often strcpy, which looks suspicious when auditing for
overflows.
This patch introduces sha1_to_hex_r and find_unique_abbrev_r,
which write into a user-provided buffer. Of course this is
just punting on the overflow-auditing, as the buffer
obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
much easier to audit, since that is a well-known size.
We retain the non-reentrant forms, which just become thin
wrappers around the reentrant ones. This patch also adds a
strbuf variant of find_unique_abbrev, which will be handy in
later patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a function that uses git_path a lot, but would
prefer to avoid the static buffers, it's useful to keep a
single scratch buffer locally and reuse it for each call.
You used to be able to do this with git_snpath:
char buf[PATH_MAX];
foo(git_snpath(buf, sizeof(buf), "foo"));
bar(git_snpath(buf, sizeof(buf), "bar"));
but since 1a83c24, git_snpath has been replaced with
strbuf_git_path. This is good, because it removes the
arbitrary PATH_MAX limit. But using strbuf_git_path is more
awkward for two reasons:
1. It adds to the buffer, rather than replacing it. This
is consistent with other strbuf functions, but makes
reuse of a single buffer more tedious.
2. It doesn't return the buffer, so you can't format
as part of a function's arguments.
The new git_path_buf solves both of these, so you can use it
like:
struct strbuf buf = STRBUF_INIT;
foo(git_path_buf(&buf, "foo"));
bar(git_path_buf(&buf, "bar"));
strbuf_release(&buf);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When submodule is a linked worktree, "git diff --submodule" and other
calls which directly access the submodule's object database do not correctly
calculate its path. Fix it by changing the git_pathdup_submodule() behavior,
to use either common or per-worktree directory.
Do it similarly as for parent repository, but ignore the GIT_COMMON_DIR
environment variable, because it would mean common directory for the parent
repository and does not make sense for submodule.
Also add test for functionality which uses this call.
Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our "--date" modes are about the format of the date:
which items we show and in what order. But "--date=local" is
a bit of an oddball. It means "show the date in the normal
format, but using the local timezone". The timezone we use
is orthogonal to the actual format, and there is no reason
we could not have "localized iso8601", etc.
This patch adds a "local" boolean field to "struct
date_mode", and drops the DATE_LOCAL element from the
date_mode_type enum (it's now just DATE_NORMAL plus
local=1). The new feature is accessible to users by adding
"-local" to any date mode (e.g., "iso-local"), and we retain
"local" as an alias for "default-local" for backwards
compatibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The client side codepaths in "git push" have been cleaned up
and the user can request to perform an optional "signed push",
i.e. sign only when the other end accepts signed push.
* db/push-sign-if-asked:
push: add a config option push.gpgSign for default signed pushes
push: support signing pushes iff the server supports it
builtin/send-pack.c: use parse_options API
config.c: rename git_config_maybe_bool_text and export it as git_parse_maybe_bool
transport: remove git_transport_options.push_cert
gitremote-helpers.txt: document pushcert option
Documentation/git-send-pack.txt: document --signed
Documentation/git-send-pack.txt: wrap long synopsis line
Documentation/git-push.txt: document when --signed may fail
Recent reimplementation of "git am" changed the format of state
files kept in $GIT_DIR/rebase-apply/ without meaning to do so,
primarily because write_file() API was cumbersome to use and it was
easy to mistakenly make text files with incomplete lines. Update
write_file() interface to make it harder to misuse.
* jc/am-state-fix:
write_file(): drop caller-supplied LF from calls to create a one-liner file
write_file_v(): do not leave incomplete line at the end
write_file(): drop "fatal" parameter
builtin/am: make sure state files are text
builtin/am: introduce write_state_*() helper functions
Because the configuration system does not allow "alias.0foo" and
"pager.0foo" as the configuration key, the user cannot use '0foo'
as a custom command name anyway, but "git 0foo" tried to look these
keys up and emitted useless warnings before saying '0foo is not a
git command'. These warning messages have been squelched.
* jk/fix-alias-pager-config-key-warnings:
config: silence warnings for command names with invalid keys
All callers except three passed 1 for the "fatal" parameter to ask
this function to die upon error, but to a casual reader of the code,
it was not all obvious what that 1 meant. Instead, split the
function into two based on a common write_file_v() that takes the
flag, introduce write_file_gently() as a new way to attempt creating
a file without dying on error, and make three callers to call it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are running the git command "foo", we may have to
look up the config keys "pager.foo" and "alias.foo". These
config schemes are mis-designed, as the command names can be
anything, but the config syntax has some restrictions. For
example:
$ git foo_bar
error: invalid key: pager.foo_bar
error: invalid key: alias.foo_bar
git: 'foo_bar' is not a git command. See 'git --help'.
You cannot name an alias with an underscore. And if you have
an external command with one, you cannot configure its
pager.
In the long run, we may develop a different config scheme
for these features. But in the near term (and because we'll
need to support the existing scheme indefinitely), we should
at least squelch the error messages shown above.
These errors come from git_config_parse_key. Ideally we
would pass a "quiet" flag to the config machinery, but there
are many layers between the pager code and the key parsing.
Passing a flag through all of those would be an invasive
change.
Instead, let's provide a config function to report on
whether a key is syntactically valid, and have the pager and
alias code skip lookup for bogus keys. We can build this
easily around the existing git_config_parse_key, with two
minor modifications:
1. We now handle a NULL store_key, to validate but not
write out the normalized key.
2. We accept a "quiet" flag to avoid writing to stderr.
This doesn't need to be a full-blown public "flags"
field, because we can make the existing implementation
a static helper function, keeping the mess contained
inside config.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4. Their uses have been reduced.
* jk/git-path:
memoize common git-path "constant" files
get_repo_path: refactor path-allocation
find_hook: keep our own static buffer
refs.c: remove_empty_directories can take a strbuf
refs.c: avoid git_path assignment in lock_ref_sha1_basic
refs.c: avoid repeated git_path calls in rename_tmp_log
refs.c: simplify strbufs in reflog setup and writing
path.c: drop git_path_submodule
refs.c: remove extra git_path calls from read_loose_refs
remote.c: drop extraneous local variable from migrate_file
prefer mkpathdup to mkpath in assignments
prefer git_pathdup to git_path in some possibly-dangerous cases
add_to_alternates_file: don't add duplicate entries
t5700: modernize style
cache.h: complete set of git_path_submodule helpers
cache.h: clarify documentation for git_path, et al