Add has_object_file, which is a wrapper around has_sha1_file, but for
struct object_id.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.
* dk/gc-idx-wo-pack:
gc: remove garbage .idx files from pack dir
t5304: test cleaning pack garbage
prepare_packed_git(): refactor garbage reporting in pack directory
Soon we will want to create initdb functions for ref backends, and
code from initdb that calls this function needs to move into the files
backend. So this function needs to be public.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
Using the previous commit's inredirection mechanism for SHA1,
support a chunked implementation of SHA1_Update() that limits the
amount of data in the chunk passed to SHA1_Update().
This is enabled by using the Makefile variable SHA1_MAX_BLOCK_SIZE
to specify chunk size. When using Apple's CommonCrypto library this
is set to 1GiB (the implementation cannot handle more 4GiB).
Signed-off-by: Atousa Pahlevan Duprat <apahlevan@ieee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git source uses git_SHA1_Update() and friends to call into the
code that computes the hashes. Traditionally, we used to map these
directly to underlying implementation of the SHA-1 hash (e.g.
SHA1_Update() from OpenSSL or blk_SHA1_Update() from block-sha1/).
This arrangement however makes it hard to tweak behaviour of the
underlying implementation without fully replacing. If we want to
introduce a tweaked_SHA1_Update() wrapper to implement the "Update"
in a slightly different way, for example, the implementation of the
wrapper still would want to call into the underlying implementation,
but tweaked_SHA1_Update() cannot call git_SHA1_Update() to get to
the underlying implementation (often but not always SHA1_Update()).
Add another level of indirection that maps platform_SHA1_Update()
and friends to their underlying implementations, and by default make
git_SHA1_Update() and friends map to platform_SHA1_* functions.
Doing it this way will later allow us to map git_SHA1_Update() to
tweaked_SHA1_Update(), and the latter can use platform_SHA1_Update()
in its implementation.
Signed-off-by: Atousa Pahlevan Duprat <apahlevan@ieee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
The submodule code has been taught to work better with separate
work trees created via "git worktree add".
* mk/submodule-gitdir-path:
path: implement common_dir handling in git_pathdup_submodule()
submodule refactor: use strbuf_git_path_submodule() in add_submodule_odb()
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.
* js/clone-dissociate:
clone --dissociate: avoid locking pack files
sha1_file.c: add a function to release all packs
sha1_file: consolidate code to close a pack's file descriptor
t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.
* jk/repository-extension:
introduce "preciousObjects" repository extension
introduce "extensions" form of core.repositoryformatversion
The name-hash subsystem that is used to cope with case insensitive
filesystems keeps track of directories and their on-filesystem
cases for all the paths in the index by holding a pointer to a
randomly chosen cache entry that is inside the directory (for its
ce->ce_name component). This pointer was not updated even when the
cache entry was removed from the index, leading to use after free.
This was fixed by recording the path for each directory instead of
borrowing cache entries and restructuring the API somewhat.
* dt/name-hash-dir-entry-fix:
name-hash: don't reuse cache_entry in dir_entry
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.
* jk/repository-extension:
introduce "preciousObjects" repository extension
introduce "extensions" form of core.repositoryformatversion
Stop reusing cache_entry in dir_entry; doing so causes a
use-after-free bug.
During merges, we free entries that we no longer need in the
destination index. But those entries might have also been stored in
the dir_entry cache, and when a later call to add_to_index found them,
they would be used after being freed.
To prevent this, change dir_entry to store a copy of the name instead
of a pointer to a cache_entry. This entails some refactoring of code
that expects the cache_entry.
Keith McGuigan <kmcguigan@twitter.com> diagnosed this bug and wrote
the initial patch, but this version does not use any of Keith's code.
Helped-by: Keith McGuigan <kmcguigan@twitter.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.
* js/clone-dissociate:
clone --dissociate: avoid locking pack files
sha1_file.c: add a function to release all packs
sha1_file: consolidate code to close a pack's file descriptor
t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
The submodule code has been taught to work better with separate
work trees created via "git worktree add".
* mk/submodule-gitdir-path:
path: implement common_dir handling in git_pathdup_submodule()
submodule refactor: use strbuf_git_path_submodule() in add_submodule_odb()
On Windows, files that are in use cannot be removed or renamed. That
means that we have to release pack files when we are about to, say,
repack them. Let's introduce a convenient function to close all the
pack files and their idx files.
While at it, we consolidate the close windows/close fd/close index
stanza in `free_pack_by_name()` into the `close_pack()` function that
is used by the new `close_all_packs()` function to avoid repeated code.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git log --date=local" used to only show the normal (default)
format in the local timezone. The command learned to take 'local'
as an instruction to use the local timezone with other formats,
e.g. "git show --date=rfc-local".
* jk/date-local:
t6300: add tests for "-local" date formats
t6300: make UTC and local dates different
date: make "local" orthogonal to date format
date: check for "local" before anything else
t6300: add test for "raw" date format
t6300: introduce test_date() helper
fast-import: switch crash-report date to iso8601
Documentation/rev-list: don't list date formats
Documentation/git-for-each-ref: don't list date formats
Documentation/config: don't list date formats
Documentation/blame-options: don't list date formats
We have the path "foo.idx", and we create a buffer big
enough to hold "foo.pack" and "foo.keep", and then strcpy
straight into it. This isn't a bug (we have enough space),
but it's very hard to tell from the strcpy that this is so.
Let's instead use strip_suffix to take off the ".idx",
record the size of our allocation, and use xsnprintf to make
sure we don't violate our assumptions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The sha1_to_hex and find_unique_abbrev functions always
write into reusable static buffers. There are a few problems
with this:
- future calls overwrite our result. This is especially
annoying with find_unique_abbrev, which does not have a
ring of buffers, so you cannot even printf() a result
that has two abbreviated sha1s.
- if you want to put the result into another buffer, we
often strcpy, which looks suspicious when auditing for
overflows.
This patch introduces sha1_to_hex_r and find_unique_abbrev_r,
which write into a user-provided buffer. Of course this is
just punting on the overflow-auditing, as the buffer
obviously needs to be GIT_SHA1_HEXSZ + 1 bytes. But it is
much easier to audit, since that is a well-known size.
We retain the non-reentrant forms, which just become thin
wrappers around the reentrant ones. This patch also adds a
strbuf variant of find_unique_abbrev, which will be handy in
later patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a function that uses git_path a lot, but would
prefer to avoid the static buffers, it's useful to keep a
single scratch buffer locally and reuse it for each call.
You used to be able to do this with git_snpath:
char buf[PATH_MAX];
foo(git_snpath(buf, sizeof(buf), "foo"));
bar(git_snpath(buf, sizeof(buf), "bar"));
but since 1a83c24, git_snpath has been replaced with
strbuf_git_path. This is good, because it removes the
arbitrary PATH_MAX limit. But using strbuf_git_path is more
awkward for two reasons:
1. It adds to the buffer, rather than replacing it. This
is consistent with other strbuf functions, but makes
reuse of a single buffer more tedious.
2. It doesn't return the buffer, so you can't format
as part of a function's arguments.
The new git_path_buf solves both of these, so you can use it
like:
struct strbuf buf = STRBUF_INIT;
foo(git_path_buf(&buf, "foo"));
bar(git_path_buf(&buf, "bar"));
strbuf_release(&buf);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When submodule is a linked worktree, "git diff --submodule" and other
calls which directly access the submodule's object database do not correctly
calculate its path. Fix it by changing the git_pathdup_submodule() behavior,
to use either common or per-worktree directory.
Do it similarly as for parent repository, but ignore the GIT_COMMON_DIR
environment variable, because it would mean common directory for the parent
repository and does not make sense for submodule.
Also add test for functionality which uses this call.
Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our "--date" modes are about the format of the date:
which items we show and in what order. But "--date=local" is
a bit of an oddball. It means "show the date in the normal
format, but using the local timezone". The timezone we use
is orthogonal to the actual format, and there is no reason
we could not have "localized iso8601", etc.
This patch adds a "local" boolean field to "struct
date_mode", and drops the DATE_LOCAL element from the
date_mode_type enum (it's now just DATE_NORMAL plus
local=1). The new feature is accessible to users by adding
"-local" to any date mode (e.g., "iso-local"), and we retain
"local" as an alias for "default-local" for backwards
compatibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The client side codepaths in "git push" have been cleaned up
and the user can request to perform an optional "signed push",
i.e. sign only when the other end accepts signed push.
* db/push-sign-if-asked:
push: add a config option push.gpgSign for default signed pushes
push: support signing pushes iff the server supports it
builtin/send-pack.c: use parse_options API
config.c: rename git_config_maybe_bool_text and export it as git_parse_maybe_bool
transport: remove git_transport_options.push_cert
gitremote-helpers.txt: document pushcert option
Documentation/git-send-pack.txt: document --signed
Documentation/git-send-pack.txt: wrap long synopsis line
Documentation/git-push.txt: document when --signed may fail
Recent reimplementation of "git am" changed the format of state
files kept in $GIT_DIR/rebase-apply/ without meaning to do so,
primarily because write_file() API was cumbersome to use and it was
easy to mistakenly make text files with incomplete lines. Update
write_file() interface to make it harder to misuse.
* jc/am-state-fix:
write_file(): drop caller-supplied LF from calls to create a one-liner file
write_file_v(): do not leave incomplete line at the end
write_file(): drop "fatal" parameter
builtin/am: make sure state files are text
builtin/am: introduce write_state_*() helper functions
Because the configuration system does not allow "alias.0foo" and
"pager.0foo" as the configuration key, the user cannot use '0foo'
as a custom command name anyway, but "git 0foo" tried to look these
keys up and emitted useless warnings before saying '0foo is not a
git command'. These warning messages have been squelched.
* jk/fix-alias-pager-config-key-warnings:
config: silence warnings for command names with invalid keys
All callers except three passed 1 for the "fatal" parameter to ask
this function to die upon error, but to a casual reader of the code,
it was not all obvious what that 1 meant. Instead, split the
function into two based on a common write_file_v() that takes the
flag, introduce write_file_gently() as a new way to attempt creating
a file without dying on error, and make three callers to call it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are running the git command "foo", we may have to
look up the config keys "pager.foo" and "alias.foo". These
config schemes are mis-designed, as the command names can be
anything, but the config syntax has some restrictions. For
example:
$ git foo_bar
error: invalid key: pager.foo_bar
error: invalid key: alias.foo_bar
git: 'foo_bar' is not a git command. See 'git --help'.
You cannot name an alias with an underscore. And if you have
an external command with one, you cannot configure its
pager.
In the long run, we may develop a different config scheme
for these features. But in the near term (and because we'll
need to support the existing scheme indefinitely), we should
at least squelch the error messages shown above.
These errors come from git_config_parse_key. Ideally we
would pass a "quiet" flag to the config machinery, but there
are many layers between the pager code and the key parsing.
Passing a flag through all of those would be an invasive
change.
Instead, let's provide a config function to report on
whether a key is syntactically valid, and have the pager and
alias code skip lookup for bogus keys. We can build this
easily around the existing git_config_parse_key, with two
minor modifications:
1. We now handle a NULL store_key, to validate but not
write out the normalized key.
2. We accept a "quiet" flag to avoid writing to stderr.
This doesn't need to be a full-blown public "flags"
field, because we can make the existing implementation
a static helper function, keeping the mess contained
inside config.c.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4. Their uses have been reduced.
* jk/git-path:
memoize common git-path "constant" files
get_repo_path: refactor path-allocation
find_hook: keep our own static buffer
refs.c: remove_empty_directories can take a strbuf
refs.c: avoid git_path assignment in lock_ref_sha1_basic
refs.c: avoid repeated git_path calls in rename_tmp_log
refs.c: simplify strbufs in reflog setup and writing
path.c: drop git_path_submodule
refs.c: remove extra git_path calls from read_loose_refs
remote.c: drop extraneous local variable from migrate_file
prefer mkpathdup to mkpath in assignments
prefer git_pathdup to git_path in some possibly-dangerous cases
add_to_alternates_file: don't add duplicate entries
t5700: modernize style
cache.h: complete set of git_path_submodule helpers
cache.h: clarify documentation for git_path, et al
This helper function does not complain about the config variable
but just silently reports failure to the caller. It is useful for
callers that need to parse any string that could be boolean or other
string (e.g. tristate yes/no/auto).
Signed-off-by: Dave Borowitz <dborowitz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hook to report "garbage" files in $GIT_OBJECT_DIRECTORY/pack/
could be generic but is too specific to count-object's needs.
Move the part to produce human-readable messages to count-objects,
and refine the interface to callback with the "bits" with values
defined in the cache.h header file, so that other callers (e.g.
prune) can later use the same mechanism to enumerate different
kinds of garbage files and do something intelligent about them,
other than reporting in textual messages.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are no callers of the slightly-dangerous static-buffer
git_path_submodule left. Let's drop it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git_path function has "git_pathdup" and
"strbuf_git_path" variants, but git_submodule_path only
comes in the dangerous, static-buffer variant. That makes
refactoring callers to use the safer functions hard (since
they don't exist).
Since we're already using a strbuf behind the scenes, it's
easy to expose all three of these interfaces with thin
wrappers.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The comment above these functions actually describes
sha1_file_name, and comes from the very first revision of
git. Commit 723c31f (Add "git_path()" and "head_ref()"
helper functions., 2005-07-05) added git_path, pushing the
comment away from the function it describes; later commits
added more functions in this block.
Let's fix the comment to describe these related functions in
more detail. Let's also make sure to point out their safer
alternatives (and move those alternatives below, which makes
more sense when reading the file).
Note that we do not need to move the existing comment to
sha1_file_name. Commit d40d535 (sha1_file.c: document a
bunch of functions defined in the file, 2014-02-21) already
added a much more descriptive comment to it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not
loosen, 2009-03-25), we kept reminding ourselves:
NEEDSWORK: this should be renamed to finalize_temp_file() as
"moving" is only a part of what it does, when no patch between
master to pu changes the call sites of this function.
without doing anything about it. Let's do so.
The purpose of this function was not to move but to finalize. The
detail of the primarily implementation of finalizing was to link the
temporary file to its final name and then to unlink, which wasn't
even "moving". The alternative implementation did "move" by calling
rename(2), which is a fun tangent.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).
* jk/date-mode-format:
strbuf: make strbuf_addftime more robust
introduce "format" date-mode
convert "enum date_mode" into a struct
show-branch: use DATE_RELATIVE instead of magic number
Clean up refs API and make "git clone" less intimate with the
implementation detail.
* mh/init-delete-refs-api:
delete_ref(): use the usual convention for old_sha1
cmd_update_ref(): make logic more straightforward
update_ref(): don't read old reference value before delete
check_branch_commit(): make first parameter const
refs.h: add some parameter names to function declarations
refs: move the remaining ref module declarations to refs.h
initial_ref_transaction_commit(): check for ref D/F conflicts
initial_ref_transaction_commit(): check for duplicate refs
refs: remove some functions from the module's public interface
initial_ref_transaction_commit(): function for initial ref creation
repack_without_refs(): make function private
prune_refs(): use delete_refs()
prune_remote(): use delete_refs()
delete_refs(): bail early if the packed-refs file cannot be rewritten
delete_refs(): make error message more generic
delete_refs(): new function for the refs API
delete_ref(): handle special case more explicitly
remove_branches(): remove temporary
delete_ref(): move declaration to refs.h
Replace "is this subdirectory a separate repository that should not
be touched?" check "git clean" does by checking if it has .git/HEAD
using the submodule-related code with a more optimized check.
* ee/clean-remove-dirs:
read_gitfile_gently: fix use-after-free
clean: improve performance when removing lots of directories
p7300: add performance tests for clean
t7300: add tests to document behavior of clean and nested git
setup: sanity check file size in read_gitfile_gently
setup: add gentle version of read_gitfile
Add an environment variable to tell Git to look into refs hierarchy
other than refs/replace/ for the object replacement data.
* mh/replace-refs:
Allow to control where the replace refs are looked for
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.
* jk/index-pack-reduce-recheck:
index-pack: avoid excessive re-reading of pack directory
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.
Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor. However, the tricky case is where we use the
enum labels as constants, like:
show_date(t, tz, DATE_NORMAL);
Ideally we could say:
show_date(t, tz, &{ DATE_NORMAL });
but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:
1. Manually add a "struct date_mode d = { DATE_NORMAL }"
definition to each caller, and pass "&d". This makes
the callers uglier, because they sometimes do not even
have their own scope (e.g., they are inside a switch
statement).
2. Provide a pre-made global "date_normal" struct that can
be passed by address. We'd also need "date_rfc2822",
"date_iso8601", and so forth. But at least the ugliness
is defined in one place.
3. Provide a wrapper that generates the correct struct on
the fly. The big downside is that we end up pointing to
a single global, which makes our wrapper non-reentrant.
But show_date is already not reentrant, so it does not
matter.
This patch implements 3, along with a minor macro to keep
the size of the callers sane.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If this extension is used in a repository, then no
operations should run which may drop objects from the object
storage. This can be useful if you are sharing that storage
with other repositories whose refs you cannot see.
For instance, if you do:
$ git clone -s parent child
$ git -C parent config extensions.preciousObjects true
$ git -C parent config core.repositoryformatversion 1
you now have additional safety when running git in the
parent repository. Prunes and repacks will bail with an
error, and `git gc` will skip those operations (it will
continue to pack refs and do other non-object operations).
Older versions of git, when run in the repository, will
fail on every operation.
Note that we do not set the preciousObjects extension by
default when doing a "clone -s", as doing so breaks
backwards compatibility. It is a decision the user should
make explicitly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally we try to avoid bumps of the whole-repository
core.repositoryformatversion field. However, it is
unavoidable if we want to safely change certain aspects of
git in a backwards-incompatible way (e.g., modifying the set
of ref tips that we must traverse to generate a list of
unreachable, safe-to-prune objects).
If we were to bump the repository version for every such
change, then any implementation understanding version `X`
would also have to understand `X-1`, `X-2`, and so forth,
even though the incompatibilities may be in orthogonal parts
of the system, and there is otherwise no reason we cannot
implement one without the other (or more importantly, that
the user cannot choose to use one feature without the other,
weighing the tradeoff in compatibility only for that
particular feature).
This patch documents the existing repositoryformatversion
strategy and introduces a new format, "1", which lets a
repository specify that it must run with an arbitrary set of
extensions. This can be used, for example:
- to inform git that the objects should not be pruned based
only on the reachability of the ref tips (e.g, because it
has "clone --shared" children)
- that the refs are stored in a format besides the usual
"refs" and "packed-refs" directories
Because we bump to format "1", and because format "1"
requires that a running git knows about any extensions
mentioned, we know that older versions of the code will not
do something dangerous when confronted with these new
formats.
For example, if the user chooses to use database storage for
refs, they may set the "extensions.refbackend" config to
"db". Older versions of git will not understand format "1"
and bail. Versions of git which understand "1" but do not
know about "refbackend", or which know about "refbackend"
but not about the "db" backend, will refuse to run. This is
annoying, of course, but much better than the alternative of
claiming that there are no refs in the repository, or
writing to a location that other implementations will not
read.
Note that we are only defining the rules for format 1 here.
We do not ever write format 1 ourselves; it is a tool that
is meant to be used by users and future extensions to
provide safety with older implementations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Disable "have we lost a race with competing repack?" check while
receiving a huge object transfer that runs index-pack.
* jk/index-pack-reduce-recheck:
index-pack: avoid excessive re-reading of pack directory
Some functions from the refs module were still declared in cache.h.
Move them to refs.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also
* Add a docstring
* Rename the second parameter to "old_sha1", to be consistent with the
convention used elsewhere in the refs module
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_gitfile_gently will allocate a buffer to fit the entire file that
should be read. Add a sanity check of the file size before opening to
avoid allocating a potentially huge amount of memory if we come across
a large file that someone happened to name ".git". The limit is set to
a sufficiently unreasonable size that should never be exceeded by a
genuine .git file.
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It can be useful to have grafts or replace refs for specific use-cases while
keeping the default "view" of the repository pristine (or with a different
set of grafts/replace refs).
It is possible to use a different graft file with GIT_GRAFT_FILE, but while
replace refs are more powerful, they don't have an equivalent override.
Add a GIT_REPLACE_REF_BASE environment variable to control where git is
going to look for replace refs.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read_gitfile will die on most error cases. This makes it unsuitable
for speculative calls. Extract the core logic and provide a gentle
version that returns NULL on failure.
The first usecase of the new gentle version will be to probe for
submodules during git clean.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we spend extra effort for
has_sha1_file to give the right answer when somebody else is
repacking. Usually this effort does not matter, because
after finding that the object does not exist, the next step
is usually to die().
However, some code paths make a large number of
has_sha1_file checks which are _not_ expected to return 1.
The collision test in index-pack.c is such a case. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.
This patch introduces a "quick" flag to has_sha1_file which
callers can use when they would prefer high performance at
the cost of false negatives during repacks. There may be
other code paths that can use this, but the index-pack one
is the most obviously critical, so we'll start with
switching that one.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to use the new function elsewhere in a moment.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up for xdg configuration path support.
* pt/xdg-config-path:
path.c: remove home_config_paths()
git-config: replace use of home_config_paths()
git-commit: replace use of home_config_paths()
credential-store.c: replace home_config_paths() with xdg_config_home()
dir.c: replace home_config_paths() with xdg_config_home()
attr.c: replace home_config_paths() with xdg_config_home()
path.c: implement xdg_config_home()
t0302: "unreadable" test needs POSIXPERM
t0302: test credential-store support for XDG_CONFIG_HOME
git-credential-store: support XDG_CONFIG_HOME
git-credential-store: support multiple credential files
"git cat-file --batch(-check)" learned the "--follow-symlinks"
option that follows an in-tree symbolic link when asked about an
object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at
Documentation/RelNotes/2.5.0.txt. With the new option, the command
behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as
input instead.
* dt/cat-file-follow-symlinks:
cat-file: add --follow-symlinks to --batch
sha1_name: get_sha1_with_context learns to follow symlinks
tree-walk: learn get_tree_entry_follow_symlinks
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.
* jc/hash-object:
write_sha1_file(): do not use a separate sha1[] array
t1007: add hash-object --literally tests
hash-object --literally: fix buffer overrun with extra-long object type
git-hash-object.txt: document --literally option
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.
* nd/untracked-cache: (24 commits)
git-status.txt: advertisement for untracked cache
untracked cache: guard and disable on system changes
mingw32: add uname()
t7063: tests for untracked cache
update-index: test the system before enabling untracked cache
update-index: manually enable or disable untracked cache
status: enable untracked cache
untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
untracked cache: mark index dirty if untracked cache is updated
untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
untracked cache: avoid racy timestamps
read-cache.c: split racy stat test to a separate function
untracked cache: invalidate at index addition or removal
untracked cache: load from UNTR index extension
untracked cache: save to an index extension
ewah: add convenient wrapper ewah_serialize_strbuf()
untracked cache: don't open non-existent .gitignore
untracked cache: mark what dirs should be recursed/saved
untracked cache: record/validate dir mtime and reuse cached output
untracked cache: make a wrapper around {open,read,close}dir()
...
Filter scripts were run with SIGPIPE disabled on the Git side,
expecting that they may not read what Git feeds them to filter.
We however treated a filter that does not read its input fully
before exiting as an error.
This changes semantics, but arguably in a good way. If a filter
can produce its output without consuming its input using whatever
magic, we now let it do so, instead of diagnosing it as a
programming error.
* jc/ignore-epipe-in-filter:
filter_buffer_or_fd(): ignore EPIPE
copy.c: make copy_fd() report its status silently
Wire up get_sha1_with_context to call get_tree_entry_follow_symlinks
when GET_SHA1_FOLLOW_SYMLINKS is passed in flags. G_S_FOLLOW_SYMLINKS
is incompatible with G_S_ONLY_TO_DIE because the diagnosis
that ONLY_TO_DIE triggers does not at present consider symlinks, and
it would be a significant amount of additional code to allow it to
do so.
Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When copy_fd() function encounters errors, it emits error messages
itself, which makes it impossible for callers to take responsibility
for reporting errors, especially when they want to ignore certain
errors.
Move the error reporting to its callers in preparation.
- copy_file() and copy_file_with_time() by indirection get their
own calls to error().
- hold_lock_file_for_append(), when told to die on error, used to
exit(128) relying on the error message from copy_fd(), but now it
does its own die() instead. Note that the callers that do not
pass LOCK_DIE_ON_ERROR need to be adjusted for this change, but
fortunately there is none ;-)
- filter_buffer_or_fd() has its own error() already, in addition to
the message from copy_fd(), so this will change the output but
arguably in a better way.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add the "--allow-unknown-type" option to "cat-file" to allow
inspecting loose objects of an experimental or a broken type.
* kn/cat-file-literally:
t1006: add tests for git cat-file --allow-unknown-type
cat-file: teach cat-file a '--allow-unknown-type' option
cat-file: make the options mutually exclusive
sha1_file: support reading from a loose object of unknown type
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.
* jk/prune-mtime:
sha1_file: only freshen packs once per run
sha1_file: freshen pack objects before loose
reachable: only mark local objects as recent
Code clean-up for xdg configuration path support.
* pt/xdg-config-path:
path.c: remove home_config_paths()
git-config: replace use of home_config_paths()
git-commit: replace use of home_config_paths()
credential-store.c: replace home_config_paths() with xdg_config_home()
dir.c: replace home_config_paths() with xdg_config_home()
attr.c: replace home_config_paths() with xdg_config_home()
path.c: implement xdg_config_home()
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.
* jc/hash-object:
write_sha1_file(): do not use a separate sha1[] array
t1007: add hash-object --literally tests
hash-object --literally: fix buffer overrun with extra-long object type
git-hash-object.txt: document --literally option
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.
* nd/multiple-work-trees: (41 commits)
prune --worktrees: fix expire vs worktree existence condition
t1501: fix test with split index
t2026: fix broken &&-chain
t2026 needs procondition SANITY
git-checkout.txt: a note about multiple checkout support for submodules
checkout: add --ignore-other-wortrees
checkout: pass whole struct to parse_branchname_arg instead of individual flags
git-common-dir: make "modules/" per-working-directory directory
checkout: do not fail if target is an empty directory
t2025: add a test to make sure grafts is working from a linked checkout
checkout: don't require a work tree when checking out into a new one
git_path(): keep "info/sparse-checkout" per work-tree
count-objects: report unused files in $GIT_DIR/worktrees/...
gc: support prune --worktrees
gc: factor out gc.pruneexpire parsing code
gc: style change -- no SP before closing parenthesis
checkout: clean up half-prepared directories in --to mode
checkout: reject if the branch is already checked out elsewhere
prune: strategies for linked checkouts
checkout: support checking out into a new working directory
...
Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional "typename" field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.
Add parse_sha1_header_extended() which acts as a wrapper around
parse_sha1_header() allowing more information to be obtained.
Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
unknown/corrupt objects which have a unknown sha1 header size to
a strbuf structure. This was written by Junio C Hamano but tested
by me.
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Hepled-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
home_config_paths() combines distinct functionality already implemented
by expand_user_path() and xdg_config_home(), and it also hard-codes the
path ~/.gitconfig, which makes it unsuitable to use for other home
config file paths. Since its use will just add unnecessary complexity to
the code, remove it.
Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The XDG base dir spec[1] specifies that configuration files be stored in
a subdirectory in $XDG_CONFIG_HOME. To construct such a configuration
file path, home_config_paths() can be used. However, home_config_paths()
combines distinct functionality:
1. Retrieve the home git config file path ~/.gitconfig
2. Construct the XDG config path of the file specified by `file`.
This function was introduced in commit 21cf3227 ("read (but not write)
from $XDG_CONFIG_HOME/git/config file"). While the intention of the
function was to allow the home directory configuration file path and the
xdg directory configuration file path to be retrieved with one function
call, the hard-coding of the path ~/.gitconfig prevents it from being
used for other configuration files. Furthermore, retrieving a file path
relative to the user's home directory can be done with
expand_user_path(). Hence, it can be seen that home_config_paths()
introduces unnecessary complexity, especially if a user just wants to
retrieve the xdg config file path.
As such, implement a simpler function xdg_config_home() for constructing
the XDG base dir spec configuration file path. This function, together
with expand_user_path(), can replace all uses of home_config_paths().
[1] http://standards.freedesktop.org/basedir-spec/basedir-spec-0.7.html
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.
* jk/prune-mtime:
sha1_file: only freshen packs once per run
sha1_file: freshen pack objects before loose
reachable: only mark local objects as recent
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.
* bc/object-id:
apply: convert threeway_stage to object_id
patch-id: convert to use struct object_id
commit: convert parts to struct object_id
diff: convert struct combine_diff_path to object_id
bulk-checkin.c: convert to use struct object_id
zip: use GIT_SHA1_HEXSZ for trailers
archive.c: convert to use struct object_id
bisect.c: convert leaf functions to use struct object_id
define utility functions for object IDs
define a structure for object IDs
"hash-object" learned in 5ba9a93 (hash-object: add --literally
option, 2014-09-11) to allow crafting a corrupt/broken object of
unknown type.
When the user-provided type is particularly long, however, it can
overflow the relatively small stack-based character array handed to
write_sha1_file_prepare() by hash_sha1_file() and write_sha1_file(),
leading to stack corruption (and crash). Introduce a custom helper
to allow arbitrarily long typenames just for "hash-object --literally".
[jc: Eric's original used a strbuf in the more common codepaths, and
I rewrote it to avoid penalizing the non-literally code. Bugs are mine]
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).
We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.
An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When pruning and repacking a repository that has an
alternate object store configured, we may traverse a large
number of objects in the alternate. This serves no purpose,
and may be expensive to do. A longer explanation is below.
Commits d3038d2 and abcb865 taught prune and pack-objects
(respectively) to treat "recent" objects as tips for
reachability, so that we keep whole chunks of history. They
built on the object traversal in 660c889 (sha1_file: add
for_each iterators for loose and packed objects,
2014-10-15), which covers both local and alternate objects.
In both cases, covering alternate objects is unnecessary, as
both commands can only drop objects from the local
repository. In the case of prune, we traverse only the local
object directory. And in the case of repacking, while we may
or may not include local objects in our pack, we will never
reach into the alternate with "repack -d". The "-l" option
is only a question of whether we are migrating objects from
the alternate into our repository, or leaving them
untouched.
It is possible that we may drop an object that is depended
upon by another object in the alternate. For example,
imagine two repositories, A and B, with A pointing to B as
an alternate. Now imagine a commit that is in B which
references a tree that is only in A. Traversing from recent
objects in B might prevent A from dropping that tree. But
this case isn't worth covering. Repo B should take
responsibility for its own objects. It would never have had
the commit in the first place if it did not also have the
tree, and assuming it is using the same "keep recent chunks
of history" scheme, then it would itself keep the tree, as
well.
So checking the alternate objects is not worth doing, and
come with a significant performance impact. In both cases,
we skip any recent objects that have already been marked
SEEN (i.e., that we know are already reachable for prune, or
included in the pack for a repack). So there is a slight
waste of time in opening the alternate packs at all, only to
notice that we have already considered each object. But much
worse, the alternate repository may have a large number of
objects that are not reachable from the local repository at
all, and we end up adding them to the traversal.
We can fix this by considering only local unseen objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git prune" used to largely ignore broken refs when deciding which
objects are still being used, which could spread an existing small
damage and make it a larger one.
* jk/prune-with-corrupt-refs:
refs.c: drop curate_packed_refs
repack: turn on "ref paranoia" when doing a destructive repack
prune: turn on ref_paranoia flag
refs: introduce a "ref paranoia" flag
t5312: test object deletion code paths in a corrupted repository
The expected call sequence is for the caller to use match_pathspec()
repeatedly on a set of pathspecs, accumulating the "hits" in a
separate array, and then call this function to diagnose a pathspec
that never matched anything, as that can indicate a typo from the
command line, e.g. "git commit Maekfile".
Many builtin commands use this function from builtin/ls-files.c,
which is not a very healthy arrangement. ls-files might have been
the first command to feel the need for such a helper, but the need
is shared by everybody who uses the "match and then report" pattern.
Move it to dir.c where match_pathspec() is defined.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most operations that iterate over refs are happy to ignore
broken cruft. However, some operations should be performed
with knowledge of these broken refs, because it is better
for the operation to choke on a missing object than it is to
silently pretend that the ref did not exist (e.g., if we are
computing the set of reachable tips in order to prune
objects).
These processes could just call for_each_rawref, except that
ref iteration is often hidden behind other interfaces. For
instance, for a destructive "repack -ad", we would have to
inform "pack-objects" that we are destructive, and then it
would in turn have to tell the revision code that our
"--all" should include broken refs.
It's much simpler to just set a global for "dangerous"
operations that includes broken refs in all iterations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are several utility functions (hashcmp and friends) that are used
for comparing object IDs (SHA-1 values). Using these functions, which
take pointers to unsigned char, with struct object_id requires tiresome
access to the sha1 member, which bloats code and violates the desired
encapsulation. Provide wrappers around these functions for struct
object_id for neater, more maintainable code. Use the new constants to
avoid the hard-coded 20s and 40s throughout the original functions.
These functions simply call the underlying pointer-to-unsigned-char
versions to ensure that any performance improvements will be passed
through to the new functions.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values). This leads to lots of hardcoded numbers throughout the
codebase. It also leads to confusion about the purposes of a buffer.
Introduce a structure for object IDs. This allows us to obtain the benefits
of compile-time checking for misuse. The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a directory is updated within the same second that its timestamp
is last saved, we cannot realize the directory has been updated by
checking timestamps. Assume the worst (something is update). See
29e4d36 (Racy GIT - 2005-12-20) for more information.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.
* jk/prune-mtime:
sha1_file: fix iterating loose alternate objects
for_each_loose_file_in_objdir: take an optional strbuf path
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.
* jk/decimal-width-for-uintmax:
decimal_width: avoid integer overflow
Simplify the ref transaction API around how "the ref should be
pointing at this object" is specified.
* mh/refs-have-new:
refs.h: remove duplication in function docstrings
update_ref(): improve documentation
ref_transaction_verify(): new function to check a reference's value
ref_transaction_delete(): check that old_sha1 is not null_sha1
ref_transaction_create(): check that new_sha1 is valid
commit: avoid race when creating orphan commits
commit: add tests of commit races
ref_transaction_delete(): remove "have_old" parameter
ref_transaction_update(): remove "have_old" parameter
struct ref_update: move "have_old" into "flags"
refs.c: change some "flags" to "unsigned int"
refs: remove the gap in the REF_* constant values
refs: move REF_DELETING to refs.c
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.
* jk/prune-mtime:
sha1_file: fix iterating loose alternate objects
for_each_loose_file_in_objdir: take an optional strbuf path
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.
* jk/decimal-width-for-uintmax:
decimal_width: avoid integer overflow
Change the following functions' "flags" arguments from "int" to
"unsigned int":
* ref_transaction_update()
* ref_transaction_create()
* ref_transaction_delete()
* update_ref()
* delete_ref()
* lock_ref_sha1_basic()
Also change the "flags" member in "struct ref_update" to unsigned.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.
We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The decimal_width function originally appeared in blame.c as
"lineno_width", and was designed for calculating the
print-width of small-ish integer values (line numbers in
text files). In ec7ff5b, it was made into a reusable
function, and in dc801e7, we started using it to align
diffstats.
Binary files in a diffstat show byte counts rather than line
numbers, meaning they can be quite large (e.g., consider
adding or removing a 2GB file). decimal_width is not up to
the challenge for two reasons:
1. It takes the value as an "int", whereas large files may
easily surpass this. The value may be truncated, in
which case we will produce an incorrect value.
2. It counts "up" by repeatedly multiplying another
integer by 10 until it surpasses the value. This can
cause an infinite loop when the value is close to the
largest representable integer.
For example, consider using a 32-bit signed integer,
and a value of 2,140,000,000 (just shy of 2^31-1).
We will count up and eventually see that 1,000,000,000
is smaller than our value. The next step would be to
multiply by 10 and see that 10,000,000,000 is too
large, ending the loop. But we can't represent that
value, and we have signed overflow.
This is technically undefined behavior, but a common
behavior is to lose the high bits, in which case our
iterator will certainly be less than the number. So
we'll keep multiplying, overflow again, and so on.
This patch changes the argument to a uintmax_t (the same
type we use to store the diffstat information for binary
filese), and counts "down" by repeatedly dividing our value
by 10.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Long overdue departure from the assumption that S_IFMT is shared by
everybody made in 2005.
* dm/compat-s-ifmt-for-zos:
compat: convert modes to use portable file type values
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.
On NTFS (and FAT32), there exist so-called "short names" for
backwards-compatibility: 8.3 compliant names that refer to the same files
as their long names. As ".git" is not an 8.3 compliant name, a short name
is generated automatically, typically "git~1".
Depending on the Windows version, any combination of trailing spaces and
periods are ignored, too, so that both "git~1." and ".git." still refer
to the Git directory. The reason is that 8.3 stores file names shorter
than 8 characters with trailing spaces. So literally, it does not matter
for the short name whether it is padded with spaces or whether it is
shorter than 8 characters, it is considered to be the exact same.
The period is the separator between file name and file extension, and
again, an empty extension consists just of spaces in 8.3 format. So
technically, we would need only take care of the equivalent of this
regex:
(\.git {0,4}|git~1 {0,3})\. {0,3}
However, there are indications that at least some Windows versions might
be more lenient and accept arbitrary combinations of trailing spaces and
periods and strip them out. So we're playing it real safe here. Besides,
there can be little doubt about the intention behind using file names
matching even the more lenient pattern specified above, therefore we
should be fine with disallowing such patterns.
Extra care is taken to catch names such as '.\\.git\\booh' because the
backslash is marked as a directory separator only on Windows, and we want
to use this new helper function also in fsck on other platforms.
A big thank you goes to Ed Thomson and an unnamed Microsoft engineer for
the detailed analysis performed to come up with the corresponding fixes
for libgit2.
This commit adds a function to detect whether a given file name can refer
to the Git directory by mistake.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>