We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.
* jk/decimal-width-for-uintmax:
decimal_width: avoid integer overflow
Simplify the ref transaction API around how "the ref should be
pointing at this object" is specified.
* mh/refs-have-new:
refs.h: remove duplication in function docstrings
update_ref(): improve documentation
ref_transaction_verify(): new function to check a reference's value
ref_transaction_delete(): check that old_sha1 is not null_sha1
ref_transaction_create(): check that new_sha1 is valid
commit: avoid race when creating orphan commits
commit: add tests of commit races
ref_transaction_delete(): remove "have_old" parameter
ref_transaction_update(): remove "have_old" parameter
struct ref_update: move "have_old" into "flags"
refs.c: change some "flags" to "unsigned int"
refs: remove the gap in the REF_* constant values
refs: move REF_DELETING to refs.c
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.
* jk/prune-mtime:
sha1_file: fix iterating loose alternate objects
for_each_loose_file_in_objdir: take an optional strbuf path
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.
* jk/decimal-width-for-uintmax:
decimal_width: avoid integer overflow
Change the following functions' "flags" arguments from "int" to
"unsigned int":
* ref_transaction_update()
* ref_transaction_create()
* ref_transaction_delete()
* update_ref()
* delete_ref()
* lock_ref_sha1_basic()
Also change the "flags" member in "struct ref_update" to unsigned.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.
We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The decimal_width function originally appeared in blame.c as
"lineno_width", and was designed for calculating the
print-width of small-ish integer values (line numbers in
text files). In ec7ff5b, it was made into a reusable
function, and in dc801e7, we started using it to align
diffstats.
Binary files in a diffstat show byte counts rather than line
numbers, meaning they can be quite large (e.g., consider
adding or removing a 2GB file). decimal_width is not up to
the challenge for two reasons:
1. It takes the value as an "int", whereas large files may
easily surpass this. The value may be truncated, in
which case we will produce an incorrect value.
2. It counts "up" by repeatedly multiplying another
integer by 10 until it surpasses the value. This can
cause an infinite loop when the value is close to the
largest representable integer.
For example, consider using a 32-bit signed integer,
and a value of 2,140,000,000 (just shy of 2^31-1).
We will count up and eventually see that 1,000,000,000
is smaller than our value. The next step would be to
multiply by 10 and see that 10,000,000,000 is too
large, ending the loop. But we can't represent that
value, and we have signed overflow.
This is technically undefined behavior, but a common
behavior is to lose the high bits, in which case our
iterator will certainly be less than the number. So
we'll keep multiplying, overflow again, and so on.
This patch changes the argument to a uintmax_t (the same
type we use to store the diffstat information for binary
filese), and counts "down" by repeatedly dividing our value
by 10.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Long overdue departure from the assumption that S_IFMT is shared by
everybody made in 2005.
* dm/compat-s-ifmt-for-zos:
compat: convert modes to use portable file type values
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on NTFS nor FAT32.
In practice this probably doesn't matter, though, as
the restricted names are rather obscure and almost
certainly would never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.
On NTFS (and FAT32), there exist so-called "short names" for
backwards-compatibility: 8.3 compliant names that refer to the same files
as their long names. As ".git" is not an 8.3 compliant name, a short name
is generated automatically, typically "git~1".
Depending on the Windows version, any combination of trailing spaces and
periods are ignored, too, so that both "git~1." and ".git." still refer
to the Git directory. The reason is that 8.3 stores file names shorter
than 8 characters with trailing spaces. So literally, it does not matter
for the short name whether it is padded with spaces or whether it is
shorter than 8 characters, it is considered to be the exact same.
The period is the separator between file name and file extension, and
again, an empty extension consists just of spaces in 8.3 format. So
technically, we would need only take care of the equivalent of this
regex:
(\.git {0,4}|git~1 {0,3})\. {0,3}
However, there are indications that at least some Windows versions might
be more lenient and accept arbitrary combinations of trailing spaces and
periods and strip them out. So we're playing it real safe here. Besides,
there can be little doubt about the intention behind using file names
matching even the more lenient pattern specified above, therefore we
should be fine with disallowing such patterns.
Extra care is taken to catch names such as '.\\.git\\booh' because the
backslash is marked as a directory separator only on Windows, and we want
to use this new helper function also in fsck on other platforms.
A big thank you goes to Ed Thomson and an unnamed Microsoft engineer for
the detailed analysis performed to come up with the corresponding fixes
for libgit2.
This commit adds a function to detect whether a given file name can refer
to the Git directory by mistake.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.
We make this check optional for two reasons:
1. It restricts the set of allowable filenames, which is
unnecessary for people who are not on HFS+. In practice
this probably doesn't matter, though, as the restricted
names are rather obscure and almost certainly would
never come up in practice.
2. It has a minor performance penalty for every path we
insert into the index.
This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds simple wrapper functions around calls to stat(), fstat(),
and lstat() that translate the operating system's native file type
bits to those used by most operating systems. It also rewrites the
S_IF* macros to the common values, so all file type processing is
performed using the translated modes. This makes projects portable
across operating systems that use different file type definitions.
Only the file type bits may be affected by these compatibility
functions; the file permission bits are assumed to be 07777 and are
passed through unchanged.
Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In linked checkouts, borrowed parts like config is taken from
$GIT_COMMON_DIR. $GIT_DIR/config is never used. Report them as
garbage.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The repo setup procedure is updated to detect $GIT_DIR/commondir and
set $GIT_COMMON_DIR properly.
The core.worktree is ignored when $GIT_COMMON_DIR is set. This is
because the config file is shared in multi-checkout setup, but
checkout directories _are_ different. Making core.worktree effective
in all checkouts mean it's back to a single checkout.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This variable is intended to support multiple working directories
attached to a repository. Such a repository may have a main working
directory, created by either "git init" or "git clone" and one or more
linked working directories. These working directories and the main
repository share the same repository directory.
In linked working directories, $GIT_COMMON_DIR must be defined to point
to the real repository directory and $GIT_DIR points to an unused
subdirectory inside $GIT_COMMON_DIR. File locations inside the
repository are reorganized from the linked worktree view point:
- worktree-specific such as HEAD, logs/HEAD, index, other top-level
refs and unrecognized files are from $GIT_DIR.
- the rest like objects, refs, info, hooks, packed-refs, shallow...
are from $GIT_COMMON_DIR (except info/sparse-checkout, but that's
a separate patch)
Scripts are supposed to retrieve paths in $GIT_DIR with "git rev-parse
--git-path", which will take care of "$GIT_DIR vs $GIT_COMMON_DIR"
business.
The redirection is done by git_path(), git_pathdup() and
strbuf_git_path(). The selected list of paths goes to $GIT_COMMON_DIR,
not the other way around in case a developer adds a new
worktree-specific file and it's accidentally promoted to be shared
across repositories (this includes unknown files added by third party
commands)
The list of known files that belong to $GIT_DIR are:
ADD_EDIT.patch BISECT_ANCESTORS_OK BISECT_EXPECTED_REV BISECT_LOG
BISECT_NAMES CHERRY_PICK_HEAD COMMIT_MSG FETCH_HEAD HEAD MERGE_HEAD
MERGE_MODE MERGE_RR NOTES_EDITMSG NOTES_MERGE_WORKTREE ORIG_HEAD
REVERT_HEAD SQUASH_MSG TAG_EDITMSG fast_import_crash_* logs/HEAD
next-index-* rebase-apply rebase-merge rsync-refs-* sequencer/*
shallow_*
Path mapping is NOT done for git_path_submodule(). Multi-checkouts are
not supported as submodules.
Helped-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We allow the user to relocate certain paths out of $GIT_DIR via
environment variables, e.g. GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE and
GIT_GRAFT_FILE. Callers are not supposed to use git_path() or
git_pathdup() to get those paths. Instead they must use
get_object_directory(), get_index_file() and get_graft_file()
respectively. This is inconvenient and could be missed in review (for
example, there's git_path("objects/info/alternates") somewhere in
sha1_file.c).
This patch makes git_path() and git_pathdup() understand those
environment variables. So if you set GIT_OBJECT_DIRECTORY to /foo/bar,
git_path("objects/abc") should return /foo/bar/abc. The same is done
for the two remaining env variables.
"git rev-parse --git-path" is the wrapper for script use.
This patch kinda reverts a0279e1 (setup_git_env: use git_pathdup
instead of xmalloc + sprintf - 2014-06-19) because using git_pathdup
here would result in infinite recursion:
setup_git_env() -> git_pathdup("objects") -> .. -> adjust_git_path()
-> get_object_directory() -> oops, git_object_directory is NOT set
yet -> setup_git_env()
I wanted to make git_pathdup_literal() that skips adjust_git_path().
But that won't work because later on when $GIT_COMMON_DIR is
introduced, git_pathdup_literal("objects") needs adjust_git_path() to
replace $GIT_DIR with $GIT_COMMON_DIR.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the previous patch, git_snpath() is modified to allocate a new
strbuf buffer because vsnpath() needs that. But that makes it
awkward because git_snpath() receives a pre-allocated buffer from
outside and has to copy data back. Rename it to strbuf_git_path()
and make it receive strbuf directly.
Using git_path() in update_refs_for_switch() which used to call
git_snpath() is safe because that function and all of its callers do
not keep any pointer to the round-robin buffer pool allocated by
get_pathname().
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.
After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.
Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.
This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.
* jk/prune-mtime: (28 commits)
drop add_object_array_with_mode
revision: remove definition of unused 'add_object' function
pack-objects: double-check options before discarding objects
repack: pack objects mentioned by the index
pack-objects: use argv_array
reachable: use revision machinery's --indexed-objects code
rev-list: add --indexed-objects option
rev-list: document --reflog option
t5516: test pushing a tag of an otherwise unreferenced blob
traverse_commit_list: support pending blobs/trees with paths
make add_object_array_with_context interface more sane
write_sha1_file: freshen existing objects
pack-objects: match prune logic for discarding objects
pack-objects: refactor unpack-unreachable expiration check
prune: keep objects reachable from recent objects
sha1_file: add for_each iterators for loose and packed objects
count-objects: use for_each_loose_file_in_objdir
count-objects: do not use xsize_t when counting object size
prune-packed: use for_each_loose_file_in_objdir
reachable: mark index blobs as SEEN
...
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.
Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).
We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently do not handle badly named refs well:
$ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
$ git branch
fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
$ git branch -D master.....@\*@\\.
error: branch 'master.....@*@\.' not found.
Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.
So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them. Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.
Details:
In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").
In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.
Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.
Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially. For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.
In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":
$ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
$ git branch -d nonsense
error: branch 'nonsense' not found.
Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:
$ git update-ref -d refs/heads/nonsense
error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links
Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.
Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading). Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.
While at it, swap two of the arguments in the function to put output
arguments at the end. As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.
Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lockfile API and its users have been cleaned up.
* mh/lockfile: (38 commits)
lockfile.h: extract new header file for the functions in lockfile.c
hold_locked_index(): move from lockfile.c to read-cache.c
hold_lock_file_for_append(): restore errno before returning
get_locked_file_path(): new function
lockfile.c: rename static functions
lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
commit_lock_file_to(): refactor a helper out of commit_lock_file()
trim_last_path_component(): replace last_path_elm()
resolve_symlink(): take a strbuf parameter
resolve_symlink(): use a strbuf for internal scratch space
lockfile: change lock_file::filename into a strbuf
commit_lock_file(): use a strbuf to manage temporary space
try_merge_strategy(): use a statically-allocated lock_file object
try_merge_strategy(): remove redundant lock_file allocation
struct lock_file: declare some fields volatile
lockfile: avoid transitory invalid states
git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
dump_marks(): remove a redundant call to rollback_lock_file()
api-lockfile: document edge cases
commit_lock_file(): rollback lock file on failure to rename
...
When running a required clean filter, we do not have to mmap the
original before feeding the filter. Instead, stream the file
contents directly to the filter and process its output.
* sp/stream-clean-filter:
sha1_file: don't convert off_t to size_t too early to avoid potential die()
convert: stream from fd to required clean filter to reduce used address space
copy_fd(): do not close the input file descriptor
mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
config.c: add git_env_ulong() to parse environment variable
convert: drop arguments other than 'path' from would_convert_to_git()
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a function to return the path of the file that is locked by a
lock_file object. This reduces the knowledge that callers have to have
about the lock_file layout.
Suggested-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it harder to misread the name as LOCK_NODE_REF.
Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function remove_lock_file_on_signal() is used as a signal handler.
It is not realistic to make the signal handler conform strictly to the
C standard, which is very restrictive about what a signal handler is
allowed to do. But let's increase the likelihood that it will work:
The lock_file_list global variable and several fields from struct
lock_file are used by the signal handler. Declare those values
"volatile" to (1) force the main process to write the values to RAM
promptly, and (2) prevent updates to these fields from being reordered
in a way that leaves an opportunity for a jump to the signal handler
while the object is in an inconsistent state.
We don't mark the filename field volatile because that would prevent
the use of strcpy(), and it is anyway unlikely that a compiler
re-orders a strcpy() call across other expressions. So in practice it
should be possible to get away without "volatile" in the "filename"
case.
Suggested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state. And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.
This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename. It
is wrong to assume that this string can be updated atomically; for
example, even
strcpy(lk->filename, value)
is unsafe. But the old code was even more reckless; for example,
strcpy(lk->filename, path);
if (!(flags & LOCK_NODEREF))
resolve_symlink(lk->filename, max_path_len);
strcat(lk->filename, ".lock");
During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile. If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!
We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.
So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose. Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few places that use these values, so define constants for
them.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is used for other things besides the index, so rename it
accordingly.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
Optimize the code path to write out the packed-refs file, which
especially matters in a repository with a large number of refs.
* jk/write-packed-refs-via-stdio:
refs: write packed_refs file using stdio
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
Code clean-up.
* jk/commit-author-parsing:
determine_author_info(): copy getenv output
determine_author_info(): reuse parsing functions
date: use strbufs in date-formatting functions
record_author_date(): use find_commit_header()
record_author_date(): fix memory leak on malformed commit
commit: provide a function to find a header in a buffer
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable. A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.
* bb/date-iso-strict:
pretty: provide a strict ISO 8601 date format
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use the new caching config-set API in git_config() calls.
* ta/config-set-1:
add tests for `git_config_get_string_const()`
add a test for semantic errors in config files
rewrite git_config() to use the config-set API
config: add `git_die_config()` to the config-set API
change `git_config()` return value to void
add line number and file name info to `config_set`
config.c: fix accuracy of line number in errors
config.c: mark error and warnings strings for translation
We write each line of a new packed-refs file individually
using a write() syscall (and sometimes 2, if the ref is
peeled). Since each line is only about 50-100 bytes long,
this creates a lot of system call overhead.
We can instead open a stdio handle around our descriptor and
use fprintf to write to it. The extra buffering is not a
problem for us, because nobody will read our new packed-refs
file until we call commit_lock_file (by which point we have
flushed everything).
On a pathological repository with 8.5 million refs, this
dropped the time to run `git pack-refs` from 20s to 6s.
Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add in-core caching layer to let us avoid reading the same
configuration files number of times.
* ta/config-set:
test-config: add tests for the config_set API
add `config_set` API for caching config-like files
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any. This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.
* mm/config-edit-global:
commit: advertise config --global --edit on guessed identity
home_config_paths(): let the caller ignore xdg path
config --global --edit: create a template file if needed
There are cases where you lock and open to write a file, close it to
show the updated contents to external processes, and then have to
update the file again while still holding the lock, but the lockfile
API lacked support for such an access pattern.
* jc/reopen-lock-file:
lockfile: allow reopening a closed but still locked file
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.
The output from "--date=iso" deviates from ISO 8601 in these ways:
- a space instead of the `T` date/time delimiter
- a space between time and time zone
- no colon between hours and minutes of the time zone
Add a strict ISO 8601 date format for displaying committer and
author dates. Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.
See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.
Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new function parses an integeral value that fits in unsigned
long in human readable form, i.e. possibly with unit suffix, e.g.
10k = 10240, etc., from an environment variable. Parsing of
GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).
Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:
1. In test-date.c, where we don't care about efficiency.
2. In determine_author_info, which is not performance
critical (and where the use of a strbuf will help later
refactoring).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.
Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.
It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,
if (!git_config_get_value(key, &value)){
if (!strcmp(value, "foo"))
git_config_die(key, "value: `%s` is illegal", value);
else
/* do work */
}
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.
Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.
Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.
Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Store file name and line number for each key-value pair in the cache
during parsing of the configuration files.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.
Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.
Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.
Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.
While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user has no ~/.gitconfig file, git config --global --edit used
to launch an editor on an nonexistant file name.
Instead, create a file with a default content before launching the
editor. The template contains only commented-out entries, to save a few
keystrokes for the user. If the values are guessed properly, the user
will only have to uncomment the entries.
Advanced users teaching newbies can create a minimalistic configuration
faster for newbies. Beginners reading a tutorial advising to run "git
config --global --edit" as a first step will be slightly more guided for
their first contact with Git.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
* kb/perf-trace:
api-trace.txt: add trace API documentation
progress: simplify performance measurement by using getnanotime()
wt-status: simplify performance measurement by using getnanotime()
git: add performance tracing for git's main() function to debug scripts
trace: add trace_performance facility to debug performance issues
trace: add high resolution timer function to debug performance issues
trace: add 'file:line' to all trace output
trace: move code around, in preparation to file:line output
trace: add current timestamp to all trace output
trace: disable additional trace output for unit tests
trace: add infrastructure to augment trace output with additional info
sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
trace: improve trace performance
trace: remove redundant printf format attribute
trace: consistently name the format parameter
trace: move trace declarations from cache.h to new trace.h
Early part of the "ref transaction" topic.
* rs/ref-transaction-0:
refs.c: change ref_transaction_update() to do error checking and return status
refs.c: remove the onerr argument to ref_transaction_commit
update-ref: use err argument to get error from ref_transaction_commit
refs.c: make update_ref_write update a strbuf on failure
refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
refs.c: log_ref_write should try to return meaningful errno
refs.c: make resolve_ref_unsafe set errno to something meaningful on error
refs.c: commit_packed_refs to return a meaningful errno on failure
refs.c: make remove_empty_directories always set errno to something sane
refs.c: verify_lock should set errno to something meaningful
refs.c: make sure log_ref_setup returns a meaningful errno
refs.c: add an err argument to repack_without_refs
lockfile.c: make lock_file return a meaningful errno on failurei
lockfile.c: add a new public function unable_to_lock_message
refs.c: add a strbuf argument to ref_transaction_commit for error logging
refs.c: allow passing NULL to ref_transaction_free
refs.c: constify the sha arguments for ref_transaction_create|delete|update
refs.c: ref_transaction_commit should not free the transaction
refs.c: remove ref_transaction_rollback
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
In some code paths (e.g. giving "add -i" to prepare the contents to
be committed interactively inside "commit -p") where a caller takes
a lock, writes the new content, give chance for others to use it
while still holding the lock, and then releases the lock when all is
done. As an extension, allow the caller to re-update an already
closed file while still holding the lock (i.e. not yet committed) by
re-opening the file, to be followed by updating the contents and
then by the usual close_lock_file() or commit_lock_file().
This is necessary if we want to add code to rebuild the cache-tree
and write the resulting index out after "add -i" returns the control
to "commit -p", for example.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Making errno when returning from resolve_ref_unsafe() meaningful,
which should fix
* a bug in lock_ref_sha1_basic, where it assumes EISDIR
means it failed due to a directory being in the way
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Introducing a new unable_to_lock_message helper, which has nicer
semantics than unable_to_lock_error and cleans up lockfile.c a little.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.
While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename cache_def_free to cache_def_clear as it doesn't free the struct
cache_def, but just clears its content.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.
Remove the limitation by using strbuf instead.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running. Detect such a race and avoid overwriting the index.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.
To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically. In addition, the cache_name_compare() function in
read-cache.c is nearly identical. The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.
Unify these three functions by using the implementation from
cache_name_compare(). This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.
Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also include direct dependencies (strbuf.h and git-compat-util.h for
__attribute__) so that trace.h can be used independently of cache.h, e.g.
in test programs.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
[efl: moved MinGW-specific part to compat/]
[jes: fixed compilation on non-Windows]
Eric Sunshine fixed mingw_offset_1st_component() to return
consistently "foo" for UNC "//machine/share/foo", cf
http://groups.google.com/group/msysgit/browse_thread/thread/c0af578549b5dda0
Author: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Cezary Zawadka <czawadka@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/commit-date-approxidate:
commit: accept more date formats for "--date"
commit: print "Date" line when the user has set date
pretty: make show_ident_date public
commit: use split_ident_line to compare author/committer
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.
Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath. While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file. So in that sense that
can be done as a less urgent follow-up for this topic.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel. These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such. And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.
* ks/tree-diff-nway:
mingw: activate alloca
combine-diff: speed it up, by using multiparent diff tree-walker directly
tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Portable alloca for Git
tree-diff: reuse base str(buf) memory on sub-tree recursion
tree-diff: no need to call "full" diff_tree_sha1 from show_path()
tree-diff: rework diff_tree interface to be sha1 based
tree-diff: diff_tree() should now be static
tree-diff: remove special-case diff-emitting code for empty-tree cases
tree-diff: simplify tree_entry_pathcmp
tree-diff: show_path prototype is not needed anymore
tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
tree-diff: move all action-taking code out of compare_tree_entry()
tree-diff: don't assume compare_tree_entry() returns -1,0,1
tree-diff: consolidate code for emitting diffs and recursion in one place
tree-diff: show_tree() is not needed
tree-diff: no need to pass match to skip_uninteresting()
tree-diff: no need to manually verify that there is no mode change for a path
combine-diff: move changed-paths scanning logic into its own function
combine-diff: move show_log_first logic/action out of paths scanning
When core.commentChar is "auto", the comment char starts with '#' as
in default but if it's already in the prepared message, find another
char in a small subset. This should stop surprises because git strips
some lines unexpectedly.
Note that git is not smart enough to recognize '#' as the comment char
in custom templates and convert it if the final comment char is
different. It thinks '#' lines in custom templates as part of the
commit message. So don't use this with custom templates.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e208f9c converted error() into a macro to make its
constant return value more apparent to calling code. Commit
5ded807 prevents us using this macro with clang, since
clang's -Wunused-value is smart enough to realize that the
constant "-1" is useless in some contexts.
However, since the last commit puts the constant behind an
inline function call, this is enough to prevent the
-Wunused-value warning on both modern gcc and clang. So we
can now re-enable the macro when compiling with clang.
Tested with clang 3.3, 3.4, and 3.5.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit e208f9c introduced a macro to turn error() calls
into:
(error(), -1)
to make the constant return value more visible to the
calling code (and thus let the compiler make better
decisions about the code).
This works well for code like:
return error(...);
but the "-1" is superfluous in code that just calls error()
without caring about the return value. In older versions of
gcc, that was fine, but gcc 4.9 complains with -Wunused-value.
We can work around this by encapsulating the constant return
value in a static inline function, as gcc specifically
avoids complaining about unused function returns unless the
function has been specifically marked with the
warn_unused_result attribute.
We also use the same trick for config_error_nonbool and
opterror, which learned the same error technique in a469a10.
Reported-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We use this function internally to format "Date" lines in
commit logs, but other parts of the code will want it, too.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.
In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.
1. process A calls git-rebase (or does anything that uses the index)
2. process A applies 1st commit
3. process B calls git-status (or does anything that updates the index)
4. process B reads index
5. process A applies 2nd commit
6. process B takes the lock, then overwrites process A's changes.
7. process A applies 3rd commit
As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.
Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES))
to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag, whether entry from that
parent equals to entry from minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
navy.git linux.git v3.10..v3.11
before 0.611s 1.889s
after 0.619s 1.907s
slowdown 1.3% 0.9%
This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Codepaths that parse timestamps in commit objects have been
tightened.
* jk/commit-dates-parsing-fix:
show_ident_date: fix tz range check
log: do not segfault on gmtime errors
log: handle integer overflow in timestamps
date: check date overflow against time_t
fsck: report integer overflow in author timestamps
t4212: test bogus timestamps with git-log
"merge-recursive" was broken in 1.7.7 era and stopped working in an
empty (temporary) working tree, when there are renames involved.
This has been corrected.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
* mh/replace-refs-variable-rename:
Document some functions defined in object.c
Add docstrings for lookup_replace_object() and do_lookup_replace_object()
rename read_replace_refs to check_replace_refs
* mh/object-code-cleanup:
sha1_file.c: document a bunch of functions defined in the file
sha1_file_name(): declare to return a const string
find_pack_entry(): document last_found_pack
replace_object: use struct members instead of an array
Tighten codepaths that parse timestamps in commit objects.
* jk/commit-dates-parsing-fix:
show_ident_date: fix tz range check
log: do not segfault on gmtime errors
log: handle integer overflow in timestamps
date: check date overflow against time_t
fsck: report integer overflow in author timestamps
t4212: test bogus timestamps with git-log
"git config" learned to read from the standard input when "-" is
given as the value to its "--file" parameter (attempting an
operation to update the configuration in the standard input of
course is rejected).
* ks/config-file-stdin:
config: teach "git config --file -" to read from the standard input
config: change git_config_with_options() interface
builtin/config.c: rename check_blob_write() -> check_write()
config: disallow relative include paths from blobs
Allow running "gc --auto" in the background.
* nd/daemonize-gc:
gc: config option for running --auto in background
daemon: move daemonize() to libgit.a
--sort=version:refname (or --sort=v:refname for short) sorts tags as
if they are versions. --sort=-refname reverses the order (with or
without ":version").
versioncmp() is copied from string/strverscmp.c in glibc commit
ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding
style. The implementation is under LGPL-2.1 and according to [1] I can
relicense it to GPLv2.
[1] http://www.gnu.org/licenses/gpl-faq.html#AllCompatibility
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.
* nd/submodule-pathspec-ending-with-slash:
clean: use cache_name_is_other()
clean: replace match_pathspec() with dir_path_match()
pathspec: pass directory indicator to match_pathspec_item()
match_pathspec: match pathspec "foo/" against directory "foo"
dir.c: prepare match_pathspec_item for taking more flags
pathspec: rename match_pathspec_depth() to match_pathspec()
pathspec: convert some match_pathspec_depth() to dir_path_match()
pathspec: convert some match_pathspec_depth() to ce_path_match()
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
db5360f3f4 (name-hash: refactor polymorphic index_name_exists();
2013-09-17) split index_name_exists() into index_file_exists() and
index_dir_exists() but retained index_name_exists() as a thin wrapper
to avoid disturbing possible in-flight topics. Since this change
landed in 'master' some time ago and there are no in-flight topics
referencing index_name_exists(), retire it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the make_cache_entry boolean 'refresh' argument to a more
general 'refresh_options' argument. Pass the value through to the
underlying refresh_cache_ent call. Add option CE_MATCH_REFRESH to
enable stat refresh. Update call sites to use the new signature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move lstat ENOENT handling from refresh_index to refresh_cache_ent and
activate it with a new CE_MATCH_IGNORE_MISSING option. This will allow
other call paths into refresh_cache_ent to use the feature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions. We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.) Change callers
accordingly, deleting a few superfluous temporary variables along the
way.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The semantics of this flag was changed in commit
e1111cef23 inline lookup_replace_object() calls
but wasn't renamed at the time to minimize code churn. Rename it now,
and add a comment explaining its use.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch extends git config --file interface to allow read config from
stdin.
Editing stdin or setting value in stdin is an error.
Include by absolute path is allowed in stdin config, but not by relative
path.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to have more options for config source.
Let's alter git_config_with_options() interface to accept struct with
all source options.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
We used to use two separate rules for the normal ref resolution
dwimming and dwimming done to decide which remote ref to grab. The
third parameter to refname_match() selected which rules to use.
When these two rules were harmonized in
2011-11-04 dd621df9cd refs DWIMmery: use the same rule for both "git fetch" and others
, ref_fetch_rules was #defined to avoid potential breakages for
in-flight topics.
It is now safe to remove the backwards-compatibility code, so remove
refname_match()'s third parameter, make ref_rev_parse_rules private to
refs.c, and remove ref_fetch_rules entirely.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.
* jk/oi-delta-base:
cat-file: provide %(deltabase) batch format
sha1_object_info_extended: provide delta base sha1s
Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED. This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared. This probably indicates that another process deleted the
directory while we were working. If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.
Why doesn't safe_create_leading_directories() do the retrying
internally? Because an empty directory isn't really ever safe until
it holds a file. So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum. Add
a docstring.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.
However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.
For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.
We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().
The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This may be needed when a hook is run after a new shallow pack is
received, but .git/shallow is not settled yet. A temporary shallow
file to plug all loose ends should be used instead. GIT_SHALLOW_FILE
is overriden by --shallow-file.
--shallow-file does not work in this case because the hook may spawn
many git subprocesses and the launch commands do not have
--shallow-file as it's a recent addition.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.
The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.
Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)
So the steps to figure out what ref need what new shallow commits are:
1. Split the sender shallow commit list into "ours" and "theirs" list
by has_sha1_file. Those that exist in current repo in "ours", the
remaining in "theirs".
2. Check the receiver .git/shallow, remove from "ours" the ones that
also exist in .git/shallow.
3. Fetch the new pack. Either install or unpack it.
4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
has_sha1_file. Obviously the new pack does not need them.
5. If the pack is kept, remove from "ours" the ones that do not exist
in the new pack.
6. Walk the new refs to answer the question "what shallow commits,
both ours and theirs, are required in .git/shallow in order to add
this ref?". Shallow commits not associated to any refs are removed
from their respective list.
7. (*) Check reachability (from the current refs) of all remaining
commits in "ours". Those reachable are removed. We do not want to
cut any part of our (reachable) history. We only check up
commits. True reachability test is done by
check_everything_connected() at the end as usual.
8. Combine the final "ours" and "theirs" and add them all to
.git/shallow. Install new refs. The case where some hook rejects
some refs on a push is explained in more detail in the push
patches.
Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:
1) the sender has longer history than the receiver
All shallow commits from the sender will be put into "theirs" list
at step 1 because none of them exists in current repo. In the
common case, "theirs" becomes empty at step 4 and exit early.
2) the sender has shorter history than the receiver
All shallow commits from the sender are likely in "ours" list at
step 1. In the common case, if the new pack is kept, we could empty
"ours" and exit early at step 5.
If the pack is not kept, we hit the expensive step 6 then exit
after "ours" is emptied. There'll be only a handful of objects to
walk in fast-forward case. If it's forced update, we may need to
walk to the bottom.
3) the sender has same .git/shallow as the receiver
This is similar to case 2 except that "ours" should be emptied at
step 2 and exit early.
A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.
A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.
This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.
(*) in previous versions step 7 was put before step 3. I reorder it so
that the common case that keeps the pack does not need to walk
commits at all. In future if we implement faster commit
reachability check (maybe with the help of pack bitmaps or commit
cache), step 7 could become cheap and be moved up before 6 again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.
The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.
Keep the CE_HASHED flag to prevent adding entries twice.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>