* ph/transport-with-gitfile:
Fix is_gitfile() for files too small or larger than PATH_MAX to be a gitfile
Add test showing git-fetch groks gitfiles
Teach transport about the gitfile mechanism
Learn to handle gitfiles in enter_repo
enter_repo: do not modify input
It's possible that while pack-objects is running, a
simultaneously running prune process might delete a pack
that we are interested in. Because we load the pack indices
early on, we know that the pack contains our item, but by
the time we try to open and map it, it is gone.
Since c715f78, we already protect against this in the normal
object access code path, but pack-objects accesses the packs
at a lower level. In the normal access path, we call
find_pack_entry, which will call find_pack_entry_one on each
pack index, which does the actual lookup. If it gets a hit,
we will actually open and verify the validity of the
matching packfile (using c715f78's is_pack_valid). If we
can't open it, we'll issue a warning and pretend that we
didn't find it, causing us to go on to the next pack (or on
to loose objects).
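The lookup-then-validate behaviour described above can be sketched
roughly as follows. This is only an illustration under stated
assumptions: the toy_* names and the struct layout are hypothetical
stand-ins, not the actual find_pack_entry or is_pack_valid code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a packfile; every name here is hypothetical. */
struct toy_pack {
	const char *indexed[4];	/* object names listed in the pack index */
	int on_disk;		/* 0 once a concurrent prune deleted the pack */
	int is_open;		/* nonzero while we hold an open descriptor */
};

/* Index-only lookup: succeeds even if the packfile itself is gone. */
int toy_index_has(const struct toy_pack *p, const char *name)
{
	size_t i;

	for (i = 0; i < 4 && p->indexed[i]; i++)
		if (!strcmp(p->indexed[i], name))
			return 1;
	return 0;
}

/* Validity check: try to open the pack, and remember that we did. */
int toy_pack_valid(struct toy_pack *p)
{
	if (p->is_open)
		return 1;	/* already open; an unlink no longer matters */
	if (!p->on_disk)
		return 0;
	p->is_open = 1;
	return 1;
}

/*
 * Return the first pack that both indexes the object and can be opened,
 * or NULL so that the caller can fall back to loose objects.
 */
struct toy_pack *toy_find_pack(struct toy_pack *packs, size_t nr,
			       const char *name)
{
	size_t i;

	assert(packs && name);
	for (i = 0; i < nr; i++) {
		if (!toy_index_has(&packs[i], name))
			continue;
		if (!toy_pack_valid(&packs[i]))
			continue;	/* pretend we did not find it here */
		return &packs[i];
	}
	return NULL;
}
```

Once a pack has been opened by the validity check, a later lookup
still succeeds in this sketch even if the pack has been unlinked in the
meantime, which mirrors the cached-descriptor behaviour.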
Furthermore, we will cache the descriptor to the opened
packfile. Which means that later, when we actually try to
access the object, we are likely to still have that packfile
opened, and won't care if it has been unlinked from the
filesystem.
Notice the "likely" above. If there is another pack access
in the interim, and we run out of descriptors, we could
close the pack. And then a later attempt to access the
closed pack could fail (we'll try to re-open it, of course,
but it may have been deleted). In practice, this doesn't
happen because we tend to look up items and then access them
immediately.
Pack-objects does not follow this code path. Instead, it
accesses the packs at a much lower level, using
find_pack_entry_one directly. This means we skip the
is_pack_valid check, and may end up with the name of a
packfile, but no open descriptor.
We can add the same is_pack_valid check here. Unfortunately,
the access patterns of pack-objects are not quite as nice
for keeping lookup and object access together. We look up
each object as we find out about it, and then only later, when
writing the packfile, do we necessarily access it. Which
means that the opened packfile may be closed in the interim.
In practice, however, adding this check still has value, for
three reasons.
1. If you have a reasonable number of packs and/or a
reasonable file descriptor limit, you can keep all of
your packs open simultaneously. If this is the case,
then the race is impossible to trigger.
2. Even if you can't keep all packs open at once, you
may end up keeping the deleted one open (i.e., you may
get lucky).
3. The race window is shortened. You may notice early that
the pack is gone, and not try to access it. Triggering
the problem without this check means deleting the pack
any time after we read the list of index files, but
before we access the looked-up objects. Triggering it
with this check means deleting the pack after we do a
lookup (and successfully access the packfile), but
before we access the object. Which is a smaller window.
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/check-ref-format-3: (23 commits)
add_ref(): verify that the refname is formatted correctly
resolve_ref(): expand documentation
resolve_ref(): also treat a too-long SHA1 as invalid
resolve_ref(): emit warnings for improperly-formatted references
resolve_ref(): verify that the input refname has the right format
remote: avoid passing NULL to read_ref()
remote: use xstrdup() instead of strdup()
resolve_ref(): do not follow incorrectly-formatted symbolic refs
resolve_ref(): extract a function get_packed_ref()
resolve_ref(): turn buffer into a proper string as soon as possible
resolve_ref(): only follow a symlink that contains a valid, normalized refname
resolve_ref(): use prefixcmp()
resolve_ref(): explicitly fail if a symlink is not readable
Change check_refname_format() to reject unnormalized refnames
Inline function refname_format_print()
Make collapse_slashes() allocate memory for its result
Do not allow ".lock" at the end of any refname component
Refactor check_refname_format()
Change check_ref_format() to take a flags argument
Change bad_ref_char() to return a boolean value
...
* cb/common-prefix-unification:
rename pathspec_prefix() to common_prefix() and move to dir.[ch]
consolidate pathspec_prefix and common_prefix
remove prefix argument from pathspec_prefix
When core.ignorecase is turned on and there are stale index
entries, "git commit" can sometimes report directories as
untracked, even though they contain tracked files.
You can see an example of this with:
# make a case-insensitive repo
git init repo && cd repo &&
git config core.ignorecase true &&
# with some tracked files in a subdir
mkdir subdir &&
> subdir/one &&
> subdir/two &&
git add . &&
git commit -m base &&
# now make the index entries stale
touch subdir/* &&
# and then ask commit to update those entries and show
# us the status template
git commit -a
which will report "subdir/" as untracked, even though it
clearly contains two tracked files. What is happening in the
commit program is this:
1. We load the index, and for each entry, insert it into the index's
name_hash. In addition, if ignorecase is turned on, we make an
entry in the name_hash for the directory (e.g., "contrib/"), which
uses the following code from 5102c61's hash_index_entry_directories:
hash = hash_name(ce->name, ptr - ce->name);
if (!lookup_hash(hash, &istate->name_hash)) {
pos = insert_hash(hash, &istate->name_hash);
if (pos) {
ce->next = *pos;
*pos = ce;
}
}
Note that we only add the directory entry if there is not already an
entry.
2. We run add_files_to_cache, which gets updated information for each
cache entry. It helpfully inserts this information into the cache,
which calls replace_index_entry. This in turn calls
remove_name_hash() on the old entry, and add_name_hash() on the new
one. But remove_name_hash doesn't actually remove from the hash, it
only marks it as "no longer interesting" (from cache.h):
/*
* We don't actually *remove* it, we can just mark it invalid so that
* we won't find it in lookups.
*
* Not only would we have to search the lists (simple enough), but
* we'd also have to rehash other hash buckets in case this makes the
* hash bucket empty (common). So it's much better to just mark
* it.
*/
static inline void remove_name_hash(struct cache_entry *ce)
{
ce->ce_flags |= CE_UNHASHED;
}
This is OK in the specific-file case, since the entries in the hash
form a linked list, and we can just skip the "not here anymore"
entries during lookup.
But for the directory hash entry, we will _not_ write a new entry,
because there is already one there: the old one that is actually no
longer interesting!
3. While traversing the directories, we end up in the
directory_exists_in_index_icase function to see if a directory is
interesting. This in turn checks index_name_exists, which will
look up the directory in the index's name_hash. We see the old,
deleted record, and assume there is nothing interesting. The
directory gets marked as untracked, even though there are index
entries in it.
The problem is in the code I showed above:
hash = hash_name(ce->name, ptr - ce->name);
if (!lookup_hash(hash, &istate->name_hash)) {
pos = insert_hash(hash, &istate->name_hash);
if (pos) {
ce->next = *pos;
*pos = ce;
}
}
Having a single cache entry that represents the directory is
not enough; that entry may go away if the index is changed.
It may be tempting to say that the problem is in our removal
method; if we removed the entry entirely instead of simply
marking it as "not here anymore", then we would know we need
to insert a new entry. But that only covers this particular
case of remove-replace. In the more general case, consider
something like this:
1. We add "foo/bar" and "foo/baz" to the index. Each gets
its own entry in name_hash, plus we make a "foo/"
entry that points to "foo/bar".
2. We remove the "foo/bar" entry from the index, and from
the name_hash.
3. We ask if "foo/" exists, and see no entry, even though
"foo/baz" exists.
So we need that directory entry to have the list of _all_
cache entries that indicate that the directory is tracked.
So that implies making a linked list as we do for other
entries, like:
hash = hash_name(ce->name, ptr - ce->name);
pos = insert_hash(hash, &istate->name_hash);
if (pos) {
ce->next = *pos;
*pos = ce;
}
But that's not right either. In fact, it shows a second bug
in the current code, which is that the "ce->next" pointer is
supposed to be linking entries for a specific filename
entry, but here we are overwriting it for the directory
entry. So the same cache entry ends up in two linked lists,
but they share the same "next" pointer.
As it turns out, this second bug can't be triggered in the
current code. The "if (pos)" conditional is totally dead
code; pos will only be non-NULL if there was an existing
hash entry, and we already checked that there wasn't one
through our call to lookup_hash.
But fixing the first bug means taking out that call to
lookup_hash, which is going to activate the buggy dead code,
and we'll end up splicing the two linked lists together.
So we need to have a separate next pointer for the list in
the directory bucket, and we need to traverse that list in
index_name_exists when we are looking up a directory.
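A minimal sketch of the idea, assuming a toy entry type: each entry
carries one pointer for its own file chain and a second one for the
directory chain, and a directory counts as tracked while any entry on
its chain is still live. The toy_* names and the dir_next field are
hypothetical and are not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_UNHASHED 1	/* entry was "removed": skip it during lookups */

/* Toy cache entry; field and function names are hypothetical. */
struct toy_entry {
	const char *name;
	unsigned flags;
	struct toy_entry *next;		/* chain for the file's own bucket */
	struct toy_entry *dir_next;	/* separate chain for the directory */
};

/* Push an entry onto a directory bucket without touching ce->next. */
void toy_dir_add(struct toy_entry **bucket, struct toy_entry *ce)
{
	assert(bucket && ce);
	ce->dir_next = *bucket;
	*bucket = ce;
}

/* Mark an entry as gone; it stays linked, much like remove_name_hash(). */
void toy_remove(struct toy_entry *ce)
{
	ce->flags |= TOY_UNHASHED;
}

/* A directory is tracked if any entry on its chain is still live. */
int toy_dir_exists(const struct toy_entry *bucket)
{
	const struct toy_entry *ce;

	for (ce = bucket; ce; ce = ce->dir_next)
		if (!(ce->flags & TOY_UNHASHED))
			return 1;
	return 0;
}
```

With "foo/bar" and "foo/baz" both on the "foo/" chain, removing only
"foo/bar" leaves the directory visible, and the per-file "next"
pointers are never spliced into the directory list.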
This bloats "struct cache_entry" by a few bytes. Which is
annoying, because it's only necessary when core.ignorecase
is enabled. There's not an easy way around it, short of
separating out the "next" pointers from cache_entry entirely
(i.e., having a separate "cache_entry_list" struct that gets
stored in the name_hash). In practice, it probably doesn't
matter; we have thousands of cache entries, compared to the
millions of objects (where adding 4 bytes to the struct
actually does impact performance).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This code calls git_config from a helper function to parse the config entry
it is interested in. Calling git_config in this way may cause a problem if
the helper function can be called after a previous call to git_config by
another function since the second call to git_config may reset some
variable to the value in the config file which was previously overridden.
The above is not a problem in this case since the function passed to
git_config only parses one config entry and the variable it sets is not
assigned outside of the parsing function. But a programmer who desires
all of the standard config options to be parsed may be tempted to modify
git_attr_config() so that it falls back to git_default_config() and then it
_would_ be vulnerable to the above described behavior.
So, move the call to git_config up into the top-level cmd_* function and
move the responsibility for parsing core.attributesfile into the main
config file parser.
Which is only the logical thing to do ;-)
Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Record information about resolve_ref(), hard-won via reverse
engineering, in a comment for future spelunkers.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously, get_sha1_hex() would read one character past the end of a
null-terminated string whose strlen was an even number less than 40.
Although the function correctly returned -1 in these cases, the extra
memory access might have been to uninitialized (or even, conceivably,
unallocated) memory.
Add a check to avoid reading past the end of a string.
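For illustration, a check of this kind might look like the sketch
below. The toy_hexval() and toy_parse_hex40() names are hypothetical
stand-ins and this is not the actual get_sha1_hex() code; the point is
only that the terminator is tested before the second digit of each
pair is read.

```c
#include <assert.h>

/* Return 0..15 for a hex digit, or -1 for anything else (including NUL). */
int toy_hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Parse exactly 40 hex digits into 20 bytes. Returns 0 on success and
 * -1 on malformed or short input.
 */
int toy_parse_hex40(const char *hex, unsigned char *out)
{
	int i;

	assert(hex && out);
	for (i = 0; i < 20; i++) {
		int hi, lo;

		/*
		 * Stop at the terminator before looking at hex[1];
		 * without this check, a short string of even length
		 * makes us read the byte after the NUL.
		 */
		if (!hex[0])
			return -1;
		hi = toy_hexval(hex[0]);
		lo = toy_hexval(hex[1]);
		if (hi < 0 || lo < 0)
			return -1;
		*out++ = (unsigned char)((hi << 4) | lo);
		hex += 2;
	}
	return 0;
}
```

An odd-length short string needs no extra check: its NUL lands in
hex[1], which is still inside the string and is rejected as a
non-digit.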
This problem was discovered by Thomas Rast <trast@student.ethz.ch>
using valgrind.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rr/revert-cherry-pick-continue:
builtin/revert.c: make commit_list_append() static
revert: Propagate errors upwards from do_pick_commit
revert: Introduce --continue to continue the operation
revert: Don't implicitly stomp pending sequencer operation
revert: Remove sequencer state when no commits are pending
reset: Make reset remove the sequencer state
revert: Introduce --reset to remove sequencer state
revert: Make pick_commits functionally act on a commit list
revert: Save command-line options for continuing operation
revert: Save data for continuing after conflict resolution
revert: Don't create invalid replay_opts in parse_args
revert: Separate cmdline parsing from functional code
revert: Introduce struct to keep command-line options
revert: Eliminate global "commit" variable
revert: Rename no_replay to record_origin
revert: Don't check lone argument in get_encoding
revert: Simplify and inline add_message_to_msg
config: Introduce functions to write non-standard file
advice: Introduce error_resolve_conflict
enter_repo(..., 0) currently modifies the input to strip away
trailing slashes. This means that we sometimes need to copy the
input to keep the original.
Change it to unconditionally copy it into the used_path buffer so
we can safely use the input without having to copy it. Also store
a working copy in validated_path up-front before we start
resolving anything.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Phil Hord <hordp@cisco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also make common_prefix_len() static as this refactoring makes dir.c
itself the only caller of this helper function.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Passing a prefix to a function that is supposed to find the prefix is
strange. And it's really only used if the pathspec is NULL. Make the
callers handle this case instead.
As we are always returning a fresh copy of a string (or NULL), change the
type of the returned value to non-const "char *".
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function was not gentle at all to the callers and died without giving
them a chance to deal with possible errors. Rename it to read_gitfile(),
and update all the callers.
As no existing caller needs a true "gently" variant, we do not bother
adding one at this point.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Check if <path> is a valid git-dir or a valid git-file that points
to a valid git-dir.
We want tests to be independent from the fact that a git-dir may
be a git-file. Thus we changed tests to use this feature.
Signed-off-by: Fredrik Gustafsson <iveqy@iveqy.com>
Mentored-by: Jens Lehmann <Jens.Lehmann@web.de>
Mentored-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The following sequence of commands reveals an issue with error
reporting of relative paths:
$ mkdir sub
$ cd sub
$ git ls-files --error-unmatch ../bbbbb
error: pathspec 'b' did not match any file(s) known to git.
$ git commit --error-unmatch ../bbbbb
error: pathspec 'b' did not match any file(s) known to git.
This bug is visible only if the normalized path (i.e., the relative
path from the repository root) is longer than the prefix.
Otherwise, the code skips over the normalized path and reads from
an unused memory location which still contains a leftover of the
original command line argument.
So instead, use the existing facilities to deal with relative paths
correctly.
Also fix inconsistency between "checkout" and "commit", e.g.
$ cd Documentation
$ git checkout nosuch.txt
error: pathspec 'Documentation/nosuch.txt' did not match...
$ git commit nosuch.txt
error: pathspec 'nosuch.txt' did not match...
by propagating the prefix down the codepath that reports the error.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce two new functions corresponding to "git_config_set" and
"git_config_set_multivar" to write a non-standard configuration file.
Expose these new functions in cache.h for other git programs to use.
Helped-by: Jeff King <peff@peff.net>
Helped-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to do partial commits, git-commit overlays a tree on the
cache and checks pathspecs against the result. Currently, the
overlaying is done using "prefix" which prevents relative pathspecs
with ".." and absolute pathspecs from matching when they refer to
files not under "prefix" and absent from the index, but still in
the tree (i.e. files staged for removal).
The point of providing a prefix at all is performance optimization.
If we say there is no common prefix for the files of interest, then
we have to read the entire tree into the index.
But even if we cannot use the working directory as a prefix, we can
still figure out if there is a common prefix for all given paths,
and use that instead. The pathspec_prefix() routine from ls-files.c
does exactly that.
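As a rough sketch of what "a common prefix for all given paths" means
here, the helper below returns the length of the longest leading
directory shared by every path. It is a simplified, hypothetical
stand-in (toy_common_dir_len is not a git function) and assumes the
paths are already normalized, with no ".." components.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of the longest leading directory shared by all paths, counted
 * up to and including its final '/'. Returns 0 when there is none.
 * "paths" is a NULL-terminated array.
 */
size_t toy_common_dir_len(const char **paths)
{
	size_t len = 0, dir_len = 0;
	const char **p;

	assert(paths);
	if (!paths[0])
		return 0;
	for (;;) {
		char c = paths[0][len];

		if (!c)
			return dir_len;
		/* Every other path must agree on this byte. */
		for (p = paths + 1; *p; p++)
			if ((*p)[len] != c)
				return dir_len;
		len++;
		/* Only a complete directory component counts. */
		if (c == '/')
			dir_len = len;
	}
}
```

For "sub/dir/a" and "sub/dir/b" this yields 8 (the length of
"sub/dir/"), while "subdir/a" and "sub/a" share no directory and
yield 0, so the whole tree would have to be read.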
Any use of global variables is removed from pathspec_prefix() so
that it can be called from commit.c.
Reported-by: Reuben Thomas <rrt@sc3d.org>
Analyzed-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/index-pack:
verify-pack: use index-pack --verify
index-pack: show histogram when emulating "verify-pack -v"
index-pack: start learning to emulate "verify-pack -v"
index-pack: a miniscule refactor
index-pack --verify: read anomalous offsets from v2 idx file
write_idx_file: need_large_offset() helper function
index-pack: --verify
write_idx_file: introduce a struct to hold idx customization options
index-pack: group the delta-base array entries also by type
Conflicts:
builtin/verify-pack.c
cache.h
sha1_file.c
* jk/clone-cmdline-config:
clone: accept config options on the command line
config: make git_config_parse_parameter a public function
remote: use new OPT_STRING_LIST
parse-options: add OPT_STRING_LIST helper
* jc/zlib-wrap:
zlib: allow feeding more than 4GB in one go
zlib: zlib can only process 4GB at a time
zlib: wrap deflateBound() too
zlib: wrap deflate side of the API
zlib: wrap inflateInit2 used to accept only for gzip format
zlib: wrap remaining calls to direct inflate/inflateEnd
zlib wrapper: refactor error message formatter
Conflicts:
sha1_file.c
In a workload other than "git log" (without pathspec or any option that
causes us to inspect trees and blobs), the recency pack order is said to
cause the access to jump around quite a bit. Add a hook to allow us to
observe how bad it is.
"git config core.logpackaccess /var/tmp/pal.txt" will give you the log
in the specified file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for dividing the refs of a single repository into multiple
namespaces, each of which can have its own branches, tags, and HEAD.
Git can expose each namespace as an independent repository to pull from
and push to, while sharing the object store, and exposing all the refs
to operations such as git-gc.
Storing multiple repositories as namespaces of a single repository
avoids storing duplicate copies of the same objects, such as when
storing multiple branches of the same source. The alternates mechanism
provides similar support for avoiding duplicates, but alternates do not
prevent duplication between new objects added to the repositories
without ongoing maintenance, while namespaces do.
To specify a namespace, set the GIT_NAMESPACE environment variable to
the namespace. For each ref namespace, git stores the corresponding
refs in a directory under refs/namespaces/. For example,
GIT_NAMESPACE=foo will store refs under refs/namespaces/foo/. You can
also specify namespaces via the --namespace option to git.
Note that namespaces which include a / will expand to a hierarchy of
namespaces; for example, GIT_NAMESPACE=foo/bar will store refs under
refs/namespaces/foo/refs/namespaces/bar/. This makes paths in
GIT_NAMESPACE behave hierarchically, so that cloning with
GIT_NAMESPACE=foo/bar produces the same result as cloning with
GIT_NAMESPACE=foo and cloning from that repo with GIT_NAMESPACE=bar. It
also avoids ambiguity with strange namespace paths such as
foo/refs/heads/, which could otherwise generate directory/file conflicts
within the refs directory.
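The expansion rule can be sketched as below. toy_expand_namespace() is
a hypothetical helper, not the function added by this patch, and
skipping empty components (from leading, trailing, or doubled slashes)
is an assumption made for the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a namespace such as "foo/bar" into
 * "refs/namespaces/foo/refs/namespaces/bar/". Returns 0 on success and
 * -1 if the result does not fit in buf.
 */
int toy_expand_namespace(const char *ns, char *buf, size_t size)
{
	size_t used = 0;

	assert(ns && buf && size > 0);
	buf[0] = '\0';
	while (*ns) {
		size_t len = strcspn(ns, "/");	/* length of this component */

		if (len) {
			int n = snprintf(buf + used, size - used,
					 "refs/namespaces/%.*s/",
					 (int)len, ns);
			if (n < 0 || (size_t)n >= size - used)
				return -1;
			used += (size_t)n;
		}
		ns += len;
		if (*ns == '/')
			ns++;
	}
	return 0;
}
```

Because each component gets its own "refs/namespaces/" level,
expanding "foo/bar" gives the same path as expanding "foo" and then
"bar" inside it.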
Add the infrastructure for ref namespaces: handle the GIT_NAMESPACE
environment variable and --namespace option, and support iterating over
refs in a namespace.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/streaming-filter:
t0021: test application of both crlf and ident
t0021-conversion.sh: fix NoTerminatingSymbolAtEOF test
streaming: filter cascading
streaming filter: ident filter
Add LF-to-CRLF streaming conversion
stream filter: add "no more input" to the filters
Add streaming filter API
convert.h: move declarations for conversion from cache.h
* jc/streaming:
sha1_file: use the correct type (ssize_t, not size_t) for read-style function
streaming: read loose objects incrementally
sha1_file.c: expose helpers to read loose objects
streaming: read non-delta incrementally from a pack
streaming_write_entry(): support files with holes
convert: CRLF_INPUT is a no-op in the output codepath
streaming_write_entry(): use streaming API in write_entry()
streaming: a new API to read from the object store
write_entry(): separate two helper functions out
unpack_object_header(): make it public
sha1_object_info_extended(): hint about objects in delta-base cache
sha1_object_info_extended(): expose a bit more info
packed_object_info_detail(): do not return a string
* ef/maint-win-verify-path:
verify_dotfile(): do not assume '/' is the path seperator
verify_path(): simplify check at the directory boundary
verify_path: consider dos drive prefix
real_path: do not assume '/' is the path seperator
A Windows path starting with a backslash is absolute
We use this internally to parse "git -c core.foo=bar", but
the general format of "key=value" is useful for other
places.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of objects we read from the repository and data we try to put
into the repository are represented in "unsigned long", so that on larger
architectures we can handle objects that weigh more than 4GB.
But the interface defined in zlib.h to communicate with inflate/deflate
limits avail_in (how many bytes of input are we calling zlib with) and
avail_out (how many bytes of output from zlib are we ready to accept)
fields effectively to 4GB by defining their type to be uInt.
In many places in our code, we allocate a large buffer (e.g. mmap'ing a
large loose object file) and tell zlib its size by assigning the size to
avail_in field of the stream, but that will truncate the high octets of
the real size. The worst part of this story is that we often pass around
z_stream (the state object used by zlib) to keep track of the number of
used bytes in input/output buffer by inspecting these two fields, which
practically limits our callchain to the same 4GB limit.
Wrap z_stream in another structure git_zstream that can express avail_in
and avail_out in unsigned long. For now, just die() when the caller gives
a size that cannot be given to a single zlib call. In later patches in the
series, we would make git_inflate() and git_deflate() internally loop to
give callers an illusion that our "improved" version of zlib interface can
operate on a buffer larger than 4GB in one go.
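The looping idea mentioned for the later patches might be sketched
like this. Nothing here calls zlib: the callback type, the toy_* names,
and the max_chunk parameter are all hypothetical, and the consumer
simply stands in for a single call whose length field is an unsigned
int.

```c
#include <assert.h>
#include <limits.h>

/* Largest count that fits in one call taking an unsigned int length. */
unsigned int toy_chunk_cap(unsigned long len)
{
	return len > UINT_MAX ? UINT_MAX : (unsigned int)len;
}

/* A consumer with a narrow length argument; hypothetical callback type. */
typedef void (*toy_consume_fn)(const unsigned char *buf, unsigned int len,
			       void *data);

/* Example consumer: add up the number of bytes it was handed. */
void toy_count_bytes(const unsigned char *buf, unsigned int len, void *data)
{
	(void)buf;
	*(unsigned long *)data += len;
}

/*
 * Feed "len" bytes to "consume" in pieces no larger than "max_chunk",
 * so that the caller can keep its sizes in unsigned long. Returns the
 * number of calls made.
 */
unsigned long toy_feed_all(const unsigned char *buf, unsigned long len,
			   unsigned int max_chunk, toy_consume_fn consume,
			   void *data)
{
	unsigned long calls = 0;

	assert(max_chunk > 0 && consume);
	while (len) {
		unsigned int n = toy_chunk_cap(len);

		if (n > max_chunk)
			n = max_chunk;
		consume(buf, n, data);
		buf += n;
		len -= n;
		calls++;
	}
	return calls;
}
```

The caller sees one operation on the whole buffer; only the loop
knows that the underlying interface takes a narrower length.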
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap deflateInit, deflate, and deflateEnd for everybody, and the sole use
of deflateInit2 in remote-curl.c to tell the library to use gzip header
and trailer in git_deflate_init_gzip().
There is only one caller that cares about the status from deflateEnd().
Introduce git_deflate_end_gently() to let that sole caller retrieve the
status and act on it (i.e. die) for now, but we would probably want to
make inflate_end/deflate_end die when they run out of memory and get
rid of the _gently() kind.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
http-backend.c uses inflateInit2() to tell the library that it wants to
accept only gzip format. Wrap it in a helper function so that readers do
not have to wonder what the magic numbers 15 and 16 are for.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This finally gets rid of the inefficient verify-pack implementation that
walks objects in the packfile in their object name order and replaces it
with a call to index-pack --verify. As a side effect, it also removes
packed_object_info_detail() API which is rather expensive.
As this changes the way errors are reported (verify-pack used to rely on
the usual runtime error detection routine unpack_entry() to diagnose the
CRC errors in an entry in the *.idx file; index-pack --verify checks the
whole *.idx file in one go), update a test that expected the string "CRC"
to appear in the error message.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/maint-config-alias-fix:
handle_options(): do not miscount how many arguments were used
config: always parse GIT_CONFIG_PARAMETERS during git_config
git_config: don't peek at global config_parameters
config: make environment parsing routines static
* jk/maint-config-alias-fix:
handle_options(): do not miscount how many arguments were used
config: always parse GIT_CONFIG_PARAMETERS during git_config
git_config: don't peek at global config_parameters
config: make environment parsing routines static
Conflicts:
config.c
This fixes prefix_path() not recognizing e.g. \foo\bar as an absolute path
on Windows.
Signed-off-by: Theo Niessink <theo@taletn.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before adding the streaming filter API to the conversion layer,
move the existing declarations related to the conversion to its
own header file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/git-connection-deadlock-fix:
test core.gitproxy configuration
send-pack: avoid deadlock on git:// push with failed pack-objects
connect: let callers know if connection is a socket
connect: treat generic proxy processes like ssh processes
Conflicts:
connect.c
* jc/bigfile:
Bigfile: teach "git add" to send a large file straight to a pack
index_fd(): split into two helper functions
index_fd(): turn write_object and format_check arguments into one flag
Nobody outside of git_config_from_parameters should need
to use the GIT_CONFIG_PARAMETERS parsing functions, so let's
make them private.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/magic-pathspec:
setup.c: Fix some "symbol not declared" sparse warnings
t3703: Skip tests using directory name ":" on Windows
revision.c: leave a note for "a lone :" enhancement
t3703, t4208: add test cases for magic pathspec
rev/path disambiguation: further restrict "misspelled index entry" diag
fix overslow :/no-such-string-ever-existed diagnostics
fix overstrict :<path> diagnosis
grep: use get_pathspec() correctly
pathspec: drop "lone : means no pathspec" from get_pathspec()
Revert "magic pathspec: add ":(icase)path" to match case insensitively"
magic pathspec: add ":(icase)path" to match case insensitively
magic pathspec: futureproof shorthand form
magic pathspec: add tentative ":/path/from/top/level" pathspec support
Make map_sha1_file(), parse_sha1_header() and unpack_sha1_header()
available to the streaming read API by exporting them via cache.h header
file.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the output to a path does not have to be converted, we can read from
the object database from the streaming API and write to the file in the
working tree, without having to hold everything in the memory.
The ident, auto- and safe- crlf conversions inherently require you to read
the whole thing before deciding what to do, so while it is technically
possible to support them by using a buffer of an unbounded size or rewinding
and reading the stream twice, it is less practical than the traditional
"read the whole thing in core and convert" approach.
Adding streaming filters for the other conversions on top of this should
be doable by tweaking the can_bypass_conversion() function (it should be
renamed to can_filter_stream() when it happens). Then the streaming API
can be extended to wrap the git_istream streaming_write_entry() opens on
the underlying object in another git_istream that reads from it, filters
what is read, and lets streaming_write_entry() read the filtered
result. But that is outside the scope of this series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An object found in the delta-base cache is not guaranteed to
stay there, but we know it came from a pack and it is likely
to give us a quick access if we read_sha1_file() it right now,
which is a piece of useful information.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/replacing:
read_sha1_file(): allow selective bypassing of replacement mechanism
inline lookup_replace_object() calls
read_sha1_file(): get rid of read_sha1_file_repl() madness
t6050: make sure we test not just commit replacement
Declare lookup_replace_object() in cache.h, not in commit.h
Conflicts:
environment.c
* jk/git-connection-deadlock-fix:
test core.gitproxy configuration
send-pack: avoid deadlock on git:// push with failed pack-objects
connect: let callers know if connection is a socket
connect: treat generic proxy processes like ssh processes
Conflicts:
connect.c
The original interface for sha1_object_info() takes an object name and
gives back a type and its size (the latter is given only when it was
asked). The new interface wraps its implementation and exposes a few
more pieces of information that the interface used to discard, namely:
- where the object is stored (loose? cached? packed?)
- if packed, in which packfile, and where in it?
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
* In the earlier round, this used u.pack.delta to record the length of
the delta chain, but the caller is not necessarily interested in the
length of the delta chain per-se, but may only want to know if it is a
delta against another object or is stored as deflated data. Calling
packed_object_info_detail() involves walking the reverse index chain to
compute the store size of the object and is unnecessarily expensive.
We could resurrect the code if a new caller wants to know, but I doubt
it.
The return codes of git_config_set() and friends are magic numbers right
in the source. #define them in cache.h where the functions are declared,
and use the constants in the source.
Also, mention the resulting exit codes of "git config" in its man page
(and complete the list).
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead return an integer that can be given to typename() if
the caller wants a string, just like everybody else does.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
They might care because they want to do a half-duplex close.
With pipes, that means simply closing the output descriptor;
with a socket, you must actually call shutdown.
Instead of exposing the magic no_fork child_process struct,
let's encapsulate the test in a function.
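The difference between the two cases can be sketched with plain POSIX
calls. toy_half_close() and its is_socket argument are hypothetical
and are not part of this patch; they only show why the caller needs to
know which kind of descriptor it holds.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Signal "no more output" to the peer while keeping the read side
 * usable. For a pipe pair, close the write descriptor; a socket is a
 * single bidirectional descriptor, so shut down only its write half.
 * Returns 0 on success and -1 on error.
 */
int toy_half_close(int out_fd, int is_socket)
{
	assert(out_fd >= 0);
	if (is_socket)
		return shutdown(out_fd, SHUT_WR);
	return close(out_fd);
}
```

Calling close() on the socket instead would also tear down the read
side, so the response from the other end could no longer be read.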
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/convert:
convert: make it harder to screw up adding a conversion attribute
convert: make it safer to add conversion attributes
convert: give saner names to crlf/eol variables, types and functions
convert: rename the "eol" global variable to "core_eol"
* jc/bigfile:
Bigfile: teach "git add" to send a large file straight to a pack
index_fd(): split into two helper functions
index_fd(): turn write_object and format_check arguments into one flag
* jc/replacing:
read_sha1_file(): allow selective bypassing of replacement mechanism
inline lookup_replace_object() calls
read_sha1_file(): get rid of read_sha1_file_repl() madness
t6050: make sure we test not just commit replacement
Declare lookup_replace_object() in cache.h, not in commit.h
The way the "object replacement" mechanism was tucked into the read_sha1_file()
interface was suboptimal in a couple of ways:
- Callers that want it to die with useful diagnosis upon seeing a corrupt
object do not have a way to say that they do not want any object
replacement.
- Callers who do not want it to die but want to handle the errors
themselves are told to arrange to call read_object(), but the function
does not use the replacement mechanism, and also it is a file scope
static function that not many callers can call to begin with.
This adds a read_sha1_file_extended() that takes a set of flags; the
callers of read_sha1_file() pass a flag READ_SHA1_FILE_REPLACE to ask
for the object replacement mechanism to kick in.
Later, we could add another flag bit to tell the function to return an
error instead of dying and then remove the misguided "call read_object()
yourself".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a repository without object replacement, lookup_replace_object() should
be a no-op. Check the flag "read_replace_refs" on the side of the caller,
and bypass a function call when we know we are not dealing with replacement.
Also, even when we are set up to replace objects, if we do not find any
replacement defined, flip that flag off to avoid function call overhead
for all the later object accesses.
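The idea can be sketched as below. This is a simplified stand-in, not the
patch: only the flag name read_replace_refs comes from the description
above, while nr_replacements, slow_lookups and the out-of-line helper are
assumptions made for the example.

```c
/* Hypothetical stand-ins; only the flag name comes from the text above. */
static int read_replace_refs = 1;   /* "do we need to check the lookup table?" */
static int nr_replacements = 0;     /* pretend no replacement is defined */
static int slow_lookups = 0;        /* counts calls into the out-of-line path */

static const unsigned char *do_lookup_replace_object(const unsigned char *sha1)
{
	slow_lookups++;
	if (!nr_replacements)
		read_replace_refs = 0;  /* nothing defined: skip all later calls */
	return sha1;
}

static inline const unsigned char *lookup_replace_object(const unsigned char *sha1)
{
	if (!read_replace_refs)
		return sha1;            /* no-op fast path, no function call */
	return do_lookup_replace_object(sha1);
}
```

With an empty replacement table, only the first lookup pays for the
function call; every later one returns from the inline check.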
As this changes the semantics of the flag from "do we need to read the
replacement definition?" to "do we need to check with the lookup table?",
the flag needs to be renamed later to something saner, e.g. "use_replace",
when the codebase is calmer, but not now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callers want to silently get a replacement object, and they do not
care what the real name of the replacement object is. Worse yet, no sane
interface to return the underlying object without replacement is provided.
Remove the function and make only the few callers that want the name of
the replacement object find it themselves.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The declaration is misplaced as the replace API is supposed to affect
not just commits, but all types of objects.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git cmd :/no-such-string-ever-existed" runs an extra round of get_sha1()
since 009fee4 (Detailed diagnosis when parsing an object name fails.,
2009-12-07). Once without error diagnosis to see there is no commit with
such a string in the log message (hence "it cannot be a ref"), and after
seeing that :/no-such-string-ever-existed is not a filename (hence "it
cannot be a path, either"), another time to give "better diagnosis".
The thing is, the second time it runs, we already know that traversing the
history all the way down to the root will _not_ find any matching commit.
Rename the misguided "gently" parameter, which is turned off _only_ when the
"detailed diagnosis" codepath knows that it cannot be a ref and is making the
call only for the caller to die with a message. Flip its meaning (and
adjust the callers) and call it "only_to_die", which is not a great name,
but it describes far more clearly what the codepaths that switch their
behaviour based on this variable do.
On my box, the command spends ~1.8 seconds without the patch to make the
report; with the patch it spends ~1.12 seconds.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Yes, it is clear that "eol" wants to mean some sort of end-of-line thing,
but as the name of a global variable, it is way too short to describe what
kind of end-of-line thing it wants to represent. Besides, there are many
codepaths that want to use their own local "char *eol" variable to point
at the end of the current line they are processing.
This global variable holds what we read from core.eol configuration
variable. Name it as such.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.
Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/struct-pathspec:
pathspec: rename per-item field has_wildcard to use_wildcard
Improve tree_entry_interesting() handling code
Convert read_tree{,_recursive} to support struct pathspec
Reimplement read_tree_recursive() using tree_entry_interesting()
The pack-objects command should take notice of the object file and
refrain from attempting to delta large ones, to be consistent with
the fast-import command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As the point of the last change is to allow use of strings as
literals no matter what characters are in them, "has_wildcard"
does not match what we use this field for anymore.
It is used to decide if the wildcard matching should be used, so
rename it to match the usage better.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The git-new-workdir script in contrib/ makes a new work tree by sharing
many subdirectories of the .git directory with the original repository.
When rerere.enabled is set in the original repository, but the user has
not encountered any conflicts yet, the original repository may not yet
have .git/rr-cache directory.
When rerere wants to run in a new work tree created from such a young
original repository, it fails to mkdir(2) .git/rr-cache that is a symlink
to a yet-to-be-created directory.
There are three possible approaches to this:
- A naive solution is not to create a symlink in the git-new-workdir
script to a directory the original does not have (yet). This is not a
solution, as we tend to lazily create subdirectories of .git/, and
having rerere.enabled configuration set is a strong indication that the
user _wants_ this lazy creation to happen;
- We could always create .git/rr-cache upon repository creation. This is
tempting but will not help people with existing repositories.
- Detect this case by seeing that mkdir(2) failed with EEXIST, checking
that the path is a symlink, and try running mkdir(2) on the link
target.
This patch solves the issue by doing the third one.
Strictly speaking, this is incomplete. It does not attempt to handle a
relative symbolic link that points into the original repository, but this
is good enough to help people who use the contrib/workdir/git-new-workdir
script.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jn/test-sanitize-git-env:
tests: scrub environment of GIT_* variables
config: drop support for GIT_CONFIG_NOGLOBAL
gitattributes: drop support for GIT_ATTR_NOGLOBAL
tests: suppress system gitattributes
tests: stop worrying about obsolete environment variables
When we had to refresh the index internally before running diff or status,
we opportunistically updated the $GIT_INDEX_FILE so that later invocation
of git can use the lstat(2) we already did in this invocation.
Make them share a helper function to do so.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-fd-limit:
sha1_file.c: Don't retain open fds on small packs
mingw: add minimum getrlimit() compatibility stub
Limit file descriptors used by packs
* ab/i18n-basic:
i18n: "make distclean" should clean up after "make pot"
i18n: Makefile: "pot" target to extract messages marked for translation
i18n: add stub Q_() wrapper for ngettext
i18n: do not poison translations unless GIT_GETTEXT_POISON envvar is set
i18n: add GETTEXT_POISON to simulate unfriendly translator
i18n: add no-op _() and N_() wrappers
commit, status: use status_printf{,_ln,_more} helpers
commit: refer to commit template as s->fp
wt-status: add helpers for printing wt-status lines
Conflicts:
builtin/commit.c
* jk/trace-sifter:
trace: give repo_setup trace its own key
add packet tracing debug code
trace: add trace_strbuf
trace: factor out "do we want to trace" logic
trace: refactor to support multiple env variables
trace: add trace_vprintf
--separate-git-dir tells git to create git dir at the specified
location, instead of where it is supposed to be. A .git file that
points to that location will be put in place so that it appears normal
to repo discovery process.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rename the make_*_path functions so it's clearer what they do. In
particular, make clear what the difference between make_absolute_path and
make_nonrelative_path is by renaming them real_path and absolute_path
respectively. make_relative_path has an understandable name and is
renamed to relative_path to maintain the naming convention.
The function calls have been replaced 1-to-1 in their usage.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
Prepare draft release notes to 1.7.4.2
gitweb: highlight: replace tabs with spaces
make_absolute_path: return the input path if it points to our buffer
valgrind: ignore SSE-based strlen invalid reads
diff --submodule: split into bite-sized pieces
cherry: split off function to print output lines
branch: split off function that writes tracking info and commit subject
standardize brace placement in struct definitions
compat: make gcc bswap an inline function
enums: omit trailing comma for portability
Conflicts:
RelNotes
Since v1.7.2-rc0~23^2~2 (Add per-repository eol normalization,
2010-05-19), building with gcc -std=gnu89 -pedantic produces warnings
like the following:
convert.c:21:11: warning: comma at end of enumerator list [-pedantic]
gcc is right to complain --- these commas are not permitted in C89.
In the spirit of v1.7.2-rc0~32^2~16 (2010-05-14), remove them.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As "gcc -pedantic" notices, a two's complement 1-bit signed integer
cannot represent the value '1'.
dir.c: In function 'init_pathspec':
dir.c:1291:4: warning: overflow in implicit constant conversion [-Woverflow]
In the spirit of v1.7.1-rc1~10 (2010-04-06), 'unsigned' is what was
intended, so let's make the flags unsigned.
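A small sketch of the difference, assuming a two's complement platform.
The struct and function names are made up for the example and do not
appear in dir.c.

```c
/* Illustrative types; not the struct from dir.c. */
struct signed_bit   { signed int flag : 1; };    /* can hold only 0 and -1 */
struct unsigned_bit { unsigned int flag : 1; };  /* can hold 0 and 1 */

/* Store the only nonzero value a signed 1-bit field can represent. */
static int signed_bit_set(void)
{
	struct signed_bit s;
	s.flag = -1;
	return s.flag;
}

/* Store 1 in an unsigned 1-bit field and read it back. */
static int unsigned_bit_set(void)
{
	struct unsigned_bit u;
	u.flag = 1;
	return u.flag;
}
```

Code that tests such a flag with "== 1" only behaves as intended with the
unsigned variant.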
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sp/maint-fd-limit:
sha1_file.c: Don't retain open fds on small packs
mingw: add minimum getrlimit() compatibility stub
Limit file descriptors used by packs
Now that test-lib sets $HOME to protect against pollution from user
settings, GIT_CONFIG_NOGLOBAL is not needed for use by the test
suite any more. And as luck would have it, a quick code search
reveals no other users in the wild.
This patch does not affect GIT_CONFIG_NOSYSTEM, which is still
needed.
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The default of 7 comes from fairly early in git development, when
seven hex digits was a lot (it covers about 250+ million hash
values). Back then I thought that 65k revisions was a lot (it was what
we were about to hit in BK), and each revision tends to be about 5-10
new objects or so, so a million objects was a big number.
These days, the kernel isn't even the largest git project, and even
the kernel has about 220k revisions (_much_ bigger than the BK tree
ever was) and we are approaching two million objects. At that point,
seven hex digits is still unique for a lot of them, but when we're
talking about just two orders of magnitude difference between number
of objects and the hash size, there _will_ be collisions in truncated
hash values. It's no longer even close to unrealistic - it happens all
the time.
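To put a rough number on this, the sketch below estimates how many pairs
of objects are expected to share the same abbreviation. It assumes object
names are uniformly distributed, which is an assumption of the example and
not a claim from the text; the function name is made up.

```c
/*
 * Expected number of colliding pairs among n random object names
 * truncated to `hexdigits` hex digits: there are n*(n-1)/2 pairs, and
 * each pair collides with probability 1/16^hexdigits.
 */
static double expected_colliding_pairs(double n, int hexdigits)
{
	double space = 1.0;
	int i;

	for (i = 0; i < hexdigits; i++)
		space *= 16.0;
	return n * (n - 1.0) / 2.0 / space;
}
```

For two million objects and seven hex digits this comes out at roughly
7,450 colliding pairs; each extra hex digit divides the figure by 16.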
We should both increase the default abbrev that was unrealistically
small, _and_ add a way for people to set their own default per-project
in the git config file.
This is the first step: make it configurable. The default of 7
is not raised yet.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This reverts commit 72a5b561fc, as adding
fixed number of hexdigits more than necessary to make one object name
locally unique does not help in futureproofing the uniqueness of names
we generate today.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This shows a trace of all packets coming in or out of a given
program. This can help with debugging object negotiation or
other protocol issues.
To keep the code changes simple, we operate at the lowest
level, meaning we don't necessarily understand what's in the
packets. The one exception is a packet starting with "PACK",
which causes us to skip that packet and turn off tracing
(since the gigantic pack data will not be interesting to
read, at least not in the trace format).
We show both written and read packets. In the local case,
this may mean you will see packets twice (written by the
sender and read by the receiver). However, for cases where
the other end is remote, this allows you to see the full
conversation.
Packet tracing can be enabled with GIT_TRACE_PACKET=<foo>,
where <foo> takes the same arguments as GIT_TRACE.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you happen to have a strbuf, it is a little more readable
and a little more efficient to be able to print it directly
instead of jamming it through the trace_printf interface.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As we add more tracing areas, this will avoid repeated code.
Technically, trace_printf already checks this and will avoid
printing if the trace key is not set. However, callers may
want to find out early whether or not tracing is enabled so
they can avoid doing work in the common non-trace case.
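A caller-side check might look like the sketch below. The function name
want_trace and the rule that an unset, empty, "0" or "false" value means
"off" are assumptions of this example, not a description of the actual
helper.

```c
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Return nonzero when the environment variable `key` asks for tracing.
 * Treating unset, "", "0" and "false" as "off" is an assumption here.
 */
static int want_trace(const char *key)
{
	const char *v = getenv(key);

	if (!v || !*v || !strcmp(v, "0") || !strcasecmp(v, "false"))
		return 0;
	return 1;
}
```

A caller would guard any expensive message formatting with such a check
and skip the work entirely when it returns zero.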
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now you turn all tracing off and on with GIT_TRACE. To
support new types of tracing without forcing the user to see
all of them, we will soon support turning each tracing area
on with GIT_TRACE_*.
This patch lays the groundwork by providing an interface
which does not assume GIT_TRACE. However, we still maintain
the trace_printf interface so that existing callers do not
need to be refactored.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a necessary cleanup to adding new types of trace
functions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The _ function is for translating strings into the user's chosen
language. The N_ macro just marks translatable strings for the
xgettext(1) tool without translating them; it is intended for use in
contexts where a function call cannot be used. So, for example:
fprintf(stderr, _("Expansion of alias '%s' failed; "
"'%s' is not a git command\n"),
cmd, argv[0]);
and
const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
/* ERROR_WOULD_OVERWRITE */
N_("Entry '%s' would be overwritten by merge. Cannot merge."),
[...]
Define such _ and N_ in a new gettext.h and include it in cache.h, so
they can be used everywhere. Each just returns its argument for now.
_ is a function rather than a macro like N_ to avoid the temptation to
use _("foo") as a string literal (which would be a compile-time error
once _(s) expands to an expression for the translation of s).
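A minimal sketch of such no-op wrappers is shown below. The exact contents
of gettext.h are assumed, not quoted; only the names _ and N_ and their
"return the argument" behaviour come from the description above.

```c
/* Marks a string for extraction; expands to the string itself. */
#define N_(msgid) (msgid)

/* Would translate msgid; for now it just hands the argument back. */
static inline const char *_(const char *msgid)
{
	return msgid;
}
```

N_ can appear in a static array initializer, where a function call is not
allowed, and the stored string is passed through _() later at the point of
use.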
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a pack file is small enough that its entire contents fits within
one mmap window, mmap the file and then immediately close its file
descriptor. This reduces the number of file descriptors that are
needed to read from repositories with many tiny pack files, such
as one that has received 1000 pushes (and created 1000 small pack
files) since its last repack.
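The technique relies on the POSIX guarantee that a mapping stays valid
after the descriptor it was created from is closed. A rough sketch, with a
made-up function name and a caller-supplied window size standing in for
the real pack window logic:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map `path` read-only if it fits within one window of `window` bytes,
 * then close the descriptor right away. Returns the mapping and stores
 * its length in *len, or returns NULL on failure or when the file is
 * empty or larger than the window. Names are hypothetical.
 */
static void *map_small_file(const char *path, size_t window, size_t *len)
{
	struct stat st;
	void *map;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) || st.st_size <= 0 || (size_t)st.st_size > window) {
		close(fd);
		return NULL;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);                  /* the mapping outlives the descriptor */
	if (map == MAP_FAILED)
		return NULL;
	*len = (size_t)st.st_size;
	return map;
}
```

The caller releases the mapping with munmap() when done; no descriptor is
held in the meantime.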
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/struct-pathspec: (22 commits)
t6004: add pathspec globbing test for log family
t7810: overlapping pathspecs and depth limit
grep: drop pathspec_matches() in favor of tree_entry_interesting()
grep: use writable strbuf from caller for grep_tree()
grep: use match_pathspec_depth() for cache/worktree grepping
grep: convert to use struct pathspec
Convert ce_path_match() to use match_pathspec_depth()
Convert ce_path_match() to use struct pathspec
struct rev_info: convert prune_data to struct pathspec
pathspec: add match_pathspec_depth()
tree_entry_interesting(): optimize wildcard matching when base is matched
tree_entry_interesting(): support wildcard matching
tree_entry_interesting(): fix depth limit with overlapping pathspecs
tree_entry_interesting(): support depth limit
tree_entry_interesting(): refactor into separate smaller functions
diff-tree: convert base+baselen to writable strbuf
glossary: define pathspec
Move tree_entry_interesting() to tree-walk.c and export it
tree_entry_interesting(): remove dependency on struct diff_options
Convert struct diff_options to use struct pathspec
...
Sanity-check config variable names when adding and retrieving them. As a side
effect code duplication between git_config_set_multivar and get_value (in
builtin/config.c) was removed and the common functionality was placed in
git_config_parse_key.
This breaks a test in t1300 which used invalid section-less keys in the tests
for "git -c". However, allowing such names there was useless, since there was
no way to set them via config file, and no part of git actually tried to use
section-less keys. This patch updates the test to use more realistic examples
as well as adding its own test.
Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Users are sometimes confused by two different types of "tracking" behavior
in Git: "remote-tracking" branches (e.g. refs/remotes/*/*) versus the
merge/rebase relationship between a local branch and its @{upstream}
(controlled by branch.foo.remote and branch.foo.merge config settings).
When the push.default is set to 'tracking', it specifies that a branch should
be pushed to its @{upstream} branch. In other words, setting push.default to
'tracking' applies only to the latter of the above two types of "tracking"
behavior.
In order to make this more understandable to the user, we rename the
push.default == 'tracking' option to push.default == 'upstream'.
push.default == 'tracking' is left as a deprecated synonym for 'upstream'.
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Functions such as hashcmp that expect a binary SHA-1 value take
parameters of type "unsigned char *" to avoid accepting a textual
SHA-1 passed by mistake. Unfortunately, this means passing the string
literal EMPTY_TREE_SHA1_BIN requires an ugly cast. Tweak the
definition of EMPTY_TREE_SHA1_BIN to produce a value of more
convenient type.
In the future the definition might change to
extern const unsigned char empty_tree_sha1_bin[20];
#define EMPTY_TREE_SHA1_BIN empty_tree_sha1_bin
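For comparison, a definition that carries the cast itself might look like
the sketch below. The helper macro EMPTY_TREE_SHA1_BIN_LITERAL and the
function hashcmp_sketch are assumptions for this example; the byte values
are the well-known SHA-1 of the empty tree.

```c
#include <string.h>

/* Binary form of 4b825dc642cb6eb9a060e54bf8d69288fbee4904. */
#define EMPTY_TREE_SHA1_BIN_LITERAL \
	"\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
	"\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"

/* The cast lives in one place, so callers need none. */
#define EMPTY_TREE_SHA1_BIN \
	((const unsigned char *) EMPTY_TREE_SHA1_BIN_LITERAL)

/* Stand-in for hashcmp(): compare two 20-byte binary SHA-1 values. */
static int hashcmp_sketch(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, 20);
}
```

A call such as hashcmp_sketch(sha1, EMPTY_TREE_SHA1_BIN) then compiles
without a cast at the call site.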
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.
Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add a commit first, then its associated tree, for example).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
never_interesting optimization is disabled if there is any wildcard
pathspec, even if it only matches exactly on trees.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is needed to replace pathspec_matches() in builtin/grep.c.
max_depth == -1 means infinite depth. Depth limit is only effective
when pathspec.recursive == 1. When pathspec.recursive == 0, the
behavior depends on match functions: non-recursive for
tree_entry_interesting() and recursive for match_pathspec{,_depth}
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The old pathspec structure remains as pathspec.raw[]. New things are
stored in pathspec.items[]. There's no guarantee that the pathspec
order in raw[] is exactly as in items[].
raw[] is external (source) data and is untouched by pathspec
manipulation functions. It eases migration from old const char ** to
this new struct.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/setup: (47 commits)
setup_work_tree: adjust relative $GIT_WORK_TREE after moving cwd
git.txt: correct where --work-tree path is relative to
Revert "Documentation: always respect core.worktree if set"
t0001: test git init when run via an alias
Remove all logic from get_git_work_tree()
setup: rework setup_explicit_git_dir()
setup: clean up setup_discovered_git_dir()
t1020-subdirectory: test alias expansion in a subdirectory
setup: clean up setup_bare_git_dir()
setup: limit get_git_work_tree()'s to explicit setup case only
Use git_config_early() instead of git_config() during repo setup
Add git_config_early()
git-rev-parse.txt: clarify --git-dir
t1510: setup case #31
t1510: setup case #30
t1510: setup case #29
t1510: setup case #28
t1510: setup case #27
t1510: setup case #26
t1510: setup case #25
...
* nd/maint-fix-add-typo-detection:
Revert "excluded_1(): support exclude files in index"
unpack-trees: fix sparse checkout's "unable to match directories"
unpack-trees: move all skip-worktree checks back to unpack_trees()
dir.c: add free_excludes()
cache.h: realign and use (1 << x) form for CE_* constants
This version of git_config() will be used during repository setup.
As a repository is being set up, $GIT_DIR is not nailed down yet,
git_pathdup() should not be used to get $GIT_DIR/config.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This logic is now only used by cmd_init_db(). setup_* functions do not
rely on it any more. Move all the logic to cmd_init_db() and turn
get_git_work_tree() into a simple function.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_git_work_tree() takes core.worktree, core.bare and
GIT_WORK_TREE as input and decides the correct worktree setting.
Unfortunately it does not do its job well. core.worktree and
GIT_WORK_TREE should only be taken into account, if GIT_DIR is set
(which is handled by setup_explicit_git_dir). For other setup cases,
only core.bare matters.
Add a temporary variable setup_explicit to adjust get_git_work_tree()
behavior as such. This variable will be gone once setup_* rework is
done.
Also remove is_bare_repository_cfg check in set_git_work_tree() to
ease the rework. We are going to check for core.bare and core.worktree
early before setting worktree. For example, if core.bare is true, no
need to set worktree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* nd/extended-sha1-relpath:
get_sha1: teach ":$n:<path>" the same relative path logic
get_sha1: support relative path ":path" syntax
Make prefix_path() return char* without const
Conflicts:
sha1_name.c
* jn/parse-options-extra:
update-index: migrate to parse-options API
setup: save prefix (original cwd relative to toplevel) in startup_info
parse-options: make resuming easier after PARSE_OPT_STOP_AT_NON_OPTION
parse-options: allow git commands to invent new option types
parse-options: never suppress arghelp if LITERAL_ARGHELP is set
parse-options: do not infer PARSE_OPT_NOARG from option type
parse-options: sanity check PARSE_OPT_NOARG flag
parse-options: move NODASH sanity checks to parse_options_check
parse-options: clearer reporting of API misuse
parse-options: Don't call parse_options_check() so much
prefix_path() allocates a new buffer. There's no reason for it to keep
the buffer for itself and waste memory.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Save the path from the original cwd to the cwd at the end of the
setup procedure in the startup_info struct introduced in e37c1329
(2010-08-05). The value cannot vary from thread to thread anyway,
since the cwd is global.
So now in your builtin command, instead of passing prefix around,
when you want to convert a user-supplied path to a cwd-relative
path, you can use startup_info->prefix directly.
Caveat: As with the return value from setup_git_directory_gently(),
startup_info->prefix would be NULL when the original cwd is not a
subdir of the toplevel.
Longer term, this would allow the prefix to be reused when several
noncooperating functions require access to the same repository (for
example, when accessing configuration before running a builtin).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* pn/commit-autosquash:
add tests of commit --squash
commit: --squash option for use with rebase --autosquash
add tests of commit --fixup
commit: --fixup option for use with rebase --autosquash
pretty.c: teach format_commit_message() to reencode the output
commit: helper methods to reduce redundant blocks of code
Conflicts:
Documentation/git-commit.txt
t/t3415-rebase-autosquash.sh
A new whitespace "rule" is added that sets the tab width to use for
whitespace checks and fix-ups and replaces the hard-coded constant 8.
Since the setting is part of the rules, it can be set per file using
.gitattributes.
The new configuration is backwards compatible because older git versions
simply ignore unknown whitespace rules.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cb/leading-path-removal:
use persistent memory for rejected paths
do not overwrite files in leading path
lstat_cache: optionally return match_len
add function check_ok_to_remove()
t7607: add leading-path tests
t7607: use test-lib functions and check MERGE_HEAD
Conflicts:
t/t7607-merge-overwrite.sh
Earlier, the will_have_skip_worktree() checks were done in various
places, which made it hard to traverse the index like a tree, as required
by excluded_from_list(). This patch moves all the checks into two
loops in unpack_trees().
Entries in index in this operation can be classified into two
groups: ones already in index before unpack_trees() is called and ones
added to index after traverse_trees() is called.
In both groups, before checking file status on worktree, the future
skip-worktree bit must be checked, so that if an entry will be outside
worktree, worktree should not be checked.
For the first group, the future skip-worktree bit is precomputed and
stored as CE_NEW_SKIP_WORKTREE in the first loop before
traverse_trees() is called so that *way_merge() function does not need
to compute it again.
For the second group, because we don't know what entries will be in
this group until traverse_trees() finishes, operations that need a
future skip-worktree check are delayed until CE_NEW_SKIP_WORKTREE is
computed in the second loop. CE_ADDED is used to mark entries in the
second group.
CE_ADDED and CE_NEW_SKIP_WORKTREE are temporary flags used in
unpack_trees(). CE_ADDED is only used by add_to_index(), which should
not be called while unpack_trees() is running.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* builtin/commit.c: Replace block of code with a one-liner call to
logmsg_reencode().
* commit.c: new function for looking up a commit by name
* pretty.c: helper methods for getting output encodings
Add helpers get_log_output_encoding() and
get_commit_output_encoding() that eliminate some messy and duplicate
if-blocks.
Signed-off-by: Pat Notz <patnotz@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even though git makes sure that it uses enough hexdigits to show an
abbreviated object name unambiguously, as more objects are added to the
repository over time, a short name that used to be unique will stop being
unique. Git uses this many extra hexdigits that are more than necessary
to make the object name currently unique, in the hope that its output will
stay unique a bit longer.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the work tree contains an untracked file x, and
unpack-trees wants to checkout a path x/*, the
file x is removed unconditionally.
Instead, apply the same checks that are normally
used for untracked files, and abort if the file
cannot be removed.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
The explanatory comment before the definition of ALLOC_GROW carefully
lists arguments that will be used more than once and thus cannot have
side-effects; a lazy reader might conclude that the arguments not
listed are used only once and are safe to pass with side effects.
Correct it to list all three arguments, avoiding this confusion.
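To see why all three arguments matter, here is a sketch of a grow macro in
the same style. It is written from memory for illustration, uses plain
realloc() without error handling, and should not be read as the exact
definition in the source tree.

```c
#include <stdlib.h>

#define alloc_nr(x) (((x) + 16) * 3 / 2)

/*
 * Grow array `x` so that it can hold at least `nr` items; `alloc` tracks
 * the current capacity. `x`, `nr` and `alloc` each appear more than once
 * in the expansion, so none of them may have side effects.
 */
#define ALLOC_GROW(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			x = realloc((x), alloc * sizeof(*(x))); \
		} \
	} while (0)
```

Writing ALLOC_GROW(a, nr++, alloc) would increment nr several times,
which is exactly the misuse the comment is meant to prevent.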
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are buggy implementations of S_ISxxx(m) macros on some platforms
(e.g. NetBSD). The issue is that NetBSD doesn't take care to wrap its
macro arguments in parentheses, so on Linux and sane systems we have
S_ISREG(m) defined as something like:
(((m) & S_IFMT) == S_IFREG)
But on NetBSD:
((m & _S_IFMT) == _S_IFREG)
Since a caller in builtin/diff.c called our macro as `S_IFREG | 0644'
this bug introduced a logic error on NetBSD, since the precedence of
bit-wise & is higher than | in C.
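The precedence problem can be reproduced with local stand-in macros, as in
the sketch below. The MY_* and ISREG_* names are made up for the example,
and the argument is chosen so that the difference is visible; which
expressions are affected in practice depends on where the operators end up
after expansion.

```c
/* Local stand-ins with the usual octal values; names are illustrative. */
#define MY_IFMT  0170000
#define MY_IFREG 0100000

/* Argument wrapped in parentheses: the whole expression is masked. */
#define ISREG_SAFE(m)   (((m) & MY_IFMT) == MY_IFREG)

/* Argument not wrapped: "a | b" expands to "a | b & MY_IFMT". */
#define ISREG_UNSAFE(m) ((m & MY_IFMT) == MY_IFREG)
```

With the argument "0644 | MY_IFREG", the safe macro masks the combined
value and yields true, while the unsafe one masks only MY_IFREG, keeps the
permission bits, and yields false.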
[jc: took change description from Ævar Arnfjörð Bjarmason's patch]
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kf/askpass-config:
Extend documentation of core.askpass and GIT_ASKPASS.
Allow core.askpass to override SSH_ASKPASS.
Add a new option 'core.askpass'.
* jn/merge-renormalize:
merge-recursive --renormalize
rerere: never renormalize
rerere: migrate to parse-options API
t4200 (rerere): modernize style
ll-merge: let caller decide whether to renormalize
ll-merge: make flag easier to populate
Documentation/technical: document ll_merge
merge-trees: let caller decide whether to renormalize
merge-trees: push choice to renormalize away from low level
t6038 (merge.renormalize): check that it can be turned off
t6038 (merge.renormalize): try checkout -m and cherry-pick
t6038 (merge.renormalize): style nitpicks
Don't expand CRLFs when normalizing text during merge
Try normalizing files to avoid delete/modify conflicts when merging
Avoid conflicts when merging branches with mixed normalization
Conflicts:
builtin/rerere.c
t/t4200-rerere.sh
* jn/paginate-fix:
t7006 (pager): add missing TTY prerequisites
merge-file: run setup_git_directory_gently() sooner
var: run setup_git_directory_gently() sooner
ls-remote: run setup_git_directory_gently() sooner
index-pack: run setup_git_directory_gently() sooner
config: run setup_git_directory_gently() sooner
bundle: run setup_git_directory_gently() sooner
apply: run setup_git_directory_gently() sooner
grep: run setup_git_directory_gently() sooner
shortlog: run setup_git_directory_gently() sooner
git wrapper: allow setup_git_directory_gently() be called earlier
setup: remember whether repository was found
git wrapper: introduce startup_info struct
Conflicts:
builtin/index-pack.c
Setting this option has the same effect as setting the environment variable
'GIT_ASKPASS'.
Signed-off-by: Knut Franke <k.franke@science-computing.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like $GIT_CONFIG, $GIT_CONFIG_PARAMETERS needs to be suppressed by
"git push" and its cousins when running local transport helpers to
imitate remote transport well.
Noticed-by: Jeff King <peff@peff.net>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git uses the "-c foo=bar" parameters to set a config
variable for a single git invocation. We currently do this
by making a list in the current process and consulting that
list in git_config.
This works fine for built-ins, but the config changes are
silently ignored by subprocesses, including dashed externals
and invocations to "git config" from shell scripts.
This patch instead puts them in an environment variable
which we consult when looking at config (both internally and
via calls to "git config").
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
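
A rough sketch of why an environment variable helps: anything stored there is inherited by subprocesses, so every config reader can consult it. The variable name `DEMO_CONFIG_PARAMETERS` and the single "key=value" format are assumptions made for illustration; the commit message does not describe the real encoding.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; the simplified format is one "key=value". */
#define DEMO_CONFIG_ENV "DEMO_CONFIG_PARAMETERS"

/*
 * Return the value for "key" if "params" carries an override for it,
 * or NULL otherwise. "params" is the content of the environment variable.
 */
static const char *demo_lookup_override(const char *params, const char *key)
{
	size_t len = strlen(key);

	if (params && !strncmp(params, key, len) && params[len] == '=')
		return params + len + 1;
	return NULL;
}

/* What a config reader in any process (parent or child) would call. */
static const char *demo_config_override(const char *key)
{
	return demo_lookup_override(getenv(DEMO_CONFIG_ENV), key);
}
```

Because the lookup goes through the environment rather than a list held in one process, a dashed external or a shell script calling the config reader sees the same override as the process that parsed the command line.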
* nd/fix-sparse-checkout:
unpack-trees: mark new entries skip-worktree appropriately
unpack-trees: do not check for conflict entries too early
unpack-trees: let read-tree -u remove index entries outside sparse area
unpack-trees: only clear CE_UPDATE|CE_REMOVE when skip-worktree is always set
t1011 (sparse checkout): style nitpicks
* hv/submodule-find-ff-merge:
Implement automatic fast-forward merge for submodules
setup_revisions(): Allow walking history in a submodule
Teach ref iteration module about submodules
Conflicts:
submodule.c
This allows the caller to add its own error message to that returned
by split_cmdline. Thus error output following a failed split_cmdline
can be of the form

    fatal: Bad alias.test string: cmdline ends with \

rather than

    error: cmdline ends with \
    fatal: Bad alias.test string
Signed-off-by: Greg Brockman <gdb@mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
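
One way to get the single-line form shown above is for the splitter to return an error code and leave the printing to the caller. The sketch below assumes hypothetical names (`demo_split_strerror`, `demo_format_alias_error`) and error codes; it is not the interface added by this commit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes for a command-line splitter. */
enum { DEMO_BAD_ENDING = 1, DEMO_UNCLOSED_QUOTE = 2 };

/* Map an error code to its message without printing anything. */
static const char *demo_split_strerror(int err)
{
	switch (err) {
	case DEMO_BAD_ENDING:
		return "cmdline ends with \\";
	case DEMO_UNCLOSED_QUOTE:
		return "unclosed quote";
	default:
		return "unknown error";
	}
}

/* The caller composes one message that includes its own context. */
static void demo_format_alias_error(char *out, size_t len,
				    const char *alias, int err)
{
	snprintf(out, len, "fatal: Bad alias.%s string: %s",
		 alias, demo_split_strerror(err));
}
```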
As v1.7.2~16^2 (git --paginate: paginate external commands
again, 2010-07-14) explains, builtins (like git config) that
do not use RUN_SETUP are not finding GIT_DIR set correctly when
it is time to launch the pager from run_builtin(). If they
were to search for a repository sooner, then the outcome of such
early repository accesses would be more predictable and reliable.
The cmd_*() functions learn whether a repository was found through the
*nongit_ok return value from setup_git_directory_gently(). If
run_builtin() is to take care of the repository search itself, that
datum needs to be retrievable from somewhere else. Use the
startup_info struct for this.
As a bonus, this information becomes available to functions such as
git_config() which might want to avoid trying to access a repository
when none is present.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
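
As an illustration of the idea, the setup code could record the outcome of the repository search in a shared struct that later code consults. The layout and the `demo_*` names below are assumptions, not the definitions used in git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; the field name is an assumption. */
struct demo_startup_info {
	int have_repository; /* set once by the setup code */
};

static struct demo_startup_info *demo_startup_info;

/* Setup side: record the outcome of the repository search. */
static void demo_setup(struct demo_startup_info *info, int found)
{
	info->have_repository = found;
	demo_startup_info = info;
}

/* Consumer side: skip repository-level config when there is no repository. */
static int demo_should_read_repo_config(void)
{
	return demo_startup_info && demo_startup_info->have_repository;
}
```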
The startup_info struct will collect information managed by the git
setup code, such as the prefix for relative paths passed on the
command line (i.e., path to the starting cwd from the toplevel of
the work tree) and whether a git repository has been found.
In other words, startup_info is intended to be a collection of global
variables with results that were previously returned from setup
functions. This state is global anyway (since the cwd is), even
if it is not currently tracked that way. Letting these values persist
means there is more flexibility in deciding when to run setup.
For now, the struct is empty.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To avoid touching the worktree outside a sparse checkout,
when the update flag is enabled unpack_trees() clears the
CE_UPDATE and CE_REMOVE flags on entries that do not match the
sparse pattern before actually committing any updates to the
index file or worktree.
The effect on the index was unintentional; sparse checkout was
never meant to prevent index updates outside the area checked
out. And the result is very confusing: for example, after a
failed merge, currently "git reset --hard" does not reset the
state completely but an additional "git reset --mixed" will.
So stop clearing the CE_REMOVE flag. Instead, maintain a
CE_WT_REMOVE flag to separately track whether a particular
file removal should apply to the worktree in addition to the
index or not.
The CE_WT_REMOVE flag is used already to mark files that
should be removed because of a narrowing checkout area. That
usage will still apply; do not clear the CE_WT_REMOVE flag
in that case (detectable because the CE_REMOVE flag is not
set).
This bug masked some other bugs illustrated by the test
suite, which will be addressed by later patches.
Reported-by: Frédéric Brière <fbriere@fbriere.net>
Fixes: http://bugs.debian.org/583699
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
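
The split between "remove from the index" and "also remove from the worktree" can be pictured with two flag bits. The bit values and the helper `mark_removal()` are hypothetical; only the flag names come from the message above.

```c
#include <assert.h>

/* Hypothetical bit values; only the flag names come from the commit message. */
#define DEMO_CE_REMOVE    (1u << 0) /* drop the entry from the index */
#define DEMO_CE_WT_REMOVE (1u << 1) /* also delete the file in the worktree */

/*
 * Decide the worktree side of a removal. Entries outside the sparse
 * area keep DEMO_CE_REMOVE (the index is still updated) but do not get
 * DEMO_CE_WT_REMOVE, so the worktree is left alone.
 */
static unsigned int mark_removal(unsigned int flags, int in_sparse_area)
{
	if (!(flags & DEMO_CE_REMOVE))
		return flags; /* e.g. set by a narrowing checkout: keep as is */
	if (in_sparse_area)
		return flags | DEMO_CE_WT_REMOVE;
	return flags & ~DEMO_CE_WT_REMOVE;
}
```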
Teach "git merge-recursive" a --renormalize option to enable the
merge.renormalize configuration. The --no-renormalize option can
be used to override it in the negative.
So in the future, you might be able to, e.g.:

    git checkout -m -Xrenormalize otherbranch

or

    git revert -Xrenormalize otherpatch

or

    git pull --rebase -Xrenormalize

The bad part: merge.renormalize is still not honored for most
commands. And it reveals lots of places that -X has not been plumbed
in (so we get "git merge -Xrenormalize" but not much else).
NEEDSWORK: tests
Cc: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
approxidate() is not appropriate for reading machine-written dates
because it guesses instead of erroring out on malformed dates.
parse_date() is less convenient since it returns its output as a
string. So export the underlying function that writes a timestamp.
While at it, change the return value to match the usual convention:
return 0 for success and -1 for failure.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We will use this in a later patch to extend setup_revisions() to
load revisions directly from a submodule.
Signed-off-by: Heiko Voigt <hvoigt@hvoigt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, merging across changes in line ending normalization is
painful since files containing CRLF will conflict with normalized files,
even if the only difference between the two versions is the line
endings. Additionally, any "real" merge conflicts that exist are
obscured because every line in the file has a conflict.
Assume you start out with a repo that has a lot of text files with CRLF
checked in (A):

       o---C
      /     \
     A---B---D

B: Add "* text=auto" to .gitattributes and normalize all files to
LF-only
C: Modify some of the text files
D: Try to merge C
You will get a ridiculous number of LF/CRLF conflicts when trying to
merge C into D, since the repository contents for C are "wrong" wrt the
new .gitattributes file.
Fix ll-merge so that the "base", "theirs" and "ours" stages are passed
through convert_to_worktree() and convert_to_git() before a three-way
merge. This ensures that all three stages are normalized in the same
way, removing from consideration differences that are only due to
normalization.
This feature is optional for now since it changes a low-level mechanism
and is not necessary for the majority of users. The "merge.renormalize"
config variable enables it.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
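
The effect of normalizing all three stages before merging can be sketched with a much simpler stand-in for the convert_to_worktree()/convert_to_git() round trip: rewrite CRLF as LF, then compare. The function names here are hypothetical and the conversion is deliberately reduced to line endings only.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for the normalization round trip: rewrite CRLF
 * as LF in place and return the new length.
 */
static size_t normalize_eol(char *buf, size_t len)
{
	size_t in, out = 0;

	for (in = 0; in < len; in++) {
		if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
			continue; /* drop the CR of a CRLF pair */
		buf[out++] = buf[in];
	}
	return out;
}

/* Two stages differ "for real" only if they differ after normalization. */
static int differs_after_normalizing(char *a, size_t alen,
				     char *b, size_t blen)
{
	alen = normalize_eol(a, alen);
	blen = normalize_eol(b, blen);
	return alen != blen || memcmp(a, b, alen) != 0;
}
```

With this in place, a stage that differs from another only in line endings compares equal, so only genuine content changes remain for the three-way merge to resolve.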
* cp/textconv-cat-file:
git-cat-file.txt: Document --textconv
t/t8007: test textconv support for cat-file
textconv: support for cat_file
sha1_name: add get_sha1_with_context()
* gv/portable:
test-lib: use DIFF definition from GIT-BUILD-OPTIONS
build: propagate $DIFF to scripts
Makefile: Tru64 portability fix
Makefile: HP-UX 10.20 portability fixes
Makefile: HPUX11 portability fixes
Makefile: SunOS 5.6 portability fix
inline declaration does not work on AIX
Allow disabling "inline"
Some platforms lack socklen_t type
Make NO_{INET_NTOP,INET_PTON} configured independently
Makefile: some platforms do not have hstrerror anywhere
git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition
test_cmp: do not use "diff -u" on platforms that lack one
fixup: do not unconditionally disable "diff -u"
tests: use "test_cmp", not "diff", when verifying the result
Do not use "diff" found on PATH while building and installing
enums: omit trailing comma for portability
Makefile: -lpthread may still be necessary when libc has only pthread stubs
Rewrite dynamic structure initializations to runtime assignment
Makefile: pass CPPFLAGS through to allow customization
Conflicts:
Makefile
wt-status.h
Textconv is defined by the diff driver, which is associated with a pathname,
not a blob. This function lets the caller learn the context of the sha1 being
looked up, in particular its pathname.
Signed-off-by: Clément Poulain <clement.poulain@ensimag.imag.fr>
Signed-off-by: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Signed-off-by: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce a new configuration variable, "core.eol", that allows the user
to set which line endings to use for end-of-line-normalized files in the
working directory. It defaults to "native", which means CRLF on Windows
and LF everywhere else.
Note that "core.autocrlf" overrides core.eol. This means that

    [core]
            autocrlf = true

puts CRLFs in the working directory even if core.eol is set to "lf".
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this patch at least IBM VisualAge C 5.0 (I have 5.0.2) on AIX
5.1 fails to compile git.
enum style is inconsistent already, with some enums declared on one
line, some over 3 lines with the enum values all on the middle line,
sometimes with 1 enum value per line... and independently of that the
trailing comma is sometimes present and other times absent, often
mixing with/without trailing comma styles in a single file, and
sometimes in consecutive enum declarations.
Clearly, omitting the comma is the more portable style, and this patch
changes all enum declarations to use the portable omitted dangling
comma style consistently.
Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
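
For reference, the portable style looks like the sketch below. The enum and its values are made up for illustration; the point is only the missing comma after the last enumerator, which C89 compilers require and newer ones accept.

```c
#include <assert.h>

/* Accepted by C89 compilers as well as newer ones: no comma after the last value. */
enum demo_color {
	DEMO_RED,
	DEMO_GREEN,
	DEMO_BLUE
};

/*
 * C99 allows a trailing comma, but some older compilers reject it:
 *   enum demo_color { DEMO_RED, DEMO_GREEN, DEMO_BLUE, };
 */
static int demo_color_count(void)
{
	return DEMO_BLUE + 1;
}
```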
* sp/maint-dumb-http-pack-reidx:
http.c::new_http_pack_request: do away with the temp variable filename
http-fetch: Use temporary files for pack-*.idx until verified
http-fetch: Use index-pack rather than verify-pack to check packs
Allow parse_pack_index on temporary files
Extract verify_pack_index for reuse from verify_pack
Introduce close_pack_index to permit replacement
http.c: Remove unnecessary strdup of sha1_to_hex result
http.c: Don't store destination name in request structures
http.c: Drop useless != NULL test in finish_http_pack_request
http.c: Tiny refactoring of finish_http_pack_request
t5550-http-fetch: Use subshell for repository operations
http.c: Remove bad free of static block
* 'ld/discovery-limit-to-fs' (early part):
Rename ONE_FILESYSTEM to DISCOVERY_ACROSS_FILESYSTEM
GIT_ONE_FILESYSTEM: flip the default to stop at filesystem boundaries
Add support for GIT_ONE_FILESYSTEM
truncate cwd string before printing error message
config.c: remove static keyword from git_env_bool()
* ar/config-from-command-line:
Complete prototype of git_config_from_parameters()
Use strbufs instead of open-coded string manipulation
Allow passing of configuration parameters in the command line
Add the missing argument list. (Its lack triggered a compiler warning
for me.)
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the semantics of the "crlf" attribute so that it enables
end-of-line normalization when it is set, regardless of "core.autocrlf".
Add a new setting for "crlf": "auto", which enables end-of-line
conversion but does not override the automatic text file detection.
Add a new attribute "eol" with possible values "crlf" and "lf". When
set, this attribute enables normalization and forces git to use CRLF or
LF line endings in the working directory, respectively.
The line ending style to be used for normalized text files in the
working directory is set using "core.autocrlf". When it is set to
"true", CRLFs are used in the working directory; when set to "input" or
"false", LFs are used.
Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cw/ws-indent-with-tab:
whitespace: tests for git-apply --whitespace=fix with tab-in-indent
whitespace: add tab-in-indent support for --whitespace=fix
whitespace: replumb ws_fix_copy to take a strbuf *dst instead of char *dst
whitespace: tests for git-diff --check with tab-in-indent error class
whitespace: add tab-in-indent error class
whitespace: we cannot "catch all errors known to git" anymore
The easiest way to verify a pack index is to open it through the
standard parse_pack_index function, permitting the header check
to happen when the file is mapped. However, the dumb HTTP client
needs to verify a pack index before it is moved into its proper file
name within the objects/pack directory, to prevent a corrupt index
from being made available. So permit the caller to specify the
exact path of the index file.
For now we're still using the final destination name within the
sole call site in http.c, but eventually we will start to parse
the temporary path instead.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By closing the pack index, a caller can later overwrite the index
with an updated index file, possibly after converting from v1 to
the v2 format. Because p->index_data is NULL after close, on the
next access the index will be opened again and the other members
will be updated with new data.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To implement --whitespace=fix for tab-in-indent, we have to allow for the
possibility that whitespace can increase in size when it is fixed, expanding
tabs to multiple spaces in the initial indent.
Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
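
To see why a fixed-size destination no longer works, consider a sketch that expands tabs in the leading indent: the output can be longer than the input, so the destination has to be sized (or grown) by the fixer itself. `expand_indent_tabs()` is a hypothetical helper using plain malloc() rather than a strbuf.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy "src" into a freshly allocated string, expanding each tab in the
 * leading indent to the next multiple of "tabwidth" columns. The result
 * can be longer than the input, so its size is computed up front.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *expand_indent_tabs(const char *src, size_t tabwidth)
{
	size_t i, indent_end = 0, col = 0;
	char *dst;

	assert(tabwidth > 0);
	while (src[indent_end] == ' ' || src[indent_end] == '\t')
		indent_end++;
	for (i = 0; i < indent_end; i++)
		col += (src[i] == '\t') ? tabwidth - col % tabwidth : 1;

	dst = malloc(col + strlen(src + indent_end) + 1);
	if (!dst)
		return NULL;
	memset(dst, ' ', col);
	strcpy(dst + col, src + indent_end);
	return dst;
}
```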
Some projects and languages use coding style where no tab character is used to
indent the lines.
This only adds support and documentation for "apply --whitespace=warn" and
"diff --check"; later patches add "apply --whitespace=fix" and tests.
Signed-off-by: Chris Webb <chris@arachsys.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until now, this has been part of the commit-tree builtin.
However, it is already used by other builtins (like commit,
merge, and notes), and it would be useful to access it from
library code.
The check_valid helper has to come along, too, but is given
a more library-ish name of "assert_sha1_type".
Otherwise, the code is unchanged. There are still a few
rough edges for a library function, like printing the utf8
warning to stderr, but we can address those if and when they
come up as inappropriate.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These should take const buffers as input data, but zlib's
next_in pointer is not const-correct. Let's fix it at the
zlib level, though, so the cast happens in one obvious
place. This should be safe, as a similar cast is used in
zlib's example code for a const array.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
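
The "cast in one obvious place" approach can be sketched without zlib itself. `struct demo_stream` below is a hypothetical stand-in for a library stream whose input pointer is not const-qualified; it is not zlib's type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a library stream whose input pointer is not const. */
struct demo_stream {
	unsigned char *next_in;
	size_t avail_in;
};

/*
 * The single place where constness is cast away. This is only safe
 * because the consumer reads through next_in and never writes.
 */
static void demo_set_input(struct demo_stream *s, const void *buf, size_t len)
{
	s->next_in = (unsigned char *)buf;
	s->avail_in = len;
}

/* Consumer that only reads from the stream. */
static unsigned int demo_checksum(const struct demo_stream *s)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < s->avail_in; i++)
		sum += s->next_in[i];
	return sum;
}
```

Callers can then keep their own buffers const-correct and never write the cast themselves.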
* cc/cherry-pick-ff:
revert: fix tiny memory leak in cherry-pick --ff
rebase -i: use new --ff cherry-pick option
Documentation: describe new cherry-pick --ff option
cherry-pick: add tests for new --ff option
revert: add --ff option to allow fast forward when cherry-picking
builtin/merge: make checkout_fast_forward() non static
parse-options: add parse_options_concat() to concat options
The values passed this way will override whatever is defined
in the config files.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this function is the preferred way to handle boolean environment
variables it's useful to have it available to other files.
Signed-off-by: Lars R. Damerow <lars@pixar.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/notes-display:
git-notes(1): add a section about the meaning of history
notes: track whether notes_trees were changed at all
notes: add shorthand --ref to override GIT_NOTES_REF
commit --amend: copy notes to the new commit
rebase: support automatic notes copying
notes: implement helpers needed for note copying during rewrite
notes: implement 'git notes copy --stdin'
rebase -i: invoke post-rewrite hook
rebase: invoke post-rewrite hook
commit --amend: invoke post-rewrite hook
Documentation: document post-rewrite hook
Support showing notes from more than one notes tree
test-lib: unset GIT_NOTES_REF to stop it from influencing tests
Conflicts:
git-am.sh
refs.c
Implement helper functions to load the rewriting config, and to
actually copy the notes. Also document the config.
Secondly, also implement an undocumented --for-rewrite=<cmd> option to
'git notes copy' which is used like --stdin, but also puts the
configuration for <cmd> into effect. It will be needed to support the
copying in git-rebase.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With this patch, you can set notes.displayRef to a glob that points at
your favourite notes refs, e.g.,

    [notes]
            displayRef = refs/notes/*

Then git-log and friends will show notes from all trees.
Thanks to Junio C Hamano for lots of feedback, which greatly
influenced the design of the entire series and this commit in
particular.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Acked-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* gb/maint-submodule-env:
is_submodule_modified(): clear environment properly
submodules: ensure clean environment when operating in a submodule
shell setup: clear_local_git_env() function
rev-parse: --local-env-vars option
Refactor list of repo-local env vars
* mm/mkstemps-mode-for-packfiles:
Use git_mkstemp_mode instead of plain mkstemp to create object files
git_mkstemps_mode: don't set errno to EINVAL on exit.
Use git_mkstemp_mode and xmkstemp_mode in odb_mkstemp, not chmod later.
git_mkstemp_mode, xmkstemp_mode: variants of gitmkstemps with mode argument.
Move gitmkstemps to path.c
Add a testcase for ACL with restrictive umask.
* nd/root-git:
Add test for using Git at root of file system
Support working directory located at root
Move offset_1st_component() to path.c
init-db, rev-parse --git-dir: do not append redundant slash
make_absolute_path(): Do not append redundant slash
Conflicts:
setup.c
sha1_file.c
git tries to read a password from the terminal in imap-send and
when talking to an HTTP server that requires authentication.
When a GUI is driving git, however, the end user is not paying
attention to the terminal (there may not even be a terminal),
so the GUI would appear to hang forever.
Fix this problem by allowing a password-retrieving command
to be specified in GIT_ASKPASS.
Signed-off-by: Frank Li <lznuaa@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the list of GIT_* environment variables that are local to a
repository into a static list in environment.c, as it is also
useful elsewhere. Also add the missing GIT_CONFIG variable to the
list.
Make it easy to use the list both by NULL-termination and by size;
the latter (excluding the terminating NULL) is stored in the
local_repo_env_size define.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
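
A list that is usable both by NULL-termination and by size might look like the sketch below. The array name, its entries and the `DEMO_LOCAL_ENV_SIZE` macro are hypothetical; the real list lives in environment.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names and entries, for illustration only. */
static const char *const demo_local_env[] = {
	"DEMO_DIR",
	"DEMO_WORK_TREE",
	"DEMO_CONFIG",
	NULL
};

/* Number of real entries, excluding the terminating NULL. */
#define DEMO_LOCAL_ENV_SIZE \
	(sizeof(demo_local_env) / sizeof(demo_local_env[0]) - 1)

/* Walk the list using the NULL terminator. */
static size_t count_by_terminator(void)
{
	size_t n = 0;

	while (demo_local_env[n])
		n++;
	return n;
}

/* Walk the list by size, e.g. to look a name up. */
static int is_local_env(const char *name)
{
	size_t i;

	for (i = 0; i < DEMO_LOCAL_ENV_SIZE; i++)
		if (!strcmp(demo_local_env[i], name))
			return 1;
	return 0;
}
```

The terminator suits code that hands the array to an API expecting a NULL-ended vector, while the size suits code that needs to allocate room for the entries up front.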
gitmkstemps emulates the behavior of mkstemps, which is usually used
to create files in a shared directory like /tmp/, hence, it creates
files with permission 0600.
Add git_mkstemps_mode() that allows us to specify the desired mode, and
make git_mkstemps() a wrapper that always uses 0600 to call it. Later we
will use git_mkstemps_mode() when creating pack files.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some configuration variables can take boolean values in addition to
enumeration specific to them. Introduce git_config_maybe_bool() that
returns 0 or 1 if the given value is boolean, or -1 if not, so that
a parser for such a variable can check for boolean first and then
parse other kinds of values as a fallback.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
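
The "check for boolean first, then fall back" pattern can be sketched as follows. `demo_maybe_bool()` mirrors the return convention described above (1, 0 or -1), but the accepted spellings and the `demo_mode` enumeration are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive string equality using only standard C. */
static int demo_eq_nocase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Return 1 or 0 if "value" spells a boolean, or -1 if it does not. */
static int demo_maybe_bool(const char *value)
{
	if (!value)
		return -1;
	if (demo_eq_nocase(value, "true") || demo_eq_nocase(value, "yes") ||
	    demo_eq_nocase(value, "on"))
		return 1;
	if (demo_eq_nocase(value, "false") || demo_eq_nocase(value, "no") ||
	    demo_eq_nocase(value, "off"))
		return 0;
	return -1;
}

enum demo_mode { DEMO_OFF, DEMO_ON, DEMO_INPUT, DEMO_UNKNOWN };

/* Parser for a variable that is either a boolean or the word "input". */
static enum demo_mode parse_demo_mode(const char *value)
{
	int b = demo_maybe_bool(value);

	if (b >= 0)
		return b ? DEMO_ON : DEMO_OFF;
	if (value && demo_eq_nocase(value, "input"))
		return DEMO_INPUT;
	return DEMO_UNKNOWN;
}
```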
The implementation is also lightly modified to use is_dir_sep()
instead of hardcoding '/'.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Scripted commands that want to use git’s configured pager know better
than ‘git var’ does whether stdout is going to be a tty at the
appropriate time. Checking isatty(1) as git_pager() does now won’t
cut it, since the output of git var itself is almost never a terminal.
The symptom is that when used by humans, ‘git var GIT_PAGER’ behaves
as it should, but when used by scripts, it always returns ‘cat’!
So avoid tricks with isatty() and just always print the configured
pager.
This does not fix the callers to check isatty(1) themselves yet.
Nevertheless, this patch alone is enough to fix 'am --interactive'.
Thanks to Sebastian Celis for the report and Jeff King for the
analysis.
Reported-by: Sebastian Celis <sebastian@sebastiancelis.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-reflog-bad-timestamp:
t0101: use a fixed timestamp when searching in the reflog
Update @{bogus.timestamp} fix not to die()
approxidate_careful() reports erroneous date string
For a long time, the time based reflog syntax (e.g. master@{yesterday})
didn't complain when the "human readable" timestamp was misspelled, as
the underlying mechanism tried to be as lenient as possible. The funny
thing was that parsing of "@{now}" even relied on the fact that anything
not recognized by the machinery returned the current timestamp.
Introduce approxidate_careful(), which takes an optional pointer to an
integer that gets assigned 1 when the input does not make sense as a
timestamp.
As I am too lazy to fix all the callers that use approxidate(), most
callers do not take advantage of the error checking; only the code that
parses reflog timestamps is converted to use it, as a demonstration.
Tests are mostly from Jeff King.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
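
The "optional error pointer" calling convention can be sketched like this. `demo_date_careful()` is hypothetical and understands only decimal seconds, which is far less than the real date parser; it is meant to show the interface shape, not the parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Lenient parser with an optional error report. Unparsable input
 * yields the fallback "now" value, and *error_ret (when non-NULL)
 * is set to 1 so that careful callers can tell the difference.
 */
static unsigned long demo_date_careful(const char *s, unsigned long now,
				       int *error_ret)
{
	char *end;
	unsigned long t;
	int dummy = 0;

	if (!error_ret)
		error_ret = &dummy; /* caller does not care about errors */
	*error_ret = 0;

	errno = 0;
	t = strtoul(s, &end, 10);
	if (errno || end == s || *end) {
		*error_ret = 1;
		return now;
	}
	return t;
}
```

Existing callers can keep passing NULL and see the old lenient behavior, while a caller such as a reflog parser can pass a pointer and reject the input when it is set.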
* jc/fix-tree-walk:
read-tree --debug-unpack
unpack-trees.c: look ahead in the index
unpack-trees.c: prepare for looking ahead in the index
Aggressive three-way merge: fix D/F case
traverse_trees(): handle D/F conflict case sanely
more D/F conflict tests
tests: move convenience regexp to match object names to test-lib.sh
Conflicts:
builtin-read-tree.c
unpack-trees.c
unpack-trees.h
* ap/merge-backend-opts:
Document that merge strategies can now take their own options
Extend merge-subtree tests to test -Xsubtree=dir.
Make "subtree" part more orthogonal to the rest of merge-recursive.
pull: Fix parsing of -X<option>
Teach git-pull to pass -X<option> to git-merge
git merge -X<option>
git-merge-file --ours, --theirs
Conflicts:
git-compat-util.h
* jc/cache-unmerge:
rerere forget path: forget recorded resolution
rerere: refactor rerere logic to make it independent from I/O
rerere: remove silly 1024-byte line limit
resolve-undo: teach "update-index --unresolve" to use resolve-undo info
resolve-undo: "checkout -m path" uses resolve-undo information
resolve-undo: allow plumbing to clear the information
resolve-undo: basic tests
resolve-undo: record resolved conflicts in a new index extension section
builtin-merge.c: use standard active_cache macros
Conflicts:
builtin-ls-files.c
builtin-merge.c
builtin-rerere.c
* jc/ident:
ident.c: replace fprintf with fputs to suppress compiler warning
user_ident_sufficiently_given(): refactor the logic to be usable from elsewhere
ident.c: treat $EMAIL as giving user.email identity explicitly
ident.c: check explicit identity for name and email separately
ident.c: remove unused variables
* jc/symbol-static:
date.c: mark file-local function static
Replace parse_blob() with an explanatory comment
symlinks.c: remove unused functions
object.c: remove unused functions
strbuf.c: remove unused function
sha1_file.c: remove unused function
mailmap.c: remove unused function
utf8.c: mark file-local function static
submodule.c: mark file-local function static
quote.c: mark file-local function static
remote-curl.c: mark file-local function static
read-cache.c: mark file-local functions static
parse-options.c: mark file-local function static
entry.c: mark file-local function static
http.c: mark file-local functions static
pretty.c: mark file-local function static
builtin-rev-list.c: mark file-local function static
bisect.c: mark file-local function static
Add a --set-upstream option to branch that works like --track, except that
when the branch already exists, its upstream info is changed without changing
the ref value.
Based-on-patch-from: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Ilari Liusvaara <ilari.liusvaara@elisanet.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes "subtree" more orthogonal to the rest of recursive merge, so
that you can use subtree and ours/theirs features at the same time. For
example, you can now say:

    git merge -s subtree -Xtheirs other

to merge with "other" branch while shifting it up or down to match the
shape of the tree of the current branch, and resolving conflicts favoring
the changes "other" branch made over changes made in the current branch.
It also allows the prefix used to shift the trees to be specified using
the "-Xsubtree=$prefix" option. Giving an empty prefix tells the command
to figure out how much to shift trees automatically as we have always
done. "merge -s subtree" is the same as "merge -s recursive -Xsubtree="
(or "merge -s recursive -Xsubtree").
Based on an old patch done back in the days when git-merge was a script;
Avery ported the script part to builtin-merge.c. Bugs in shift_tree()
are mine.
Signed-off-by: Avery Pennarun <apenwarr@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/checkout-merge-base:
rebase -i: teach --onto A...B syntax
rebase: fix --onto A...B parsing and add tests
"rebase --onto A...B" replays history on the merge base between A and B
"checkout A...B" switches to the merge base between A and B
* cc/reset-more:
t7111: check that reset options work as described in the tables
Documentation: reset: add some missing tables
Fix bit assignment for CE_CONFLICTED
"reset --merge": fix unmerged case
reset: use "unpack_trees()" directly instead of "git read-tree"
reset: add a few tests for "git reset --merge"
Documentation: reset: add some tables to describe the different options
reset: improve mixed reset error message when in a bare repo
* nd/sparse: (25 commits)
t7002: test for not using external grep on skip-worktree paths
t7002: set test prerequisite "external-grep" if supported
grep: do not do external grep on skip-worktree entries
commit: correctly respect skip-worktree bit
ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
tests: rename duplicate t1009
sparse checkout: inhibit empty worktree
Add tests for sparse checkout
read-tree: add --no-sparse-checkout to disable sparse checkout support
unpack-trees(): ignore worktree check outside checkout area
unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
unpack-trees.c: generalize verify_* functions
unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
Introduce "sparse checkout"
dir.c: export excluded_1() and add_excludes_from_file_1()
excluded_1(): support exclude files in index
unpack-trees(): carry skip-worktree bit over in merged_entry()
Read .gitignore from index if it is skip-worktree
Avoid writing to buffer in add_excludes_from_file_1()
...
Conflicts:
.gitignore
Documentation/config.txt
Documentation/git-update-index.txt
Makefile
entry.c
t/t7002-grep.sh
bb1ae3f (commit: Show committer if automatic, 2008-05-04) added logic to
check that both name and email were given explicitly by the end user, but it
assumed that fmt_ident() is never called before git_default_user_config()
is called, which was fragile. The former calls setup_ident() and fills in
the "default" name and email, so the check in the config parser would have
mistakenly said both are given even if only user.name was provided.
Make the logic more robust by keeping track of name and email separately.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Santi Béjar <santi@agolina.net>
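
Tracking the two pieces of identity separately can be done with one bit each, as in the sketch below. The macro and function names are hypothetical; the point is that "sufficiently given" requires both bits, not merely a non-zero value.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit names: one bit per piece of identity given explicitly. */
#define DEMO_NAME_GIVEN 01
#define DEMO_MAIL_GIVEN 02
#define DEMO_ALL_GIVEN (DEMO_NAME_GIVEN | DEMO_MAIL_GIVEN)

static int demo_explicitly_given;

/* Config callback sketch: record which key was set, not just that one was. */
static void demo_user_config(const char *key)
{
	if (!strcmp(key, "user.name"))
		demo_explicitly_given |= DEMO_NAME_GIVEN;
	else if (!strcmp(key, "user.email"))
		demo_explicitly_given |= DEMO_MAIL_GIVEN;
}

/* Both name and email must have been given explicitly. */
static int demo_ident_sufficiently_given(void)
{
	return demo_explicitly_given == DEMO_ALL_GIVEN;
}
```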
This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.
A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees look like this:

    O        A        B
    t                 t
    t-i      t-i      t-i
    t-j      t-j
             t/1
             t/2

The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead. After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".
The side that walks the index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed. When this happens, the pointer o->pos still points at
"t-i", which is the first entry. We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and matches entries found there, and later come back
to process "t-i".
While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that have already been
processed. The o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
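
The interplay between a "processed" flag and a bottom pointer can be sketched as below. `DEMO_UNPACKED`, `struct demo_entry` and `mark_unpacked()` are hypothetical stand-ins for CE_UNPACKED, the cache entries and the bookkeeping around o->cache_bottom; the real code is more involved.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_UNPACKED 1u /* hypothetical flag bit: entry already processed */

struct demo_entry {
	const char *name;
	unsigned int flags;
};

/*
 * Mark one entry as processed, then advance "bottom" past any leading
 * run of processed entries. Entries processed out of order stay
 * marked, so "bottom" can skip them once it catches up.
 */
static size_t mark_unpacked(struct demo_entry *index, size_t nr,
			    size_t bottom, size_t pos)
{
	assert(pos < nr);
	index[pos].flags |= DEMO_UNPACKED;
	while (bottom < nr && (index[bottom].flags & DEMO_UNPACKED))
		bottom++;
	return bottom;
}
```

In the example from the message, processing "t/1" and "t/2" first leaves the bottom at "t-i"; once "t-i" and "t-j" are processed, the bottom jumps past all four entries.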
Commit 9e8ecea (Add 'merge' mode to 'git reset', 2008-12-01) disallowed
"git reset --merge" when there were unmerged entries. But it wished that
unmerged entries could be reset as if --hard (instead of --merge) had been
used. This makes sense because all "mergy" operations make sure that
any path involved in the merge does not have local modifications before
starting, so resetting such a path away won't lose any information.
The previous commit changed the behavior of --merge to accept resetting
unmerged entries if they are reset to a different state than HEAD, but it
did not reset the changes in the work tree, leaving the conflict markers
in the resulting file in the work tree.
Fix it by doing three things:
- Update the documentation to match the wish of original "reset --merge"
better, namely, "An unmerged entry is a sign that the path didn't have
any local modification and can be safely reset to whatever the new
HEAD records";
- Update read_index_unmerged(), which reads the index file into the cache
while dropping any higher-stage entries down to stage #0, not to copy
the object name from the higher stage entry. The code used to take the
object name from a stage entry ("base" if you happened to have
stage #1, or "ours" if both sides added, etc.), which essentially meant
that you are getting random results depending on what the merge did.
The _only_ reason we want to keep a previously unmerged entry in the
index at stage #0 is so that we don't forget the fact that we have a
corresponding file in the work tree in order to be able to remove it
when the tree we are resetting to does not have the path. In order to
differentiate such an entry from an ordinary cache entry, the cache entry
added by read_index_unmerged() is marked as CE_CONFLICTED.
- Update merged_entry() and deleted_entry() so that they pay attention to
cache entries marked as CE_CONFLICTED. They are previously unmerged
entries, and the files in the work tree that correspond to them are
reset away by oneway_merge() to the version from the tree we are
resetting to.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The update-index plumbing command had a hacky --unresolve implementation
that was written back in the days when merge was the only way for users to
end up with higher stages in the index, and assumed that stage #2 must
have come from HEAD, stage #3 from MERGE_HEAD and didn't bother to compute
the stage #1 information.
There were several issues with this approach:
- These days, merge is not the only command, and conflicts coming from
commands like cherry-pick, "am -3", etc. cannot be recreated by looking
at MERGE_HEAD;
- For a conflict that came from a merge that had renames, picking up the
same path from MERGE_HEAD and HEAD wouldn't help recreating it, either;
- It may have been OK not to recreate stage #1 back when it was written,
because "diff --ours/--theirs" were the only available ways to review
conflicts and they don't need stage #1 information. "diff --cc", which
was invented much later, is a much more useful way, but it needs stage #1.
We can use resolve-undo information recorded in the index extension to
solve all of these issues.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Once you resolved conflicts by "git add path", you cannot recreate the
conflicted state with "git checkout -m path", because you lost information
from higher stages in the index when you resolved them.
Since we record the necessary information in the resolve-undo index
extension these days, we can reproduce the unmerged state in the index and
check it out.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When resolving a conflict using "git add" to create a stage #0 entry, or
"git rm" to remove entries at higher stages, remove_index_entry_at()
function is eventually called to remove unmerged (i.e. higher stage)
entries from the index. Introduce a "resolve_undo_info" structure and
keep track of the removed cache entries, and save it in a new index
extension section in the index_state.
Operations like "read-tree -m", "merge", "checkout [-m] <branch>" and
"reset" are signs that recorded information in the index is no longer
necessary. The data is removed from the index extension when operations
start; they may leave conflicted entries in the index, and later user
actions like "git add" will record their conflicted states afresh.
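A rough sketch of what such a record might hold is given below. It is an illustration under stated assumptions, not the structure the patch adds: sketch_resolve_undo and both helper functions are hypothetical names, object names are kept as 40-character hex strings for readability, and a mode of 0 is assumed to mean "no entry was removed at that stage".

```c
#include <assert.h>
#include <string.h>

#define SKETCH_HEXSZ 40

/* One slot per higher stage: index 0 is stage #1, index 2 is stage #3. */
struct sketch_resolve_undo {
    unsigned mode[3];
    char oid_hex[3][SKETCH_HEXSZ + 1];
};

/* Remember the mode and object name of an entry removed at "stage". */
static int sketch_record_removed(struct sketch_resolve_undo *ru, int stage,
                                 unsigned mode, const char *oid_hex)
{
    assert(ru != NULL && oid_hex != NULL);
    if (stage < 1 || stage > 3 || strlen(oid_hex) != SKETCH_HEXSZ)
        return -1;
    ru->mode[stage - 1] = mode;
    memcpy(ru->oid_hex[stage - 1], oid_hex, SKETCH_HEXSZ + 1);
    return 0;
}

/* Count how many stages could be recreated from the record. */
static int sketch_stages_recorded(const struct sketch_resolve_undo *ru)
{
    int i, n = 0;

    for (i = 0; i < 3; i++)
        if (ru->mode[i])
            n++;
    return n;
}
```

A record with only stages #2 and #3 filled in would correspond to a path that both sides added, with no common ancestor version to restore.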
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Previously the CE_MATCH_IGNORE_VALID flag was used by both the valid and
skip-worktree bits. While the two bits have similar behaviour, sharing
this flag means "git update-index --really-refresh" will ignore
skip-worktree while it should not. Instead, another flag is
introduced to ignore the skip-worktree bit; CE_MATCH_IGNORE_VALID now only
applies to the valid bit.
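The effect of splitting the flag can be sketched like this. All names and bit values below are hypothetical; in particular, SKETCH_MATCH_IGNORE_SKIP_WORKTREE is only an assumed name for the new flag, which the message above does not spell out.

```c
#include <assert.h>

/* Hypothetical per-entry bits. */
#define SKETCH_ENTRY_VALID          0x1u
#define SKETCH_ENTRY_SKIP_WORKTREE  0x2u

/* Hypothetical match options; the first stands for CE_MATCH_IGNORE_VALID. */
#define SKETCH_MATCH_IGNORE_VALID          0x1u
#define SKETCH_MATCH_IGNORE_SKIP_WORKTREE  0x2u /* assumed name */

/*
 * Return 1 when the entry may be assumed unchanged without looking at
 * the work tree, 0 when the work tree has to be checked.
 */
static int sketch_assume_unchanged(unsigned entry_flags, unsigned options)
{
    assert((entry_flags &
            ~(SKETCH_ENTRY_VALID | SKETCH_ENTRY_SKIP_WORKTREE)) == 0);
    if ((entry_flags & SKETCH_ENTRY_SKIP_WORKTREE) &&
        !(options & SKETCH_MATCH_IGNORE_SKIP_WORKTREE))
        return 1;
    if ((entry_flags & SKETCH_ENTRY_VALID) &&
        !(options & SKETCH_MATCH_IGNORE_VALID))
        return 1;
    return 0;
}
```

A caller that passes only the "ignore valid" option still honours the skip-worktree bit, which is the behaviour the message asks for in --really-refresh.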
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous error message was the same in many situations (unknown
revision or path not in the working tree). We try to help the user as
much as possible to understand the error, especially with the
sha1:filename notation. In this case, we say whether the sha1 or the
filename is problematic, and diagnose the confusion between
relative-to-root and relative-to-$PWD paths precisely.
The 7 new error messages are tested.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mm/config-pathname-tilde-expand:
Documentation: avoid xmlto input error
expand_user_path: expand ~ to $HOME, not to the actual homedir.
Expand ~ and ~user in core.excludesfile, commit.template
* cc/replace:
Documentation: talk a little bit about GIT_NO_REPLACE_OBJECTS
Documentation: fix typos and spelling in replace documentation
replace: use a GIT_NO_REPLACE_OBJECTS env variable
* mm/config-pathname-tilde-expand:
Documentation: avoid xmlto input error
expand_user_path: expand ~ to $HOME, not to the actual homedir.
Expand ~ and ~user in core.excludesfile, commit.template
* 'jh/notes' (early part):
Add selftests verifying concatenation of multiple notes for the same commit
Refactor notes code to concatenate multiple notes annotating the same object
Add selftests verifying that we can parse notes trees with various fanouts
Teach the notes lookup code to parse notes trees with various fanout schemes
Teach notes code to free its internal data structures on request
Add '%N'-format for pretty-printing commit notes
Add flags to get_commit_notes() to control the format of the note string
t3302-notes-index-expensive: Speed up create_repo()
fast-import: Add support for importing commit notes
Teach "-m <msg>" and "-F <file>" to "git notes edit"
Add an expensive test for git-notes
Speed up git notes lookup
Add a script to edit/inspect notes
Introduce commit notes
Conflicts:
.gitignore
Documentation/pretty-formats.txt
pretty.c
* sp/smart-http: (37 commits)
http-backend: Let gcc check the format of more printf-type functions.
http-backend: Fix access beyond end of string.
http-backend: Fix bad treatment of uintmax_t in Content-Length
t5551-http-fetch: Work around broken Accept header in libcurl
t5551-http-fetch: Work around some libcurl versions
http-backend: Protect GIT_PROJECT_ROOT from /../ requests
Git-aware CGI to provide dumb HTTP transport
http-backend: Test configuration options
http-backend: Use http.getanyfile to disable dumb HTTP serving
test smart http fetch and push
http tests: use /dumb/ URL prefix
set httpd port before sourcing lib-httpd
t5540-http-push: remove redundant fetches
Smart HTTP fetch: gzip requests
Smart fetch over HTTP: client side
Smart push over HTTP: client side
Discover refs via smart HTTP server when available
http-backend: more explict LocationMatch
http-backend: add example for gitweb on same URL
http-backend: use mod_alias instead of mod_rewrite
...
Conflicts:
.gitignore
remote-curl.c
* jn/editor-pager:
Provide a build time default-pager setting
Provide a build time default-editor setting
am -i, git-svn: use "git var GIT_PAGER"
add -i, send-email, svn, p4, etc: use "git var GIT_EDITOR"
Teach git var about GIT_PAGER
Teach git var about GIT_EDITOR
Suppress warnings from "git var -l"
Do not use VISUAL editor on dumb terminals
Handle more shell metacharacters in editor names
This has the same effect as --no-replace-objects option; git ignores the
replace refs. When --no-replace-objects option is passed to git, this
environment variable is set to "1" and exported to subprocesses in order
to propagate the same setting.
It is useful for example for scripts, as the git commands used in them can
now be aware that they must not read replace refs.
Tested-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These config variables are parsed to substitute ~ and ~user with getpw
entries.
user_path() refactored into new function expand_user_path(), to allow
dynamically allocating the return buffer.
Original patch by Karl Chen, modified by Matthieu Moy, and further
amended by Junio C Hamano.
Signed-off-by: Karl Chen <quarl@quarl.org>
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-blank-at-eof:
diff -B: colour whitespace errors
diff.c: emit_add_line() takes only the rest of the line
diff.c: split emit_line() from the first char and the rest of the line
diff.c: shuffling code around
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
We already have these checks in many printf-type functions that have
prototypes which are in header files. Add these same checks to some
more prototypes in header functions and to static functions in .c
files.
cc: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Tarmigan Casebolt <tarmigan+git@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the command found by setup_pager() for scripts to use.
Scripts can use this to avoid repeating the logic to look for a
proper pager in each command.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Expose the command used by launch_editor() for scripts to use.
This should allow one to avoid searching for a proper editor
separately in each command.
git_editor(void) uses the logic to decide which editor to use
that used to live in launch_editor(). The function returns NULL
if there is no suitable editor; the caller is expected to issue
an error message when appropriate.
launch_editor() uses git_editor() and gives the error message the
same way as before when EDITOR is not set.
"git var GIT_EDITOR" gives the editor name, or an error message
when there is no appropriate one.
"git var -l" gives GIT_EDITOR=name only if there is an
appropriate editor.
Originally-submitted-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Eons ago HPA taught git-daemon how to protect itself from /../
attacks, which Junio brought back into service in d79374c7b5
("daemon.c and path.enter_repo(): revamp path validation").
I did not carry this into git-http-backend as originally we relied
only upon PATH_TRANSLATED, and assumed the HTTP server had done
its access control checks to validate the resolved path was within
a directory permitting access from the remote client. This would
usually be sufficient to protect a server from requests for its
/etc/passwd file by http://host/smart/../etc/passwd sorts of URLs.
However in 917adc0360 Mark Lodato added GIT_PROJECT_ROOT as an
additional method of configuring the CGI. When this environment
variable is used the web server does not generate the final access
path and therefore may blindly pass through "/../etc/passwd"
in PATH_INFO under the assumption that "/../" might have special
meaning to the invoked CGI.
Instead of permitting these sorts of malformed path requests, we
now reject them back at the client, with an error message for the
server log. This matches git-daemon behavior.
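The kind of check involved can be sketched as below. This is a simplified illustration, not the validation code used by git-daemon or git-http-backend; the function name sketch_has_dotdot is hypothetical, and it only looks for a path component that is exactly "..".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if any '/'-separated component of "path" is exactly "..",
 * 0 otherwise. A caller would reject the request when this returns 1.
 */
static int sketch_has_dotdot(const char *path)
{
    assert(path != NULL);
    while (*path) {
        size_t len = strcspn(path, "/");

        if (len == 2 && path[0] == '.' && path[1] == '.')
            return 1;
        path += len;
        if (*path == '/')
            path++;
    }
    return 0;
}
```

Checking whole components rather than searching for the substring ".." keeps names such as "a..b" acceptable while still catching a trailing "/..".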
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 41cb7488 Linus moved this function to connect.c for reuse inside
of the git-clone-pack command. That was 2005, but in 2006 Junio
retired git-clone-pack in commit efc7fa53. Since then the only
caller has been fetch-pack. Since this ACK/NAK exchange is only
used by the fetch-pack/upload-pack protocol we should move it back
to be a private detail of fetch-pack.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
This patch has been improved by the following contributions:
- Thomas Rast: fix core.notesRef documentation
- Tor Arne Vestbø: fix printing of multi-line notes
- Alex Riesen: Using char array instead of char pointer costs less BSS
- Johan Herland: Plug leak when msg is good, but msglen or type causes return
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Tor Arne Vestbø <tavestbo@trolltech.com>
Signed-off-by: Johan Herland <johan@herland.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
get_commit_notes(): Plug memory leak when 'if' triggers, but not because of read_sha1_file() failure
When flipping commits around on topic branches, I often end up doing
this sequence:
* Run "log --oneline next..jc/frotz" to find out the first commit
on 'jc/frotz' branch not yet merged to 'next';
* Run "checkout $that_commit^" to detach HEAD to the parent of it;
* Rebuild the series on top of that commit; and
* "show-branch jc/frotz HEAD" and "diff jc/frotz HEAD" to verify.
Introduce a new syntax to "git checkout" to name the commit to switch to,
to make the first two steps easier. When the branch to switch to is
specified as A...B (you can omit either A or B but not both, and HEAD
is used instead of the omitted side), the merge base between these two
commits is computed, and if there is exactly one, we detach the HEAD
at that commit.
With this, I can say "checkout next...jc/frotz".
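The argument handling described above might look roughly like this. It is a sketch under assumptions, not the code added to "git checkout": sketch_split_three_dots is a hypothetical helper that only splits the argument and substitutes HEAD for an omitted side; computing the merge base is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "A...B" into its two sides, using "HEAD" for an omitted side.
 * Returns 0 on success, -1 if there is no "...", if both sides are
 * omitted, or if a side does not fit into a buffer of "size" bytes.
 */
static int sketch_split_three_dots(const char *spec, char *left,
                                   char *right, size_t size)
{
    const char *dots, *rest;
    size_t left_len;

    assert(spec != NULL && left != NULL && right != NULL);
    if (size < sizeof("HEAD"))
        return -1;
    dots = strstr(spec, "...");
    if (!dots)
        return -1;
    rest = dots + 3;
    left_len = (size_t)(dots - spec);
    if (left_len == 0 && *rest == '\0')
        return -1; /* both sides omitted */
    if (left_len >= size || strlen(rest) >= size)
        return -1;
    if (left_len) {
        memcpy(left, spec, left_len);
        left[left_len] = '\0';
    } else {
        strcpy(left, "HEAD");
    }
    strcpy(right, *rest ? rest : "HEAD");
    return 0;
}
```

For "next...jc/frotz" this yields "next" and "jc/frotz"; for "...jc/frotz" the left side becomes "HEAD".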
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make it possible to invoke the logic of verify_filename() to make sure the
pathname arguments are unambiguous without actually dying. The caller may
want to do something different.
* jc/maint-blank-at-eof:
diff -B: colour whitespace errors
diff.c: emit_add_line() takes only the rest of the line
diff.c: split emit_line() from the first char and the rest of the line
diff.c: shuffling code around
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
Previously the old error message just told the user that it was not
possible to delete the ref from the packed-refs file. Give instructions
on how to resolve the problem.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* 'jc/maint-1.6.0-blank-at-eof' (early part):
diff --whitespace: fix blank lines at end
core.whitespace: split trailing-space into blank-at-{eol,eof}
diff --color: color blank-at-eof
diff --whitespace=warn/error: fix blank-at-eof check
diff --whitespace=warn/error: obey blank-at-eof
diff.c: the builtin_diff() deals with only two-file comparison
apply --whitespace: warn blank but not necessarily empty lines at EOF
apply --whitespace=warn/error: diagnose blank at EOF
apply.c: split check_whitespace() into two
apply --whitespace=fix: detect new blank lines at eof correctly
apply --whitespace=fix: fix handling of blank lines at the eof
A local clone without hardlinks copies all objects, including dangling
ones, to the new repository. Since the mtimes are renewed, those
dangling objects cannot be pruned by "git gc --prune", even if they
would have been old enough for pruning in the original repository.
Instead, preserve mtime during copy. "git gc --prune" will then work
in the clone just like it did in the original.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In 2d14d65 (Use a clearer style to issue commands to remote helpers,
2009-09-03) I happened to notice two changes like this:
- write_in_full(helper->in, "list\n", 5);
+
+ strbuf_addstr(&buf, "list\n");
+ write_in_full(helper->in, buf.buf, buf.len);
+ strbuf_reset(&buf);
IMHO, it would be better to define a new function,
    static inline ssize_t write_str_in_full(int fd, const char *str)
    {
        return write_in_full(fd, str, strlen(str));
    }
and then use it like this:
- strbuf_addstr(&buf, "list\n");
- write_in_full(helper->in, buf.buf, buf.len);
- strbuf_reset(&buf);
+ write_str_in_full(helper->in, "list\n");
Thus not requiring the added allocation, and still avoiding
the maintenance risk of literal string lengths.
These days, compilers are good enough that strlen("literal")
imposes no run-time cost.
Transformed via this:
perl -pi -e \
's/write_in_full\((.*?), (".*?"), \d+\)/write_str_in_full($1, $2)/'\
$(git grep -l 'write_in_full.*"')
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This message is designed to help new users understand what
has happened when refs fail to push. However, it does not
help experienced users at all, and significantly clutters
the output, frequently dwarfing the regular status table and
making it harder to see.
This patch introduces a general configuration mechanism for
optional messages, with this push message as the first
example.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
People who configured trailing-space depended on it to catch both extra
white space at the end of line, and extra blank lines at the end of file.
An earlier attempt to introduce only blank-at-eof gave them an escape hatch
to keep the old behaviour, but it is a regression until they explicitly
specify the new error class.
This introduces a blank-at-eol that only catches extra white space at the
end of line, and makes the traditional trailing-space a convenient synonym
to catch both blank-at-eol and blank-at-eof. This way, people who used
trailing-space continue to catch both classes of errors.
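The relationship between the three names can be sketched with bit flags. The macro names and values here are hypothetical, the parser starts from an empty rule set (the real default is not modelled), and the leading '-' for turning a class off is assumed to follow the usual core.whitespace syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values for the two error classes. */
#define SKETCH_BLANK_AT_EOL   0x1u
#define SKETCH_BLANK_AT_EOF   0x2u
/* trailing-space is a synonym for both classes together. */
#define SKETCH_TRAILING_SPACE (SKETCH_BLANK_AT_EOL | SKETCH_BLANK_AT_EOF)

/* Map one rule name of "len" bytes to its bits; unknown names give 0. */
static unsigned sketch_rule_bits(const char *name, size_t len)
{
    static const struct {
        const char *name;
        unsigned bits;
    } rules[] = {
        { "blank-at-eol", SKETCH_BLANK_AT_EOL },
        { "blank-at-eof", SKETCH_BLANK_AT_EOF },
        { "trailing-space", SKETCH_TRAILING_SPACE },
    };
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (strlen(rules[i].name) == len &&
            !strncmp(rules[i].name, name, len))
            return rules[i].bits;
    return 0;
}

/* Parse a comma-separated list; a leading '-' turns a class off. */
static unsigned sketch_parse_whitespace(const char *spec)
{
    unsigned rule = 0;

    assert(spec != NULL);
    while (*spec) {
        size_t len = strcspn(spec, ",");
        size_t negated = (len > 0 && spec[0] == '-') ? 1 : 0;
        unsigned bits = sketch_rule_bits(spec + negated, len - negated);

        if (negated)
            rule &= ~bits;
        else
            rule |= bits;
        spec += len;
        if (*spec == ',')
            spec++;
    }
    return rule;
}
```

Under these assumptions "trailing-space,-blank-at-eof" leaves only the end-of-line class enabled, while plain "trailing-space" keeps catching both.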
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" strips new blank lines at EOF under --whitespace=fix option,
but neither --whitespace=warn nor --whitespace=error paid any attention to
these errors.
Introduce a new whitespace error class, blank-at-eof, to make the
whitespace error handling more consistent.
The patch adds a new "linenr" field to the struct fragment in order to
record which line the hunk started in the input file, but this is needed
solely for reporting purposes. The detection of this class of whitespace
errors cannot be done while parsing a patch like we do for all the other
classes of whitespace errors. It instead has to wait until we find where
to apply the hunk, but at that point, we do not have access to the
original line number in the input file anymore, hence the new field.
Depending on your point of view, this may be a bugfix that makes warn and
error in line with fix. Or you could call it a new feature. The line
between them is somewhat fuzzy in this case.
Strictly speaking, triggering more errors than before is a change in
behaviour that is not backward compatible, even though the reason for the
change is because the code was not checking for an error that it should
have. People who do not want added blank lines at EOF to trigger an error
can disable the new error class.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/approxidate:
fix approxidate parsing of relative months and years
tests: add date printing and parsing tests
refactor test-date interface
Add date formatting and parsing functions relative to a given time
Further 'approxidate' improvements
Improve on 'approxidate'
Conflicts:
date.c
The main purpose is to allow predictable testing of the code.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch introduces core.sparseCheckout, which will control whether
sparse checkout support is enabled in unpack_trees().
It also loads the sparse-checkout file that will be used in the next patch.
I split it out so the next patch will be shorter and easier to read.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
CE_REMOVE now removes both worktree and index versions. Sparse
checkout must be able to remove the worktree version while keeping the
index intact when the checkout area is narrowed.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Detail about this bit is in Documentation/git-update-index.txt.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git reset without argument displays a summary of the local modifications,
like this:
    $ git reset
    Makefile: locally modified
Some people have problems with this; it looks like an error message.
This patch makes its output mimic how "git checkout $another_branch"
reports the paths with local modifications. "git add --refresh --verbose"
is changed in the same way.
It also adds a header to make it clear that the output is informative,
and not an error.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
The change in the output is going to become more general than just saying
"changed", so let's make the variable name more general too.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* cc/replace:
t6050: check pushing something based on a replaced commit
Documentation: add documentation for "git replace"
Add git-replace to .gitignore
builtin-replace: use "usage_msg_opt" to give better error messages
parse-options: add new function "usage_msg_opt"
builtin-replace: teach "git replace" to actually replace
Add new "git replace" command
environment: add global variable to disable replacement
mktag: call "check_sha1_signature" with the replacement sha1
replace_object: add a test case
object: call "check_sha1_signature" with the replacement sha1
sha1_file: add a "read_sha1_file_repl" function
replace_object: add mechanism to replace objects found in "refs/replace/"
refs: add a "for_each_replace_ref" function
Merlyn noticed that Documentation/install-doc-quick.sh no longer correctly
removes old installed documents when the target directory has a leading
path that is a symlink. It turns out that "checkout-index --prefix" was
broken by recent b6986d8 (git-checkout: be careful about untracked
symlinks, 2009-07-29).
I suspect has_symlink_leading_path() could learn the third parameter
(prefix that is allowed to be symlinked directories) to allow us to retire
a similar function has_dirs_only_path().
Another avenue of fixing this I considered was to get rid of base_dir and
base_dir_len from "struct checkout", and instead make "git checkout-index"
when run with --prefix mkdir the leading path and chdir in there. It
might be the best longer term solution to this issue, as the base_dir
feature is used only by that rather obscure codepath as far as I know.
But at least this patch should fix this breakage.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce --ignore-whitespace option and corresponding config bool to
ignore whitespace differences while applying patches, akin to the
'patch' program.
'git am', 'git rebase' and the bash git completion are made aware of
this option.
Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the case where an untracked symlink that points at a directory
with tracked paths confuses the checkout logic, demonstrated in t6035.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/maint-graft-unhide-true-parents:
git repack: keep commits hidden by a graft
Add a test showing that 'git repack' throws away grafted-away parents
Conflicts:
git-repack.sh
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.
So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.
As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If the message said "we will be changing the default in the future, so
this is to warn people who want to keep the current default what to do",
it would have made some sense, but as it stands, the message is merely an
unsolicited advertisement for a new feature, which is not helpful at
all. Squelch it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The threaded index preloading will want it, so that it can avoid
locking by simply using a per-thread symlink/directory cache.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Naturally, prep_temp_blob() did not care about filenames.
As a result, GIT_EXTERNAL_DIFF and textconv generated
filenames such as ".diff_XXXXXX".
This modifies prep_temp_blob() to generate user-friendly
filenames when creating temporary files.
Diffing "name.ext" now generates "XXXXXX_name.ext".
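How such a name could be derived is sketched below. This is not prep_temp_blob() itself: sketch_temp_template is a hypothetical helper, stripping leading directories is an assumption, and the "XXXXXX" placeholder would still have to be filled in by an mkstemp-style routine that accepts a suffix (not shown).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a template of the form "XXXXXX_<basename of path>" in "out".
 * Returns 0 on success, -1 if the result does not fit in "size" bytes.
 */
static int sketch_temp_template(const char *path, char *out, size_t size)
{
    const char *base;
    int n;

    assert(path != NULL && out != NULL && size > 0);
    base = strrchr(path, '/');
    base = base ? base + 1 : path;
    n = snprintf(out, size, "XXXXXX_%s", base);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Keeping the original extension at the end of the name is what lets an external diff or textconv tool recognise the file type.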
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new function will replace "read_sha1_file". The latter function
becomes just a stub that calls the former with a NULL "replacement"
argument.
This new function is needed because sometimes we need to use the
replacement sha1.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Essentially, s/type* /type */ as per the coding guidelines.
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"Unreliable hardlinks" is a misleading description for what is happening.
So rename it to something less misleading.
Suggested by Linus Torvalds.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It seems that accessing NTFS partitions with ufsd (at least on my EeePC)
has an unnerving bug: if you link() a file and unlink() it right away,
the target of the link() will have the correct size, but consist of NULs.
It seems as if the calls are simply not serialized correctly, as single-stepping
through the function move_temp_to_file() works flawlessly.
As ufsd is "Commertial software" (sic!), I cannot fix it, and have to work
around it in Git.
At the same time, it seems that assuming Windows cannot handle
link() && unlink() fixes msysGit issues 222 and 229.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/name-branch:
Don't permit ref/branch names to end with ".lock"
check_ref_format(): tighten refname rules
strbuf_check_branch_ref(): a helper to check a refname for a branch
Fix branch -m @{-1} newname
check-ref-format --branch: give Porcelain a way to grok branch shorthand
strbuf_branchname(): a wrapper for branch name shorthands
Rename interpret/substitute nth_last_branch functions
Conflicts:
Documentation/git-check-ref-format.txt
* jc/shared-literally:
t1301: loosen test for forced modes
set_shared_perm(): sometimes we know what the final mode bits should look like
move_temp_to_file(): do not forget to chmod() in "Coda hack" codepath
Move chmod(foo, 0444) into move_temp_to_file()
"core.sharedrepository = 0mode" should set, not loosen
* jc/maint-1.6.0-keep-pack:
pack-objects: don't loosen objects available in alternate or kept packs
t7700: demonstrate repack flaw which may loosen objects unnecessarily
Remove --kept-pack-only option and associated infrastructure
pack-objects: only repack or loosen objects residing in "local" packs
git-repack.sh: don't use --kept-pack-only option to pack-objects
t7700-repack: add two new tests demonstrating repacking flaws
Conflicts:
t/t7700-repack.sh
adjust_shared_perm() first obtains the mode bits from lstat(2), expecting
to find what the result of applying user's umask is, and then tweaks it
as necessary. When the file to be adjusted is created with mkstemp(3),
however, the mode thusly obtained does not have anything to do with user's
umask, and we would need to start from 0444 in such a case and there is no
point running lstat(2) for such a path.
This introduces a new API set_shared_perm() to bypass the lstat(2) and
instead force setting the mode bits to the desired value directly.
adjust_shared_perm() becomes a thin wrapper to the function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* fg/push-default:
builtin-push.c: Fix typo: "anythig" -> "anything"
Display warning for default git push with no push.default config
New config push.default to decide default behavior for push
Conflicts:
Documentation/config.txt
These allow you to say "git checkout @{-2}" to switch to the branch two
"branch switching" ago by pretending as if you typed the name of that
branch. As it is likely that we will be introducing more short-hands to
write the name of a branch without writing it explicitly, rename the
functions from "nth_last_branch" to more generic "branch_name", to prepare
for different semantics.
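Recognising the shorthand itself can be sketched as follows. This is a simplified illustration, not the renamed functions: sketch_parse_nth_prior is a hypothetical helper that requires the whole string to be "@{-N}" and treats N <= 0 as invalid (an assumption); looking up the N-th previous branch is not shown.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return N if "name" is exactly "@{-N}" with N > 0, or -1 otherwise.
 */
static long sketch_parse_nth_prior(const char *name)
{
    char *end;
    long nth;

    assert(name != NULL);
    if (strncmp(name, "@{-", 3) != 0)
        return -1;
    if (!isdigit((unsigned char)name[3]))
        return -1;
    nth = strtol(name + 3, &end, 10);
    if (nth <= 0 || end[0] != '}' || end[1] != '\0')
        return -1;
    return nth;
}
```

A caller would then replace the shorthand with the branch name found N switches back, as if the user had typed that name.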
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack. It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/remote-improvements: (23 commits)
builtin-remote.c: no "commented out" code, please
builtin-remote: new show output style for push refspecs
builtin-remote: new show output style
remote: make guess_remote_head() use exact HEAD lookup if it is available
builtin-remote: add set-head subcommand
builtin-remote: teach show to display remote HEAD
builtin-remote: fix two inconsistencies in the output of "show <remote>"
builtin-remote: make get_remote_ref_states() always populate states.tracked
builtin-remote: rename variables and eliminate redundant function call
builtin-remote: remove unused code in get_ref_states
builtin-remote: refactor duplicated cleanup code
string-list: new for_each_string_list() function
remote: make match_refs() not short-circuit
remote: make match_refs() copy src ref before assigning to peer_ref
remote: let guess_remote_head() optionally return all matches
remote: make copy_ref() perform a deep copy
remote: simplify guess_remote_head()
move locate_head() to remote.c
move duplicated ref_newer() to remote.c
move duplicated get_local_heads() to remote.c
...
Conflicts:
builtin-clone.c
* kb/checkout-optim:
Revert "lstat_cache(): print a warning if doing ping-pong between cache types"
checkout bugfix: use stat.mtime instead of stat.ctime in two places
Makefile: Set compiler switch for USE_NSEC
Create USE_ST_TIMESPEC and turn it on for Darwin
Not all systems use st_[cm]tim field for ns resolution file timestamp
Record ns-timestamps if possible, but do not use it without USE_NSEC
write_index(): update index_state->timestamp after flushing to disk
verify_uptodate(): add ce_uptodate(ce) test
make USE_NSEC work as expected
fix compile error when USE_NSEC is defined
check_updates(): effective removal of cache entries marked CE_REMOVE
lstat_cache(): print a warning if doing ping-pong between cache types
show_patch_diff(): remove a call to fstat()
write_entry(): use fstat() instead of lstat() when file is open
write_entry(): cleanup of some duplicated code
create_directories(): remove some memcpy() and strchr() calls
unlink_entry(): introduce schedule_dir_for_removal()
lstat_cache(): swap func(length, string) into func(string, length)
lstat_cache(): generalise longest_match_lstat_cache()
lstat_cache(): small cleanup and optimisation
When "git push" is not told what refspecs to push, it pushes all matching
branches to the current remote. For some workflows this default is not
useful, and surprises new users. Some have even found that this default
behaviour is too easy to trigger by accident with unwanted consequences.
Introduce a new configuration variable "push.default" that decides what
action git push should take if no refspecs are given or implied by the
command line arguments or the current remote configuration.
Possible values are:
  'nothing'  : Push nothing;
  'matching' : Current default behaviour, push all branches that already
               exist in the current remote;
  'tracking' : Push the current branch to whatever it is tracking;
  'current'  : Push the current branch to a branch of the same name,
               i.e. HEAD.
Signed-off-by: Finn Arne Gangstad <finnag@pvv.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Now that is_kept_pack() is just a member lookup into a structure, we can
write it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert the changes associated with that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass the list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Its "ignore_packed" parameter always comes from struct rev_info. This
patch makes the function take a pointer to the surrounding structure, so
that the refactoring in the next patch becomes easier to review.
There is an unfortunate header file dependency and the easiest workaround
is to temporarily move the function declaration from cache.h to
revision.h; this will be moved back to cache.h once the function loses
this "ignore_packed" parameter altogether in the later part of the
series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All callers of this function except one pass NULL as its last parameter,
ignore_packed.
Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.
All other callers and has_sha1_pack() lose the ignore_packed parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since it doesn't actually touch its argument, this makes
sense.
However, we still want to return a non-const version (which
requires a cast) so that this:
    struct ref *a, *b;
    a = find_ref_by_name(b);
works. Unfortunately, you can also silently strip the const
from a variable:
    struct ref *a;
    const struct ref *b;
    a = find_ref_by_name(b);
This is a classic C const problem because there is no way to
say "return the type with the same constness that was passed
to us"; we provide the same semantics as standard library
functions like strchr.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
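The const trade-off described for find_ref_by_name() can be sketched as
follows. This is only an illustration: the struct ref layout and the
two-argument signature are assumptions made for the example, not the
declarations git uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, pared-down stand-in for git's struct ref. */
struct ref {
	struct ref *next;
	const char *name;
};

/*
 * Takes a const list (the function never modifies it) but returns a
 * non-const pointer, mirroring strchr(). The cast is what lets a caller
 * silently drop const from its own variable.
 */
struct ref *find_ref_by_name(const struct ref *list, const char *name)
{
	assert(name != NULL);
	for (; list; list = list->next)
		if (!strcmp(list->name, name))
			return (struct ref *)list;
	return NULL;
}
```

A caller holding a "const struct ref *" gets back a plain "struct ref *"
without any diagnostic, which is exactly the hole the message describes.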
Since this timestamp is used to check for racy-clean files, it is
important to keep it up to date.
For the 'git checkout' command without the '-q' option, this makes a
huge difference. Before, each and every file that was updated was
racy-clean after the call to unpack_trees() and write_index() but
before the GIT process ended.
And because of the call to show_local_changes() in builtin-checkout.c,
we ended up reading those files back into memory, doing a SHA1 to
check if the files were really different from the index. And, of
course, no file was different.
With this fix, 'git checkout' without the '-q' option should now be
almost as fast as with the '-q' option, but not quite, as we still do
a few more lstat(2) calls without the '-q' option.
Below are some average numbers for 10 checkouts to v2.6.27 and 10 to
v2.6.25 of the Linux kernel, to show the difference:
before (git version 1.6.2.rc1.256.g58a87):
7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor
after:
6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Talking about --date, one thing I wanted for the 1234567890 date was to
get things in the raw format. Sure, you get them with --pretty=raw, but it
felt a bit sad that you couldn't just ask for the date in raw format.
So here's a throw-away patch (meaning: I won't be re-sending it, because I
really don't think it's a big deal) to add "--date=raw". It just prints
out the internal raw git format - seconds since epoch plus timezone (put
another way: 'date +"%s %z"' format)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
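As a rough illustration of the "--date=raw" output described above, the
following sketch formats a timestamp as seconds since the epoch plus
timezone. The function name and signature are made up for the example;
the timezone is assumed to be carried as a signed decimal "hhmm" integer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Formats a timestamp the way 'date +"%s %z"' would: seconds since the
 * epoch, a space, then the timezone as a signed, zero-padded "hhmm"
 * value (e.g. tz == -800 for UTC-08:00). Returns the snprintf() result.
 */
int format_raw_date(char *buf, size_t size, unsigned long seconds, int tz)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%lu %+05d", seconds, tz);
}
```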
Just saying that index.lock exists doesn't tell the user _what_ to do
to fix the problem. We should give an indication that it's normally
safe to delete index.lock after making sure git isn't running here.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function strip_path_suffix() will try to strip a given suffix from
a given path. The suffix must start at a directory boundary (i.e. "core"
is not a path suffix of "libexec/git-core", but "git-core" is).
Arbitrary runs of directory separators ("slashes") are assumed identical.
Example:
    strip_path_suffix("C:\\msysgit/\\libexec\\git-core",
                      "libexec///git-core", &prefix)
will set prefix to "C:\\msysgit" and return 0.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
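The matching rules for strip_path_suffix() can be sketched like this.
It is a minimal sketch, not the patch itself: unlike the call shown in
the example, this version writes the prefix into a caller-supplied
buffer, and the helper names is_dir_sep() and chomp_trailing_dir_sep()
are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Treat both '/' and '\\' as separators, as the Windows example requires. */
static int is_dir_sep(char c)
{
	return c == '/' || c == '\\';
}

/* Returns len with any trailing run of separators removed. */
static size_t chomp_trailing_dir_sep(const char *path, size_t len)
{
	while (len && is_dir_sep(path[len - 1]))
		len--;
	return len;
}

/*
 * Copies path minus suffix into prefix (at most size bytes, including NUL).
 * Returns 0 on success, -1 if suffix is not a path suffix of path or the
 * prefix does not fit.
 */
int strip_path_suffix(const char *path, const char *suffix,
		      char *prefix, size_t size)
{
	size_t path_len, suffix_len;

	assert(path != NULL && suffix != NULL && prefix != NULL);
	path_len = strlen(path);
	suffix_len = strlen(suffix);

	/* Walk both strings backwards, one character or separator run at a time. */
	while (suffix_len) {
		if (!path_len)
			return -1;
		if (is_dir_sep(path[path_len - 1])) {
			if (!is_dir_sep(suffix[suffix_len - 1]))
				return -1;
			/* Any run of separators counts as a single one. */
			path_len = chomp_trailing_dir_sep(path, path_len);
			suffix_len = chomp_trailing_dir_sep(suffix, suffix_len);
		} else if (path[--path_len] != suffix[--suffix_len]) {
			return -1;
		}
	}

	/* The suffix must begin at a directory boundary. */
	if (path_len && !is_dir_sep(path[path_len - 1]))
		return -1;

	path_len = chomp_trailing_dir_sep(path, path_len);
	if (path_len >= size)
		return -1;
	memcpy(prefix, path, path_len);
	prefix[path_len] = '\0';
	return 0;
}
```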
Since the ext4 filesystem is now declared stable as of Linux v2.6.28,
and ext4 supports nanosecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.
This will make racy git situations less likely to happen. For 'git
checkout' this means it will be less likely that we have to open the
file, read its contents into RAM, and check whether it is really
modified or not. The result should be a little less CPU time used, fewer
page faults and a slightly faster program, at least for 'git checkout'.
Since the number of possible racy git situations would increase as
disks get faster, this patch will become more and more helpful as time
goes by. For a fast Solid State Disk, this patch should be helpful.
Note that, when file operations start to take less than 1 nanosecond,
one would again start to get more racy git situations.
For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
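To see why nanosecond timestamps shrink the racy window, consider this
minimal sketch of a racy-timestamp test. The struct, the function name
and the use_nsec parameter (standing in for the compile-time USE_NSEC
switch) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timestamp pair: whole seconds plus nanoseconds. */
struct cache_time {
	unsigned int sec;
	unsigned int nsec;
};

/*
 * A file is "racily clean" when it was last modified at or after the
 * moment the index was written, so its cached stat data cannot be
 * trusted. Without nanoseconds, every file touched in the same second
 * as the index is suspect; with them, only files whose nanosecond
 * field is not older than the index are.
 */
int is_racy_timestamp(const struct cache_time *index_ts,
		      const struct cache_time *mtime, int use_nsec)
{
	assert(index_ts != NULL && mtime != NULL);
	if (index_ts->sec != mtime->sec)
		return index_ts->sec < mtime->sec;
	if (!use_nsec)
		return 1;	/* same second: must assume racy */
	return index_ts->nsec <= mtime->nsec;
}
```

A file modified 300 ns before the index was written, within the same
second, is reported as racy without nanoseconds and as clean with them.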
Below is oprofile output from GIT command 'git checkout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):
CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
  samples|      %|  samples|      %|
------------------------------------
   409247 100.000    342878 100.000 git
        CPU_CLK_UNHALT...|INST_RETIRED:2...|
          samples|      %|  samples|      %|
        ------------------------------------
           260476 63.6476    257843 75.1996 libz.so.1.2.3
           100876 24.6492     64378 18.7758 kernel-2.6.28.4_2.vmlinux
            30850  7.5382      7874  2.2964 libc-2.9.so
            14775  3.6103      8390  2.4469 git
             2020  0.4936      4325  1.2614 libcrypto.so.0.9.8
              191  0.0467        32  0.0093 libpthread-2.9.so
               58  0.0142        36  0.0105 ld-2.9.so
                1 2.4e-04         0       0 libldap-2.3.so.0.2.31
Detail list of the top 20 function entries (libz counted in one blob):
CPU_CLK_UNHALTED INST_RETIRED_ANY_P
samples        % samples        %  image name                 symbol name
 260476  63.6862  257843  75.2725  libz.so.1.2.3              /lib/libz.so.1.2.3
  16587   4.0555    3636   1.0615  libc-2.9.so                memcpy
   7710   1.8851     277   0.0809  libc-2.9.so                memmove
   3679   0.8995    1108   0.3235  kernel-2.6.28.4_2.vmlinux  d_validate
   3546   0.8670    2607   0.7611  kernel-2.6.28.4_2.vmlinux  __getblk
   3174   0.7760    1813   0.5293  libc-2.9.so                _int_malloc
   2396   0.5858    3681   1.0746  kernel-2.6.28.4_2.vmlinux  copy_to_user
   2270   0.5550    2528   0.7380  kernel-2.6.28.4_2.vmlinux  __link_path_walk
   2205   0.5391    1797   0.5246  kernel-2.6.28.4_2.vmlinux  ext4_mark_iloc_dirty
   2103   0.5142    1203   0.3512  kernel-2.6.28.4_2.vmlinux  find_first_zero_bit
   2077   0.5078     997   0.2911  kernel-2.6.28.4_2.vmlinux  do_get_write_access
   2070   0.5061     514   0.1501  git                        cache_name_compare
   2043   0.4995    1501   0.4382  kernel-2.6.28.4_2.vmlinux  rcu_irq_exit
   2022   0.4944    1732   0.5056  kernel-2.6.28.4_2.vmlinux  __ext4_get_inode_loc
   2020   0.4939    4325   1.2626  libcrypto.so.0.9.8         /usr/lib/libcrypto.so.0.9.8
   1965   0.4804    1384   0.4040  git                        patch_delta
   1708   0.4176     984   0.2873  kernel-2.6.28.4_2.vmlinux  rcu_sched_grace_period
   1682   0.4112     727   0.2122  kernel-2.6.28.4_2.vmlinux  sysfs_slab_alias
   1659   0.4056     290   0.0847  git                        find_pack_entry_one
   1480   0.3619    1307   0.3816  kernel-2.6.28.4_2.vmlinux  ext4_writepage_trans_blocks
Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles
per instruction. Compared to the total cycles spent inside the
source code of GIT for this command, all the memmove() calls
translate to (7710 * 100) / 14775 = 52.2% of this.
Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:
    memmove(istate->cache + pos,
            istate->cache + pos + 1,
            (istate->cache_nr - pos) * sizeof(struct cache_entry *));
remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and on each call we move every cache_entry pointer
after the removed one a single step to the left.
Since we have 28828 entries in the cache this time, and if we on
average move half of them each time, we in total move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if
each pointer is 8 bytes (64 bit).
OK, it seems that the function check_updates() is called 28 times, so
the estimate above would have been more accurate if check_updates() had
been called only once, but the point is: we get lots of bytes moved.
To fix this, and use an O(N) algorithm instead, where N is the number
of cache_entries, we delete/remove all entries in one loop through all
entries.
From a retest, the new remove_marked_cache_entries() from the patch
below, ended up with the following output line from oprofile:
46 0.0105 15 0.0041 git remove_marked_cache_entries
If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test. And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
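The one-pass removal that replaces the repeated memmove() calls can be
sketched as below. The function name remove_marked_cache_entries() comes
from the message above; the struct layouts and the CE_REMOVE bit value
are simplified assumptions for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define CE_REMOVE 0x1u	/* hypothetical bit value for this sketch */

/* Pared-down stand-ins for the real structures. */
struct cache_entry {
	unsigned int ce_flags;
	const char *name;
};

struct index_state {
	struct cache_entry **cache;
	unsigned int cache_nr;
};

/*
 * Compacts the array in a single O(N) pass: every surviving pointer is
 * copied at most once, instead of shifting the whole tail of the array
 * once per removed entry.
 */
void remove_marked_cache_entries(struct index_state *istate)
{
	struct cache_entry **ce_array;
	unsigned int i, j;

	assert(istate != NULL);
	ce_array = istate->cache;
	for (i = j = 0; i < istate->cache_nr; i++) {
		if (ce_array[i]->ce_flags & CE_REMOVE)
			continue;	/* dropped: not copied forward */
		ce_array[j++] = ce_array[i];
	}
	istate->cache_nr = j;
}
```

With 4902 marked entries out of 28828, this touches each pointer once,
rather than moving roughly half the array 4902 times.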
* ms/mailmap:
Move mailmap documentation into separate file
Change current mailmap usage to do matching on both name and email of author/committer.
Add map_user() and clear_mailmap() to mailmap
Add find_insert_index, insert_at_index and clear_func functions to string_list
Add mailmap.file as configurational option for mailmap location
* js/maint-1.6.0-path-normalize:
Remove unused normalize_absolute_path()
Test and fix normalize_path_copy()
Fix GIT_CEILING_DIRECTORIES on Windows
Move sanitary_path_copy() to path.c and rename it to normalize_path_copy()
Make test-path-utils more robust against incorrect use
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.
Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside unlink_entry() if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time. So if one directory containing 200 files is moved to
another location we get 199 failed calls to rmdir() and 1 successful
call.
To fix this and avoid some unnecessary calls to rmdir(), we schedule
each directory for removal and wait much longer before we do the real
call to rmdir().
Since the unlink_entry() function is called with alphabetically sorted
names, this new function ends up being very effective at avoiding
unnecessary calls to rmdir(). In some cases over 95% of all calls to
rmdir() are removed with this patch.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
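The idea of deferring rmdir() can be sketched as follows. This is a
deliberately simplified model, not the patch: it tracks only the
immediate parent directory (not every leading directory), it counts the
rmdir() attempts instead of performing them, and the struct and the
remove_scheduled_dir() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state for this sketch; not git's actual structure. */
struct removal_state {
	char dir[4096];			/* directory currently scheduled */
	size_t len;			/* 0 when nothing is scheduled */
	unsigned int rmdir_attempts;	/* how often rmdir() would be called */
};

/* Performs the deferred removal; a real version would call rmdir(2) here. */
void remove_scheduled_dir(struct removal_state *st)
{
	assert(st != NULL);
	if (!st->len)
		return;
	st->rmdir_attempts++;
	st->len = 0;
	st->dir[0] = '\0';
}

/* Called after each successful unlink() of 'path'. */
void schedule_dir_for_removal(struct removal_state *st, const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	/* Still inside the scheduled directory: nothing to do yet. */
	if (len == st->len && !memcmp(st->dir, path, len))
		return;

	/* We moved on to another directory, so try the previous one now. */
	remove_scheduled_dir(st);
	if (len && len < sizeof(st->dir)) {
		memcpy(st->dir, path, len);
		st->dir[len] = '\0';
		st->len = len;
	}
}
```

Because the names arrive sorted, all files of one directory are seen
back to back, so each directory costs one rmdir() attempt rather than
one per removed file.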
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.
Also, add a note about this fact into the coding guidelines.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to augment the repo mailmap file, and to use
mailmap files elsewhere than the repository root. This means
that the entries in mailmap.file will override the entries
in "./.mailmap", should they match.
Signed-off-by: Marius Storm-Olsen <marius@trolltech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is now superseded by normalize_path_copy().
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function and normalize_absolute_path() do almost the same thing. The
former already works on Windows, but the latter crashes.
In subsequent changes we will remove normalize_absolute_path(). Here we
make the replacement function reusable. On the way we rename it to reflect
that it does some path normalization. Apart from that this is only moving
around code.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/notes:
git-notes: fix printing of multi-line notes
notes: fix core.notesRef documentation
Add an expensive test for git-notes
Speed up git notes lookup
Add a script to edit/inspect notes
Introduce commit notes
Conflicts:
pretty.c
* kb/lstat-cache:
lstat_cache(): introduce clear_lstat_cache() function
lstat_cache(): introduce invalidate_lstat_cache() function
lstat_cache(): introduce has_dirs_only_path() function
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): more cache effective symlink/directory detection
If you want to completely clear the contents of the lstat_cache(), then
call this new function.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases it may be necessary to tell the cache that
"Hey, I deleted/changed the type of this pathname and if you currently
have it inside your cache, you should delete it".
This patch introduces a function which supports this.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time. For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.
To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions. Since the paths given to the create_directories()
function are sorted alphabetically, the new wrapper should be very
cache effective in this situation.
To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also be told to keep track
of non-existing directories.
So we update the lstat_cache() function to handle this new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.
We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement a shortcut @{-N} for the N-th last branch checked out, that
works by parsing the reflog for the message added by previous
git-checkout invocations. We expand the @{-N} to the branch name, so
that you end up on an attached HEAD on that branch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
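A minimal sketch of the @{-N} lookup, under stated assumptions: the
reflog messages are handed in as an array with the newest entry first,
and branch switches are assumed to be recorded as "checkout: moving
from <old> to <new>". The function signature is invented for the
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * msgs[] holds reflog messages for HEAD, newest first. Copies the name of
 * the N-th last branch (n >= 1) into out and returns 0, or returns -1 if
 * there were fewer than n branch switches or the name does not fit.
 */
int nth_last_branch(const char *const *msgs, size_t nr, int n,
		    char *out, size_t size)
{
	static const char prefix[] = "checkout: moving from ";
	const size_t prefix_len = sizeof(prefix) - 1;
	size_t i;

	assert(n >= 1);
	for (i = 0; i < nr; i++) {
		const char *from, *to;
		size_t len;

		if (strncmp(msgs[i], prefix, prefix_len))
			continue;	/* not a branch switch */
		from = msgs[i] + prefix_len;
		to = strstr(from, " to ");
		if (!to)
			continue;	/* malformed entry: ignore it */
		if (--n)
			continue;	/* not far enough back yet */
		len = (size_t)(to - from);
		if (len >= size)
			return -1;
		memcpy(out, from, len);
		out[len] = '\0';
		return 0;
	}
	return -1;
}
```

The branch we were on before the most recent switch is the "from" side
of the newest matching entry, which is why @{-1} maps to n == 1.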
Both versions have the same functionality. This removes any
redundancy.
This also makes two extensions to match_pathspec:
- If pathspec is NULL, return 1. This reflects the behavior of git
commands, for which no paths usually means "match all paths".
- If seen is NULL, do not use it.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
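The two extensions can be illustrated with a heavily simplified
matcher. This sketch only does exact and leading-directory matches and
returns 0 or 1; the real match_pathspec() does more, and the signature
used here is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * pathspec is a NULL-terminated array of path prefixes, or NULL.
 * seen, when non-NULL, has one slot per pathspec element and records
 * which elements matched something.
 */
int match_pathspec(const char **pathspec, const char *name, char *seen)
{
	int i, matched = 0;

	assert(name != NULL);
	if (!pathspec)
		return 1;	/* no paths given means "match all paths" */

	for (i = 0; pathspec[i]; i++) {
		size_t len = strlen(pathspec[i]);

		if (strncmp(pathspec[i], name, len))
			continue;
		/* Accept an exact match or a leading directory only. */
		if (name[len] != '\0' && name[len] != '/')
			continue;
		matched = 1;
		if (seen)	/* callers that do not care pass NULL */
			seen[i] = 1;
	}
	return matched;
}
```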
This function is only used from "sha1_file.c".
And as we want to add a "replace_object" hook in "read_sha1_file",
we must not let people bypass the hook using something other than
"read_sha1_file".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Especially on Windows, where an opened file cannot be replaced, make
sure pack-objects always closes packs it is about to replace. Even on
non-Windows systems, this could prevent bad results if objects were
ever to be read from the new pack file using offsets from the old
index.
This should fix t5303 on Windows.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
This uses the extended index flag mechanism introduced earlier to mark
the entries added to the index via "git add -N" with CE_INTENT_TO_ADD.
The logic to detect an "intent to add" entry for the purpose of allowing
"git rm --cached $path" is tightened to check not just for a staged empty
blob, but with the CE_INTENT_TO_ADD bit. This protects an empty blob that
was explicitly added and then modified in the work tree from being dropped
with this sequence:
    $ >empty
    $ git add empty
    $ echo "non empty" >empty
    $ git rm --cached empty
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This can do the lstat() storm in parallel, giving potentially much
improved performance for cold-cache cases or things like NFS that have
weak metadata caching.
Just use "read_cache_preload()" instead of "read_cache()" to force an
optimistic preload of the index stat data. The function takes a
pathspec as its argument, allowing us to preload only the relevant
portion of the index.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
* bc/maint-keep-pack:
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
* maint:
Start 1.6.0.5 cycle
Fix pack.packSizeLimit and --max-pack-size handling
checkout: Fix "initial checkout" detection
Remove the period after the git-check-attr summary
Conflicts:
RelNotes
Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07)
tightened the rule to prevent switching branches from losing local
changes, so that staged removal of paths can be protected, while
attempting to keep a loophole to still allow a special case of switching
out of an un-checked-out state.
However, the loophole was made a bit too tight, and did not allow
switching from one branch (in an un-checked-out state) to check out
another branch.
The change to builtin-checkout.c in this commit loosens it to allow this,
by not insisting that the original commit and the new commit be the same.
It also introduces a new function, is_index_unborn (and an associated
macro, is_cache_unborn), to check if the repository is truly in an
un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE
did not exist when populating the in-core index structure. A few places
where the earlier commit 5521883 added the check for the initial checkout
condition are updated to use this function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This can potentially be used in a few places, so let's make
it available to all parts of the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack_keep will be set when a pack file has an associated .keep file.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/maint-mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Fix potentially dangerous uses of mkpath and git_path
Fix mkpath abuse in dwim_ref and dwim_log of sha1_name.c
Add mksnpath which allows you to specify the output buffer
Conflicts:
builtin-revert.c
rerere.c
* mv/maint-branch-m-symref:
update-ref --no-deref -d: handle the case when the pointed ref is packed
git branch -m: forbid renaming of a symref
Fix git update-ref --no-deref -d.
rename_ref(): handle the case when the reflog of a ref does not exist
Fix git branch -m for symrefs.
* ar/mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Fix potentially dangerous uses of mkpath and git_path
Fix potentially dangerous uses of mkpath and git_path
Fix mkpath abuse in dwim_ref and dwim_log of sha1_name.c
Add mksnpath which allows you to specify the output buffer
Conflicts:
builtin-revert.c
* mv/maint-branch-m-symref:
update-ref --no-deref -d: handle the case when the pointed ref is packed
git branch -m: forbid renaming of a symref
Fix git update-ref --no-deref -d.
rename_ref(): handle the case when the reflog of a ref does not exist
Fix git branch -m for symrefs.
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die()s on it instead of letting the caller
deal with that gracefully.
So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
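A non-fatal header parser might look like the sketch below. It assumes
the usual pack entry header encoding (type in bits 4-6 of the first
byte, size as a little-endian variable-length integer with the high bit
as a continuation flag); the function name, signature and the "return 0
on error" convention are choices made for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Decodes the type and size from a pack entry header held in buf[0..len).
 * Returns the number of bytes consumed, or 0 if the header is truncated
 * or its size field overflows, leaving the decision to the caller
 * instead of dying.
 */
size_t unpack_object_header_buffer(const unsigned char *buf, size_t len,
				   unsigned int *type, unsigned long *sizep)
{
	unsigned long size, c;
	unsigned int shift = 4;
	size_t used = 0;

	assert(buf != NULL && type != NULL && sizep != NULL);
	if (!len)
		return 0;
	c = buf[used++];
	*type = (c >> 4) & 7;	/* bits 4-6 of the first byte */
	size = c & 15;		/* low four bits of the size */
	while (c & 0x80) {	/* high bit set: more size bytes follow */
		if (used >= len || shift >= sizeof(long) * CHAR_BIT)
			return 0;	/* corrupt header: report, do not die */
		c = buf[used++];
		size += (c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
}
```

A caller can then treat a zero return as "this copy of the object is
bad" and look for the object elsewhere.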
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provides a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leaves them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and inflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromise is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusted and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositories already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
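The per-object check can be illustrated with a small sketch. It assumes
that the index records a CRC-32 over the raw bytes of each packed entry;
the struct packed_entry, the function names, and the stand-alone bitwise
CRC-32 (used here in place of zlib's crc32()) are all choices of this
example, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical view of one index entry: where the object lives, and its CRC. */
struct packed_entry {
	size_t offset;
	size_t len;
	uint32_t crc;
};

/* Returns 0 if the raw packed bytes still match the recorded CRC, else -1. */
int check_pack_crc(const unsigned char *pack, const struct packed_entry *e)
{
	return crc32_bytes(pack + e->offset, e->len) == e->crc ? 0 : -1;
}
```

A repack that is about to unpack an object would run such a check
first and refuse to use the data when it fails.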
* ar/maint-mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Conflicts:
builtin-revert.c
refs.c
rerere.c