Commit Graph

1407 Commits

Author SHA1 Message Date
Erik Elfström
921bdd96af setup: sanity check file size in read_gitfile_gently
read_gitfile_gently will allocate a buffer to fit the entire file that
should be read. Add a sanity check of the file size before opening to
avoid allocating a potentially huge amount of memory if we come across
a large file that someone happened to name ".git". The limit is set to
a sufficiently unreasonable size that should never be exceeded by a
genuine .git file.

Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-15 13:14:01 -07:00
Mike Hommey
58d121b22b Allow to control where the replace refs are looked for
It can be useful to have grafts or replace refs for specific use-cases while
keeping the default "view" of the repository pristine (or with a different
set of grafts/replace refs).

It is possible to use a different graft file with GIT_GRAFT_FILE, but while
replace refs are more powerful, they don't have an equivalent override.

Add a GIT_REPLACE_REF_BASE environment variable to control where git is
going to look for replace refs.

Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-12 15:28:17 -07:00
Erik Elfström
a93bedada8 setup: add gentle version of read_gitfile
read_gitfile will die on most error cases. This makes it unsuitable
for speculative calls. Extract the core logic and provide a gentle
version that returns NULL on failure.

The first usecase of the new gentle version will be to probe for
submodules during git clean.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Erik Elfström <erik.elfstrom@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-09 12:29:22 -07:00
Jeff King
0eeb077be7 index-pack: avoid excessive re-reading of pack directory
Since 45e8a74 (has_sha1_file: re-check pack directory before
giving up, 2013-08-30), we spend extra effort for
has_sha1_file to give the right answer when somebody else is
repacking. Usually this effort does not matter, because
after finding that the object does not exist, the next step
is usually to die().

However, some code paths make a large number of
has_sha1_file checks which are _not_ expected to return 1.
The collision test in index-pack.c is such a case. On a
local system, this can cause a performance slowdown of
around 5%. But on a system with high-latency system calls
(like NFS), it can be much worse.

This patch introduces a "quick" flag to has_sha1_file which
callers can use when they would prefer high performance at
the cost of false negatives during repacks. There may be
other code paths that can use this, but the index-pack one
is the most obviously critical, so we'll start with
switching that one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-09 12:26:35 -07:00
Johannes Sixt
2024d31765 help.c: wrap wait-only poll() invocation in sleep_millisec()
We want to use the new function elsewhere in a moment.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-05 15:00:32 -07:00
Junio C Hamano
d9c82fa7a7 Merge branch 'pt/xdg-config-path' into maint
Code clean-up for xdg configuration path support.

* pt/xdg-config-path:
  path.c: remove home_config_paths()
  git-config: replace use of home_config_paths()
  git-commit: replace use of home_config_paths()
  credential-store.c: replace home_config_paths() with xdg_config_home()
  dir.c: replace home_config_paths() with xdg_config_home()
  attr.c: replace home_config_paths() with xdg_config_home()
  path.c: implement xdg_config_home()
  t0302: "unreadable" test needs POSIXPERM
  t0302: test credential-store support for XDG_CONFIG_HOME
  git-credential-store: support XDG_CONFIG_HOME
  git-credential-store: support multiple credential files
2015-06-05 12:00:04 -07:00
Junio C Hamano
67f0b6f3b2 Merge branch 'dt/cat-file-follow-symlinks'
"git cat-file --batch(-check)" learned the "--follow-symlinks"
option that follows an in-tree symbolic link when asked about an
object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at
Documentation/RelNotes/2.5.0.txt.  With the new option, the command
behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as
input instead.

* dt/cat-file-follow-symlinks:
  cat-file: add --follow-symlinks to --batch
  sha1_name: get_sha1_with_context learns to follow symlinks
  tree-walk: learn get_tree_entry_follow_symlinks
2015-06-01 12:45:16 -07:00
Junio C Hamano
1e6c8babf8 Merge branch 'jc/hash-object' into maint
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.

* jc/hash-object:
  write_sha1_file(): do not use a separate sha1[] array
  t1007: add hash-object --literally tests
  hash-object --literally: fix buffer overrun with extra-long object type
  git-hash-object.txt: document --literally option
2015-05-26 13:49:25 -07:00
Junio C Hamano
38ccaf93bb Merge branch 'nd/untracked-cache'
Teach the index to optionally remember already seen untracked files
to speed up "git status" in a working tree with tons of cruft.

* nd/untracked-cache: (24 commits)
  git-status.txt: advertisement for untracked cache
  untracked cache: guard and disable on system changes
  mingw32: add uname()
  t7063: tests for untracked cache
  update-index: test the system before enabling untracked cache
  update-index: manually enable or disable untracked cache
  status: enable untracked cache
  untracked-cache: temporarily disable with $GIT_DISABLE_UNTRACKED_CACHE
  untracked cache: mark index dirty if untracked cache is updated
  untracked cache: print stats with $GIT_TRACE_UNTRACKED_STATS
  untracked cache: avoid racy timestamps
  read-cache.c: split racy stat test to a separate function
  untracked cache: invalidate at index addition or removal
  untracked cache: load from UNTR index extension
  untracked cache: save to an index extension
  ewah: add convenient wrapper ewah_serialize_strbuf()
  untracked cache: don't open non-existent .gitignore
  untracked cache: mark what dirs should be recursed/saved
  untracked cache: record/validate dir mtime and reuse cached output
  untracked cache: make a wrapper around {open,read,close}dir()
  ...
2015-05-26 13:24:46 -07:00
Junio C Hamano
ddaf4e2e9a Merge branch 'jc/ignore-epipe-in-filter'
Filter scripts were run with SIGPIPE disabled on the Git side,
expecting that they may not read what Git feeds them to filter.
We however treated a filter that does not read its input fully
before exiting as an error.

This changes semantics, but arguably in a good way.  If a filter
can produce its output without consuming its input using whatever
magic, we now let it do so, instead of diagnosing it as a
programming error.

* jc/ignore-epipe-in-filter:
  filter_buffer_or_fd(): ignore EPIPE
  copy.c: make copy_fd() report its status silently
2015-05-22 12:41:57 -07:00
David Turner
c4ec96774b sha1_name: get_sha1_with_context learns to follow symlinks
Wire up get_sha1_with_context to call get_tree_entry_follow_symlinks
when GET_SHA1_FOLLOW_SYMLINKS is passed in flags. G_S_FOLLOW_SYMLINKS
is incompatible with G_S_ONLY_TO_DIE because the diagnosis
that ONLY_TO_DIE triggers does not at present consider symlinks, and
it would be a significant amount of additional code to allow it to
do so.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 13:46:13 -07:00
Junio C Hamano
00b7cbfcb3 copy.c: make copy_fd() report its status silently
When copy_fd() function encounters errors, it emits error messages
itself, which makes it impossible for callers to take responsibility
for reporting errors, especially when they want to ignore certain
errors.

Move the error reporting to its callers in preparation.

 - copy_file() and copy_file_with_time() by indirection get their
   own calls to error().

 - hold_lock_file_for_append(), when told to die on error, used to
   exit(128) relying on the error message from copy_fd(), but now it
   does its own die() instead.  Note that the callers that do not
   pass LOCK_DIE_ON_ERROR need to be adjusted for this change, but
   fortunately there is none ;-)

 - filter_buffer_or_fd() has its own error() already, in addition to
   the message from copy_fd(), so this will change the output but
   arguably in a better way.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-19 14:48:54 -07:00
Junio C Hamano
3b7d373ae2 Merge branch 'kn/cat-file-literally'
Add the "--allow-unknown-type" option to "cat-file" to allow
inspecting loose objects of an experimental or a broken type.

* kn/cat-file-literally:
  t1006: add tests for git cat-file --allow-unknown-type
  cat-file: teach cat-file a '--allow-unknown-type' option
  cat-file: make the options mutually exclusive
  sha1_file: support reading from a loose object of unknown type
2015-05-19 13:17:58 -07:00
Junio C Hamano
ebb464f0cb Merge branch 'jk/prune-mtime' into maint
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.

* jk/prune-mtime:
  sha1_file: only freshen packs once per run
  sha1_file: freshen pack objects before loose
  reachable: only mark local objects as recent
2015-05-13 14:05:50 -07:00
Junio C Hamano
558e5a8c40 Merge branch 'pt/xdg-config-path'
Code clean-up for xdg configuration path support.

* pt/xdg-config-path:
  path.c: remove home_config_paths()
  git-config: replace use of home_config_paths()
  git-commit: replace use of home_config_paths()
  credential-store.c: replace home_config_paths() with xdg_config_home()
  dir.c: replace home_config_paths() with xdg_config_home()
  attr.c: replace home_config_paths() with xdg_config_home()
  path.c: implement xdg_config_home()
2015-05-11 14:24:01 -07:00
Junio C Hamano
051086b947 Merge branch 'jc/hash-object'
"hash-object --literally" introduced in v2.2 was not prepared to
take a really long object type name.

* jc/hash-object:
  write_sha1_file(): do not use a separate sha1[] array
  t1007: add hash-object --literally tests
  hash-object --literally: fix buffer overrun with extra-long object type
  git-hash-object.txt: document --literally option
2015-05-11 14:23:59 -07:00
Junio C Hamano
68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Karthik Nayak
46f034483e sha1_file: support reading from a loose object of unknown type
Update sha1_loose_object_info() to optionally allow it to read
from a loose object file of unknown/bogus type; as the function
usually returns the type of the object it read in the form of enum
for known types, add an optional "typename" field to receive the
name of the type in textual form and a flag to indicate the reading
of a loose object file of unknown/bogus type.

Add parse_sha1_header_extended() which acts as a wrapper around
parse_sha1_header() allowing more information to be obtained.

Add unpack_sha1_header_to_strbuf() to unpack sha1 headers of
unknown/corrupt objects which have a unknown sha1 header size to
a strbuf structure. This was written by Junio C Hamano but tested
by me.

Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Hepled-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 13:35:48 -07:00
Paul Tan
846e5dfbab path.c: remove home_config_paths()
home_config_paths() combines distinct functionality already implemented
by expand_user_path() and xdg_config_home(), and it also hard-codes the
path ~/.gitconfig, which makes it unsuitable to use for other home
config file paths. Since its use will just add unnecessary complexity to
the code, remove it.

Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 11:33:53 -07:00
Paul Tan
ea19289bc8 path.c: implement xdg_config_home()
The XDG base dir spec[1] specifies that configuration files be stored in
a subdirectory in $XDG_CONFIG_HOME. To construct such a configuration
file path, home_config_paths() can be used. However, home_config_paths()
combines distinct functionality:

1. Retrieve the home git config file path ~/.gitconfig

2. Construct the XDG config path of the file specified by `file`.

This function was introduced in commit 21cf3227 ("read (but not write)
from $XDG_CONFIG_HOME/git/config file").  While the intention of the
function was to allow the home directory configuration file path and the
xdg directory configuration file path to be retrieved with one function
call, the hard-coding of the path ~/.gitconfig prevents it from being
used for other configuration files. Furthermore, retrieving a file path
relative to the user's home directory can be done with
expand_user_path(). Hence, it can be seen that home_config_paths()
introduces unnecessary complexity, especially if a user just wants to
retrieve the xdg config file path.

As such, implement a simpler function xdg_config_home() for constructing
the XDG base dir spec configuration file path. This function, together
with expand_user_path(), can replace all uses of home_config_paths().

[1] http://standards.freedesktop.org/basedir-spec/basedir-spec-0.7.html

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 11:21:04 -07:00
Junio C Hamano
e3b199aef1 Merge branch 'jk/prune-mtime'
Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.

* jk/prune-mtime:
  sha1_file: only freshen packs once per run
  sha1_file: freshen pack objects before loose
  reachable: only mark local objects as recent
2015-05-05 21:00:37 -07:00
Junio C Hamano
a916cb5fb4 Merge branch 'bc/object-id'
Identify parts of the code that knows that we use SHA-1 hash to
name our objects too much, and use (1) symbolic constants instead
of hardcoded 20 as byte count and/or (2) use struct object_id
instead of unsigned char [20] for object names.

* bc/object-id:
  apply: convert threeway_stage to object_id
  patch-id: convert to use struct object_id
  commit: convert parts to struct object_id
  diff: convert struct combine_diff_path to object_id
  bulk-checkin.c: convert to use struct object_id
  zip: use GIT_SHA1_HEXSZ for trailers
  archive.c: convert to use struct object_id
  bisect.c: convert leaf functions to use struct object_id
  define utility functions for object IDs
  define a structure for object IDs
2015-05-05 21:00:23 -07:00
Eric Sunshine
0c3db67cc8 hash-object --literally: fix buffer overrun with extra-long object type
"hash-object" learned in 5ba9a93 (hash-object: add --literally
option, 2014-09-11) to allow crafting a corrupt/broken object of
unknown type.

When the user-provided type is particularly long, however, it can
overflow the relatively small stack-based character array handed to
write_sha1_file_prepare() by hash_sha1_file() and write_sha1_file(),
leading to stack corruption (and crash).  Introduce a custom helper
to allow arbitrarily long typenames just for "hash-object --literally".

[jc: Eric's original used a strbuf in the more common codepaths, and
I rewrote it to avoid penalizing the non-literally code. Bugs are mine]

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-05 10:14:18 -07:00
Jeff King
ee1c6c34ac sha1_file: only freshen packs once per run
Since 33d4221 (write_sha1_file: freshen existing objects,
2014-10-15), we update the mtime of existing objects that we
would have written out (had they not existed). For the
common case in which many objects are packed, we may update
the mtime on a single packfile repeatedly. This can result
in a noticeable performance problem if calling utime() is
expensive (e.g., because your storage is on NFS).

We can fix this by keeping a per-pack flag that lets us
freshen only once per program invocation.

An alternative would be to keep the packed_git.mtime flag up
to date as we freshen, and freshen only once every N
seconds. In practice, it's not worth the complexity. We are
racing against prune expiration times here, which inherently
must be set to accomodate reasonable program running times
(because they really care about the time between an object
being written and it becoming referenced, and the latter is
typically the last step a program takes).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20 13:09:40 -07:00
Jeff King
1385bb7ba3 reachable: only mark local objects as recent
When pruning and repacking a repository that has an
alternate object store configured, we may traverse a large
number of objects in the alternate. This serves no purpose,
and may be expensive to do. A longer explanation is below.

Commits d3038d2 and abcb865 taught prune and pack-objects
(respectively) to treat "recent" objects as tips for
reachability, so that we keep whole chunks of history. They
built on the object traversal in 660c889 (sha1_file: add
for_each iterators for loose and packed objects,
2014-10-15), which covers both local and alternate objects.

In both cases, covering alternate objects is unnecessary, as
both commands can only drop objects from the local
repository. In the case of prune, we traverse only the local
object directory. And in the case of repacking, while we may
or may not include local objects in our pack, we will never
reach into the alternate with "repack -d". The "-l" option
is only a question of whether we are migrating objects from
the alternate into our repository, or leaving them
untouched.

It is possible that we may drop an object that is depended
upon by another object in the alternate. For example,
imagine two repositories, A and B, with A pointing to B as
an alternate. Now imagine a commit that is in B which
references a tree that is only in A. Traversing from recent
objects in B might prevent A from dropping that tree. But
this case isn't worth covering. Repo B should take
responsibility for its own objects. It would never have had
the commit in the first place if it did not also have the
tree, and assuming it is using the same "keep recent chunks
of history" scheme, then it would itself keep the tree, as
well.

So checking the alternate objects is not worth doing, and
come with a significant performance impact. In both cases,
we skip any recent objects that have already been marked
SEEN (i.e., that we know are already reachable for prune, or
included in the pack for a repack). So there is a slight
waste of time in opening the alternate packs at all, only to
notice that we have already considered each object. But much
worse, the alternate repository may have a large number of
objects that are not reachable from the local repository at
all, and we end up adding them to the traversal.

We can fix this by considering only local unseen objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-04-20 13:09:27 -07:00
Junio C Hamano
574ee8ae86 Merge branch 'jc/report-path-error-to-dir'
Code clean-up.

* jc/report-path-error-to-dir:
  report_path_error(): move to dir.c
2015-03-26 11:57:13 -07:00
Junio C Hamano
05e816e37f Merge branch 'jk/prune-with-corrupt-refs'
"git prune" used to largely ignore broken refs when deciding which
objects are still being used, which could spread an existing small
damage and make it a larger one.

* jk/prune-with-corrupt-refs:
  refs.c: drop curate_packed_refs
  repack: turn on "ref paranoia" when doing a destructive repack
  prune: turn on ref_paranoia flag
  refs: introduce a "ref paranoia" flag
  t5312: test object deletion code paths in a corrupted repository
2015-03-25 12:54:26 -07:00
Junio C Hamano
777c55a616 report_path_error(): move to dir.c
The expected call sequence is for the caller to use match_pathspec()
repeatedly on a set of pathspecs, accumulating the "hits" in a
separate array, and then call this function to diagnose a pathspec
that never matched anything, as that can indicate a typo from the
command line, e.g. "git commit Maekfile".

Many builtin commands use this function from builtin/ls-files.c,
which is not a very healthy arrangement.  ls-files might have been
the first command to feel the need for such a helper, but the need
is shared by everybody who uses the "match and then report" pattern.

Move it to dir.c where match_pathspec() is defined.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-24 14:12:10 -07:00
Jeff King
49672f26d9 refs: introduce a "ref paranoia" flag
Most operations that iterate over refs are happy to ignore
broken cruft. However, some operations should be performed
with knowledge of these broken refs, because it is better
for the operation to choke on a missing object than it is to
silently pretend that the ref did not exist (e.g., if we are
computing the set of reachable tips in order to prune
objects).

These processes could just call for_each_rawref, except that
ref iteration is often hidden behind other interfaces. For
instance, for a destructive "repack -ad", we would have to
inform "pack-objects" that we are destructive, and then it
would in turn have to tell the revision code that our
"--all" should include broken refs.

It's much simpler to just set a global for "dangerous"
operations that includes broken refs in all iterations.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 12:40:49 -07:00
brian m. carlson
aa1c6fdf47 define utility functions for object IDs
There are several utility functions (hashcmp and friends) that are used
for comparing object IDs (SHA-1 values).  Using these functions, which
take pointers to unsigned char, with struct object_id requires tiresome
access to the sha1 member, which bloats code and violates the desired
encapsulation.  Provide wrappers around these functions for struct
object_id for neater, more maintainable code.  Use the new constants to
avoid the hard-coded 20s and 40s throughout the original functions.

These functions simply call the underlying pointer-to-unsigned-char
versions to ensure that any performance improvements will be passed
through to the new functions.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:43:11 -07:00
brian m. carlson
5f7817c85d define a structure for object IDs
Many places throughout the code use "unsigned char [20]" to store object IDs
(SHA-1 values).  This leads to lots of hardcoded numbers throughout the
codebase.  It also leads to confusion about the purposes of a buffer.

Introduce a structure for object IDs.  This allows us to obtain the benefits
of compile-time checking for misuse.  The structure is expected to remain
the same size and have the same alignment requirements on all known
platforms, compared to the array of unsigned char, although this is not
required for correctness.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:43:11 -07:00
Nguyễn Thái Ngọc Duy
1bbb3dba3f untracked cache: mark index dirty if untracked cache is updated
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:17 -07:00
Nguyễn Thái Ngọc Duy
ed4efab1b1 untracked cache: avoid racy timestamps
When a directory is updated within the same second that its timestamp
is last saved, we cannot realize the directory has been updated by
checking timestamps. Assume the worst (something is update). See
29e4d36 (Racy GIT - 2005-12-20) for more information.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:17 -07:00
Nguyễn Thái Ngọc Duy
83c094ad0d untracked cache: save to an index extension
Helped-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-12 13:45:16 -07:00
Junio C Hamano
cbc8d6d8f8 Merge branch 'jk/prune-mtime' into maint
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.

* jk/prune-mtime:
  sha1_file: fix iterating loose alternate objects
  for_each_loose_file_in_objdir: take an optional strbuf path
2015-03-05 13:13:08 -08:00
Junio C Hamano
3bef3c12d6 Merge branch 'jk/decimal-width-for-uintmax' into maint
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.

* jk/decimal-width-for-uintmax:
  decimal_width: avoid integer overflow
2015-03-05 13:12:59 -08:00
Junio C Hamano
fd9de868c3 Merge branch 'mh/refs-have-new'
Simplify the ref transaction API around how "the ref should be
pointing at this object" is specified.

* mh/refs-have-new:
  refs.h: remove duplication in function docstrings
  update_ref(): improve documentation
  ref_transaction_verify(): new function to check a reference's value
  ref_transaction_delete(): check that old_sha1 is not null_sha1
  ref_transaction_create(): check that new_sha1 is valid
  commit: avoid race when creating orphan commits
  commit: add tests of commit races
  ref_transaction_delete(): remove "have_old" parameter
  ref_transaction_update(): remove "have_old" parameter
  struct ref_update: move "have_old" into "flags"
  refs.c: change some "flags" to "unsigned int"
  refs: remove the gap in the REF_* constant values
  refs: move REF_DELETING to refs.c
2015-03-05 12:45:39 -08:00
Junio C Hamano
7cf6232e2c Merge branch 'jk/prune-mtime'
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.

* jk/prune-mtime:
  sha1_file: fix iterating loose alternate objects
  for_each_loose_file_in_objdir: take an optional strbuf path
2015-02-22 12:28:28 -08:00
Junio C Hamano
ca00db08da Merge branch 'jk/decimal-width-for-uintmax'
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.

* jk/decimal-width-for-uintmax:
  decimal_width: avoid integer overflow
2015-02-18 11:45:02 -08:00
Michael Haggerty
fec14ec38c refs.c: change some "flags" to "unsigned int"
Change the following functions' "flags" arguments from "int" to
"unsigned int":

 * ref_transaction_update()
 * ref_transaction_create()
 * ref_transaction_delete()
 * update_ref()
 * delete_ref()
 * lock_ref_sha1_basic()

Also change the "flags" member in "struct ref_update" to unsigned.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-17 11:22:29 -08:00
Jeff King
e6f875e052 for_each_loose_file_in_objdir: take an optional strbuf path
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.

We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-09 14:14:53 -08:00
Jeff King
d306f3d351 decimal_width: avoid integer overflow
The decimal_width function originally appeared in blame.c as
"lineno_width", and was designed for calculating the
print-width of small-ish integer values (line numbers in
text files). In ec7ff5b, it was made into a reusable
function, and in dc801e7, we started using it to align
diffstats.

Binary files in a diffstat show byte counts rather than line
numbers, meaning they can be quite large (e.g., consider
adding or removing a 2GB file). decimal_width is not up to
the challenge for two reasons:

  1. It takes the value as an "int", whereas large files may
     easily surpass this. The value may be truncated, in
     which case we will produce an incorrect value.

  2. It counts "up" by repeatedly multiplying another
     integer by 10 until it surpasses the value.  This can
     cause an infinite loop when the value is close to the
     largest representable integer.

     For example, consider using a 32-bit signed integer,
     and a value of 2,140,000,000 (just shy of 2^31-1).
     We will count up and eventually see that 1,000,000,000
     is smaller than our value. The next step would be to
     multiply by 10 and see that 10,000,000,000 is too
     large, ending the loop. But we can't represent that
     value, and we have signed overflow.

     This is technically undefined behavior, but a common
     behavior is to lose the high bits, in which case our
     iterator will certainly be less than the number. So
     we'll keep multiplying, overflow again, and so on.

This patch changes the argument to a uintmax_t (the same
type we use to store the diffstat information for binary
filese), and counts "down" by repeatedly dividing our value
by 10.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-05 12:38:35 -08:00
Junio C Hamano
2f17ecbd8d Merge branch 'dm/compat-s-ifmt-for-zos'
Long overdue departure from the assumption that S_IFMT is shared by
everybody made in 2005.

* dm/compat-s-ifmt-for-zos:
  compat: convert modes to use portable file type values
2014-12-22 12:27:16 -08:00
Junio C Hamano
77933f4449 Sync with v2.1.4
* maint-2.1:
  Git 2.1.4
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:46:57 -08:00
Junio C Hamano
58f1d950e3 Sync with v2.0.5
* maint-2.0:
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:42:28 -08:00
Junio C Hamano
5e519fb8b0 Sync with v1.9.5
* maint-1.9:
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:28:54 -08:00
Junio C Hamano
6898b79721 Sync with v1.8.5.6
* maint-1.8.5:
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:20:31 -08:00
Johannes Schindelin
2b4c6efc82 read-cache: optionally disallow NTFS .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on NTFS nor FAT32.
     In practice this probably doesn't matter, though, as
     the restricted names are rather obscure and almost
     certainly would never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Johannes Schindelin
1d1d69bc52 path: add is_ntfs_dotgit() helper
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.

On NTFS (and FAT32), there exist so-called "short names" for
backwards-compatibility: 8.3 compliant names that refer to the same files
as their long names. As ".git" is not an 8.3 compliant name, a short name
is generated automatically, typically "git~1".

Depending on the Windows version, any combination of trailing spaces and
periods are ignored, too, so that both "git~1." and ".git." still refer
to the Git directory. The reason is that 8.3 stores file names shorter
than 8 characters with trailing spaces. So literally, it does not matter
for the short name whether it is padded with spaces or whether it is
shorter than 8 characters, it is considered to be the exact same.

The period is the separator between file name and file extension, and
again, an empty extension consists just of spaces in 8.3 format. So
technically, we would need only take care of the equivalent of this
regex:
        (\.git {0,4}|git~1 {0,3})\. {0,3}

However, there are indications that at least some Windows versions might
be more lenient and accept arbitrary combinations of trailing spaces and
periods and strip them out. So we're playing it real safe here. Besides,
there can be little doubt about the intention behind using file names
matching even the more lenient pattern specified above, therefore we
should be fine with disallowing such patterns.

Extra care is taken to catch names such as '.\\.git\\booh' because the
backslash is marked as a directory separator only on Windows, and we want
to use this new helper function also in fsck on other platforms.

A big thank you goes to Ed Thomson and an unnamed Microsoft engineer for
the detailed analysis performed to come up with the corresponding fixes
for libgit2.

This commit adds a function to detect whether a given file name can refer
to the Git directory by mistake.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King
a42643aa8d read-cache: optionally disallow HFS+ .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on HFS+. In practice
     this probably doesn't matter, though, as the restricted
     names are rather obscure and almost certainly would
     never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:44 -08:00
David Michael
d543d9c0f4 compat: convert modes to use portable file type values
This adds simple wrapper functions around calls to stat(), fstat(),
and lstat() that translate the operating system's native file type
bits to those used by most operating systems.  It also rewrites the
S_IF* macros to the common values, so all file type processing is
performed using the translated modes.  This makes projects portable
across operating systems that use different file type definitions.

Only the file type bits may be affected by these compatibility
functions; the file permission bits are assumed to be 07777 and are
passed through unchanged.

Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-04 11:58:36 -08:00
Nguyễn Thái Ngọc Duy
77a6d84045 count-objects: report unused files in $GIT_DIR/worktrees/...
In linked checkouts, borrowed parts like config is taken from
$GIT_COMMON_DIR. $GIT_DIR/config is never used. Report them as
garbage.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy
316e53e68c wrapper.c: wrapper to open a file, fprintf then close
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:16 -08:00
Nguyễn Thái Ngọc Duy
31e26ebcb5 setup.c: support multi-checkout repo setup
The repo setup procedure is updated to detect $GIT_DIR/commondir and
set $GIT_COMMON_DIR properly.

The core.worktree is ignored when $GIT_COMMON_DIR is set. This is
because the config file is shared in multi-checkout setup, but
checkout directories _are_ different. Making core.worktree effective
in all checkouts mean it's back to a single checkout.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:15 -08:00
Nguyễn Thái Ngọc Duy
c7b3a3d2fe $GIT_COMMON_DIR: a new environment variable
This variable is intended to support multiple working directories
attached to a repository. Such a repository may have a main working
directory, created by either "git init" or "git clone" and one or more
linked working directories. These working directories and the main
repository share the same repository directory.

In linked working directories, $GIT_COMMON_DIR must be defined to point
to the real repository directory and $GIT_DIR points to an unused
subdirectory inside $GIT_COMMON_DIR. File locations inside the
repository are reorganized from the linked worktree view point:

 - worktree-specific such as HEAD, logs/HEAD, index, other top-level
   refs and unrecognized files are from $GIT_DIR.

 - the rest like objects, refs, info, hooks, packed-refs, shallow...
   are from $GIT_COMMON_DIR (except info/sparse-checkout, but that's
   a separate patch)

Scripts are supposed to retrieve paths in $GIT_DIR with "git rev-parse
--git-path", which will take care of "$GIT_DIR vs $GIT_COMMON_DIR"
business.

The redirection is done by git_path(), git_pathdup() and
strbuf_git_path(). The selected list of paths goes to $GIT_COMMON_DIR,
not the other way around in case a developer adds a new
worktree-specific file and it's accidentally promoted to be shared
across repositories (this includes unknown files added by third party
commands)

The list of known files that belong to $GIT_DIR are:

ADD_EDIT.patch BISECT_ANCESTORS_OK BISECT_EXPECTED_REV BISECT_LOG
BISECT_NAMES CHERRY_PICK_HEAD COMMIT_MSG FETCH_HEAD HEAD MERGE_HEAD
MERGE_MODE MERGE_RR NOTES_EDITMSG NOTES_MERGE_WORKTREE ORIG_HEAD
REVERT_HEAD SQUASH_MSG TAG_EDITMSG fast_import_crash_* logs/HEAD
next-index-* rebase-apply rebase-merge rsync-refs-* sequencer/*
shallow_*

Path mapping is NOT done for git_path_submodule(). Multi-checkouts are
not supported as submodules.

Helped-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:13 -08:00
Nguyễn Thái Ngọc Duy
557bd833bb git_path(): be aware of file relocation in $GIT_DIR
We allow the user to relocate certain paths out of $GIT_DIR via
environment variables, e.g. GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE and
GIT_GRAFT_FILE. Callers are not supposed to use git_path() or
git_pathdup() to get those paths. Instead they must use
get_object_directory(), get_index_file() and get_graft_file()
respectively. This is inconvenient and could be missed in review (for
example, there's git_path("objects/info/alternates") somewhere in
sha1_file.c).

This patch makes git_path() and git_pathdup() understand those
environment variables. So if you set GIT_OBJECT_DIRECTORY to /foo/bar,
git_path("objects/abc") should return /foo/bar/abc. The same is done
for the two remaining env variables.

"git rev-parse --git-path" is the wrapper for script use.

This patch kinda reverts a0279e1 (setup_git_env: use git_pathdup
instead of xmalloc + sprintf - 2014-06-19) because using git_pathdup
here would result in infinite recursion:

  setup_git_env() -> git_pathdup("objects") -> .. -> adjust_git_path()
  -> get_object_directory() -> oops, git_object_directory is NOT set
  yet -> setup_git_env()

I wanted to make git_pathdup_literal() that skips adjust_git_path().
But that won't work because later on when $GIT_COMMON_DIR is
introduced, git_pathdup_literal("objects") needs adjust_git_path() to
replace $GIT_DIR with $GIT_COMMON_DIR.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
1a83c240f2 git_snpath(): retire and replace with strbuf_git_path()
In the previous patch, git_snpath() is modified to allocate a new
strbuf buffer because vsnpath() needs that. But that makes it
awkward because git_snpath() receives a pre-allocated buffer from
outside and has to copy data back. Rename it to strbuf_git_path()
and make it receive strbuf directly.

Using git_path() in update_refs_for_switch() which used to call
git_snpath() is safe because that function and all of its callers do
not keep any pointer to the round-robin buffer pool allocated by
get_pathname().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Junio C Hamano
d70e331c0e Merge branch 'jk/prune-mtime'
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.

* jk/prune-mtime: (28 commits)
  drop add_object_array_with_mode
  revision: remove definition of unused 'add_object' function
  pack-objects: double-check options before discarding objects
  repack: pack objects mentioned by the index
  pack-objects: use argv_array
  reachable: use revision machinery's --indexed-objects code
  rev-list: add --indexed-objects option
  rev-list: document --reflog option
  t5516: test pushing a tag of an otherwise unreferenced blob
  traverse_commit_list: support pending blobs/trees with paths
  make add_object_array_with_context interface more sane
  write_sha1_file: freshen existing objects
  pack-objects: match prune logic for discarding objects
  pack-objects: refactor unpack-unreachable expiration check
  prune: keep objects reachable from recent objects
  sha1_file: add for_each iterators for loose and packed objects
  count-objects: use for_each_loose_file_in_objdir
  count-objects: do not use xsize_t when counting object size
  prune-packed: use for_each_loose_file_in_objdir
  reachable: mark index blobs as SEEN
  ...
2014-10-29 10:07:56 -07:00
Jeff King
660c889e46 sha1_file: add for_each iterators for loose and packed objects
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00
Jeff King
fe1b22686f foreach_alt_odb: propagate return value from callback
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:35 -07:00
Ronnie Sahlberg
d0f810f0bc refs.c: allow listing and deleting badly named refs
We currently do not handle badly named refs well:

  $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
  $ git branch
    fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
  $ git branch -D master.....@\*@\\.
    error: branch 'master.....@*@\.' not found.

Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.

So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them.  Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.

Details:

In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").

In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.

Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.

Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially.  For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.

In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:26 -07:00
Jonathan Nieder
62a2d52514 branch -d: avoid repeated symref resolution
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":

 $ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
 $ git branch -d nonsense
 error: branch 'nonsense' not found.

Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:

 $ git update-ref -d refs/heads/nonsense
 error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links

Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.

Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:25 -07:00
Ronnie Sahlberg
7695d118e5 refs.c: change resolve_ref_unsafe reading argument to be a flags field
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading).  Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.

While at it, swap two of the arguments in the function to put output
arguments at the end.  As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.

Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:24 -07:00
Junio C Hamano
bd107e1052 Merge branch 'mh/lockfile'
The lockfile API and its users have been cleaned up.

* mh/lockfile: (38 commits)
  lockfile.h: extract new header file for the functions in lockfile.c
  hold_locked_index(): move from lockfile.c to read-cache.c
  hold_lock_file_for_append(): restore errno before returning
  get_locked_file_path(): new function
  lockfile.c: rename static functions
  lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
  commit_lock_file_to(): refactor a helper out of commit_lock_file()
  trim_last_path_component(): replace last_path_elm()
  resolve_symlink(): take a strbuf parameter
  resolve_symlink(): use a strbuf for internal scratch space
  lockfile: change lock_file::filename into a strbuf
  commit_lock_file(): use a strbuf to manage temporary space
  try_merge_strategy(): use a statically-allocated lock_file object
  try_merge_strategy(): remove redundant lock_file allocation
  struct lock_file: declare some fields volatile
  lockfile: avoid transitory invalid states
  git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
  dump_marks(): remove a redundant call to rollback_lock_file()
  api-lockfile: document edge cases
  commit_lock_file(): rollback lock file on failure to rename
  ...
2014-10-14 10:49:45 -07:00
Junio C Hamano
f0d8900175 Merge branch 'sp/stream-clean-filter'
When running a required clean filter, we do not have to mmap the
original before feeding the filter.  Instead, stream the file
contents directly to the filter and process its output.

* sp/stream-clean-filter:
  sha1_file: don't convert off_t to size_t too early to avoid potential die()
  convert: stream from fd to required clean filter to reduce used address space
  copy_fd(): do not close the input file descriptor
  mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
  config.c: add git_env_ulong() to parse environment variable
  convert: drop arguments other than 'path' from would_convert_to_git()
2014-10-08 13:05:32 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Michael Haggerty
ec38b4e482 get_locked_file_path(): new function
Add a function to return the path of the file that is locked by a
lock_file object. This reduces the knowledge that callers have to have
about the lock_file layout.

Suggested-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:53:54 -07:00
Michael Haggerty
47ba4662bf lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
This makes it harder to misread the name as LOCK_NODE_REF.

Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:53:28 -07:00
Michael Haggerty
751bacedaa commit_lock_file_to(): refactor a helper out of commit_lock_file()
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:52:06 -07:00
Michael Haggerty
cf6950d3bf lockfile: change lock_file::filename into a strbuf
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value.  (That will be fixed in a moment.)

Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file.  But lock_file objects are often reused.  By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:50:01 -07:00
Michael Haggerty
2091c5062c struct lock_file: declare some fields volatile
The function remove_lock_file_on_signal() is used as a signal handler.
It is not realistic to make the signal handler conform strictly to the
C standard, which is very restrictive about what a signal handler is
allowed to do.  But let's increase the likelihood that it will work:

The lock_file_list global variable and several fields from struct
lock_file are used by the signal handler.  Declare those values
"volatile" to (1) force the main process to write the values to RAM
promptly, and (2) prevent updates to these fields from being reordered
in a way that leaves an opportunity for a jump to the signal handler
while the object is in an inconsistent state.

We don't mark the filename field volatile because that would prevent
the use of strcpy(), and it is anyway unlikely that a compiler
re-orders a strcpy() call across other expressions.  So in practice it
should be possible to get away without "volatile" in the "filename"
case.

Suggested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:49:00 -07:00
Michael Haggerty
707103fdfd lockfile: avoid transitory invalid states
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state.  And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.

This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename.  It
is wrong to assume that this string can be updated atomically; for
example, even

    strcpy(lk->filename, value)

is unsafe.  But the old code was even more reckless; for example,

    strcpy(lk->filename, path);
    if (!(flags & LOCK_NODEREF))
            resolve_symlink(lk->filename, max_path_len);
    strcat(lk->filename, ".lock");

During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile.  If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!

We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.

So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose.  Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:48:59 -07:00
Michael Haggerty
7108ad232f cache.h: define constants LOCK_SUFFIX and LOCK_SUFFIX_LEN
There are a few places that use these values, so define constants for
them.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:45:11 -07:00
Michael Haggerty
e197c21807 unable_to_lock_die(): rename function from unable_to_lock_index_die()
This function is used for other things besides the index, so rename it
accordingly.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:38:38 -07:00
Junio C Hamano
102edda4df Merge branch 'ta/config-add-to-empty-or-true-fix' into maint
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.

* ta/config-add-to-empty-or-true-fix:
  config: avoid a funny sentinel value "a^"
  make config --add behave correctly for empty and NULL values
2014-09-29 22:10:25 -07:00
Junio C Hamano
1c2ea2cdc0 Merge branch 'rs/realloc-array'
Code cleanup.

* rs/realloc-array:
  use REALLOC_ARRAY for changing the allocation size of arrays
  add macro REALLOC_ARRAY
2014-09-26 14:39:45 -07:00
Junio C Hamano
69a5bbbbfa Merge branch 'jk/write-packed-refs-via-stdio'
Optimize the code path to write out the packed-refs file, which
especially matters in a repository with a large number of refs.

* jk/write-packed-refs-via-stdio:
  refs: write packed_refs file using stdio
2014-09-26 14:39:42 -07:00
Junio C Hamano
4daf5c8643 Merge branch 'ta/config-add-to-empty-or-true-fix'
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.

* ta/config-add-to-empty-or-true-fix:
  config: avoid a funny sentinel value "a^"
  make config --add behave correctly for empty and NULL values
2014-09-19 11:38:40 -07:00
Junio C Hamano
9ff700ebac Merge branch 'jk/commit-author-parsing'
Code clean-up.

* jk/commit-author-parsing:
  determine_author_info(): copy getenv output
  determine_author_info(): reuse parsing functions
  date: use strbufs in date-formatting functions
  record_author_date(): use find_commit_header()
  record_author_date(): fix memory leak on malformed commit
  commit: provide a function to find a header in a buffer
2014-09-19 11:38:33 -07:00
Junio C Hamano
ceeacc501b Merge branch 'bb/date-iso-strict'
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable.  A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.

* bb/date-iso-strict:
  pretty: provide a strict ISO 8601 date format
2014-09-19 11:38:32 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Jeff King
c1063be2a3 config: avoid a funny sentinel value "a^"
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 16:33:54 -07:00
Junio C Hamano
7f346e9d73 Merge branch 'ta/config-set-1'
Use the new caching config-set API in git_config() calls.

* ta/config-set-1:
  add tests for `git_config_get_string_const()`
  add a test for semantic errors in config files
  rewrite git_config() to use the config-set API
  config: add `git_die_config()` to the config-set API
  change `git_config()` return value to void
  add line number and file name info to `config_set`
  config.c: fix accuracy of line number in errors
  config.c: mark error and warnings strings for translation
2014-09-11 10:33:25 -07:00
Jeff King
9540ce5030 refs: write packed_refs file using stdio
We write each line of a new packed-refs file individually
using a write() syscall (and sometimes 2, if the ref is
peeled). Since each line is only about 50-100 bytes long,
this creates a lot of system call overhead.

We can instead open a stdio handle around our descriptor and
use fprintf to write to it. The extra buffering is not a
problem for us, because nobody will read our new packed-refs
file until we call commit_lock_file (by which point we have
flushed everything).

On a pathological repository with 8.5 million refs, this
dropped the time to run `git pack-refs` from 20s to 6s.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 10:58:32 -07:00
Junio C Hamano
56f214e071 Merge branch 'ta/config-set'
Add in-core caching layer to let us avoid reading the same
configuration files number of times.

* ta/config-set:
  test-config: add tests for the config_set API
  add `config_set` API for caching config-like files
2014-09-02 13:24:18 -07:00
Junio C Hamano
1d8a6f6929 Merge branch 'mm/config-edit-global'
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any.  This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.

* mm/config-edit-global:
  commit: advertise config --global --edit on guessed identity
  home_config_paths(): let the caller ignore xdg path
  config --global --edit: create a template file if needed
2014-09-02 13:23:20 -07:00
Junio C Hamano
c518279c0e Merge branch 'jc/reopen-lock-file'
There are cases where you lock and open to write a file, close it to
show the updated contents to external processes, and then have to
update the file again while still holding the lock, but the lockfile
API lacked support for such an access pattern.

* jc/reopen-lock-file:
  lockfile: allow reopening a closed but still locked file
2014-09-02 13:20:13 -07:00
Beat Bolli
466fb6742d pretty: provide a strict ISO 8601 date format
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.

The output from "--date=iso" deviates from ISO 8601 in these ways:

  - a space instead of the `T` date/time delimiter
  - a space between time and time zone
  - no colon between hours and minutes of the time zone

Add a strict ISO 8601 date format for displaying committer and
author dates.  Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.

See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.

Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 12:37:02 -07:00
Steffen Prohaska
23b0c4782e config.c: add git_env_ulong() to parse environment variable
The new function parses an integeral value that fits in unsigned
long in human readable form, i.e. possibly with unit suffix, e.g.
10k = 10240, etc., from an environment variable.  Parsing of
GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-28 10:24:54 -07:00
Jeff King
c33ddc2e33 date: use strbufs in date-formatting functions
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:32:56 -07:00
Tanay Abhra
155ef25f12 rewrite git_config() to use the config-set API
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.

Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:41:10 -07:00
Tanay Abhra
5a80e97c82 config: add git_die_config() to the config-set API
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.

It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,

	if (!git_config_get_value(key, &value)){
		if (!strcmp(value, "foo"))
			git_config_die(key, "value: `%s` is illegal", value);
		else
			/* do work */
	}

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:40:25 -07:00
Tanay Abhra
aace438502 change git_config() return value to void
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.

Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.

Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.

Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:40:17 -07:00
Tanay Abhra
3df8fd625f add line number and file name info to config_set
Store file name and line number for each key-value pair in the cache
during parsing of the configuration files.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:38:50 -07:00
Tanay Abhra
3c8687a73e add config_set API for caching config-like files
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.

Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.

Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-29 14:29:56 -07:00
Jeff King
5de7f500c1 alloc: factor out commit index
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-28 10:14:33 -07:00
Matthieu Moy
9830534e40 config --global --edit: create a template file if needed
When the user has no ~/.gitconfig file, git config --global --edit used
to launch an editor on an nonexistant file name.

Instead, create a file with a default content before launching the
editor. The template contains only commented-out entries, to save a few
keystrokes for the user. If the values are guessed properly, the user
will only have to uncomment the entries.

Advanced users teaching newbies can create a minimalistic configuration
faster for newbies. Beginners reading a tutorial advising to run "git
config --global --edit" as a first step will be slightly more guided for
their first contact with Git.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-25 12:23:06 -07:00
Junio C Hamano
10b944b37b Merge branch 'jk/alloc-commit-id'
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.

* jk/alloc-commit-id:
  diff-tree: avoid lookup_unknown_object
  object_as_type: set commit index
  alloc: factor out commit index
  add object_as_type helper for casting objects
  parse_object_buffer: do not set object type
  move setting of object->type to alloc_* functions
  alloc: write out allocator definitions
  alloc.c: remove the alloc_raw_commit_node() function
2014-07-22 10:59:25 -07:00
Junio C Hamano
9f2de9c121 Merge branch 'kb/perf-trace'
* kb/perf-trace:
  api-trace.txt: add trace API documentation
  progress: simplify performance measurement by using getnanotime()
  wt-status: simplify performance measurement by using getnanotime()
  git: add performance tracing for git's main() function to debug scripts
  trace: add trace_performance facility to debug performance issues
  trace: add high resolution timer function to debug performance issues
  trace: add 'file:line' to all trace output
  trace: move code around, in preparation to file:line output
  trace: add current timestamp to all trace output
  trace: disable additional trace output for unit tests
  trace: add infrastructure to augment trace output with additional info
  sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
  Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
  trace: improve trace performance
  trace: remove redundant printf format attribute
  trace: consistently name the format parameter
  trace: move trace declarations from cache.h to new trace.h
2014-07-22 10:59:19 -07:00
Junio C Hamano
19a249ba83 Merge branch 'rs/ref-transaction-0'
Early part of the "ref transaction" topic.

* rs/ref-transaction-0:
  refs.c: change ref_transaction_update() to do error checking and return status
  refs.c: remove the onerr argument to ref_transaction_commit
  update-ref: use err argument to get error from ref_transaction_commit
  refs.c: make update_ref_write update a strbuf on failure
  refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
  refs.c: log_ref_write should try to return meaningful errno
  refs.c: make resolve_ref_unsafe set errno to something meaningful on error
  refs.c: commit_packed_refs to return a meaningful errno on failure
  refs.c: make remove_empty_directories always set errno to something sane
  refs.c: verify_lock should set errno to something meaningful
  refs.c: make sure log_ref_setup returns a meaningful errno
  refs.c: add an err argument to repack_without_refs
  lockfile.c: make lock_file return a meaningful errno on failurei
  lockfile.c: add a new public function unable_to_lock_message
  refs.c: add a strbuf argument to ref_transaction_commit for error logging
  refs.c: allow passing NULL to ref_transaction_free
  refs.c: constify the sha arguments for ref_transaction_create|delete|update
  refs.c: ref_transaction_commit should not free the transaction
  refs.c: remove ref_transaction_rollback
2014-07-21 11:18:37 -07:00
Junio C Hamano
c9831bb09d Merge branch 'kb/path-max-must-go'
* kb/path-max-must-go:
  cache.h: rename cache_def_free to cache_def_clear
2014-07-16 11:32:33 -07:00
Junio C Hamano
788cef81d4 Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2014-07-16 11:25:40 -07:00
Junio C Hamano
93dcaea226 lockfile: allow reopening a closed but still locked file
In some code paths (e.g. giving "add -i" to prepare the contents to
be committed interactively inside "commit -p") where a caller takes
a lock, writes the new content, give chance for others to use it
while still holding the lock, and then releases the lock when all is
done.  As an extension, allow the caller to re-update an already
closed file while still holding the lock (i.e. not yet committed) by
re-opening the file, to be followed by updating the contents and
then by the usual close_lock_file() or commit_lock_file().

This is necessary if we want to add code to rebuild the cache-tree
and write the resulting index out after "add -i" returns the control
to "commit -p", for example.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-14 13:05:37 -07:00
Ronnie Sahlberg
76d70dc0c6 refs.c: make resolve_ref_unsafe set errno to something meaningful on error
Making errno when returning from resolve_ref_unsafe() meaningful,
which should fix

 * a bug in lock_ref_sha1_basic, where it assumes EISDIR
   means it failed due to a directory being in the way

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
2014-07-14 11:54:42 -07:00
Ronnie Sahlberg
6af926e8bc lockfile.c: add a new public function unable_to_lock_message
Introducing a new unable_to_lock_message helper, which has nicer
semantics than unable_to_lock_error and cleans up lockfile.c a little.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
2014-07-14 11:54:40 -07:00
Jeff King
94d5a22cf6 alloc: factor out commit index
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Karsten Blees
2a60839150 cache.h: rename cache_def_free to cache_def_clear
Rename cache_def_free to cache_def_clear as it doesn't free the struct
cache_def, but just clears its content.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 10:12:37 -07:00
Junio C Hamano
11def366e5 Merge branch 'kb/path-max-must-go'
* kb/path-max-must-go:
  symlinks: remove PATH_MAX limitation
2014-07-10 11:27:47 -07:00
Karsten Blees
e7c7305300 symlinks: remove PATH_MAX limitation
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.

Remove the limitation by using strbuf instead.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 11:22:42 -07:00
Junio C Hamano
1881d2b88c Merge branch 'ym/fix-opportunistic-index-update-race' into maint
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running.  Detect such a race and avoid overwriting the index.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-25 11:49:48 -07:00
Jeremiah Mahler
ccdd4a0f3c cleanup duplicate name_compare() functions
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.

To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically.  In addition, the cache_name_compare() function in
read-cache.c is nearly identical.  The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.

Unify these three functions by using the implementation from
cache_name_compare().  This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:12:14 -07:00
Karsten Blees
5991a55c54 trace: move trace declarations from cache.h to new trace.h
Also include direct dependencies (strbuf.h and git-compat-util.h for
__attribute__) so that trace.h can be used independently of cache.h, e.g.
in test programs.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-17 09:37:47 -07:00
Junio C Hamano
b83163643b Merge branch 'sk/windows-unc-path'
* sk/windows-unc-path:
  Windows: allow using UNC path for git repository
2014-06-16 10:07:03 -07:00
Nguyễn Thái Ngọc Duy
3e52f70b15 t1700: new tests for split-index mode
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:42 -07:00
Nguyễn Thái Ngọc Duy
c18b80a0e8 update-index: new options to enable/disable split index mode
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
b3c96fb158 split-index: strip pathname of on-disk replaced entries
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
ce7c614bce split-index: do not invalidate cache-tree at read time
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
078a58e825 read-cache: mark updated entries for split index
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
5fc2fc8fa2 read-cache: split-index mode
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)

This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.

Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.

The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
e93021b20a read-cache: save index SHA-1 after reading
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
d4a2024aef entry.c: update cache_changed if refresh_cache is set in checkout_entry()
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
6c306a34ee resolve-undo: be specific what part of the index has changed
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
e636a7b4d0 read-cache: be specific what part of the index has changed
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
ce51bf09f8 read-cache: store in-memory flags in the first 12 bits of ce_flags
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
626f35c893 read-cache: relocate and unexport commit_locked_index()
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
03b8664772 read-cache: new API write_locked_index instead of write_index/write_cache
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:10 -07:00
Cezary Zawadka
c2369bdf7f Windows: allow using UNC path for git repository
[efl: moved MinGW-specific part to compat/]
[jes: fixed compilation on non-Windows]

Eric Sunshine fixed mingw_offset_1st_component() to return
consistently "foo" for UNC "//machine/share/foo", cf

http://groups.google.com/group/msysgit/browse_thread/thread/c0af578549b5dda0

Author: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Cezary Zawadka <czawadka@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 13:30:04 -07:00
Junio C Hamano
1e2600dd6a Merge branch 'nd/status-auto-comment-char'
* nd/status-auto-comment-char:
  commit: allow core.commentChar=auto for character auto selection
  config: be strict on core.commentChar
2014-06-06 11:36:10 -07:00
Junio C Hamano
610a14f643 Merge branch 'jk/squelch-compiler-warning-from-funny-error-macro'
* jk/squelch-compiler-warning-from-funny-error-macro:
  let clang use the constant-return error() macro
  inline constant return from error() function
2014-06-06 11:21:36 -07:00
Junio C Hamano
e1857af923 Merge branch 'jk/commit-date-approxidate'
* jk/commit-date-approxidate:
  commit: accept more date formats for "--date"
  commit: print "Date" line when the user has set date
  pretty: make show_ident_date public
  commit: use split_ident_line to compare author/committer
2014-06-03 12:06:46 -07:00
Junio C Hamano
9af098c29b Merge branch 'ym/fix-opportunistic-index-update-race'
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.

Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath.  While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file.  So in that sense that
can be done as a less urgent follow-up for this topic.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-03 12:06:41 -07:00
Junio C Hamano
8eaf517835 Merge branch 'ks/tree-diff-nway'
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel.  These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such.  And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.

* ks/tree-diff-nway:
  mingw: activate alloca
  combine-diff: speed it up, by using multiparent diff tree-walker directly
  tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
  Portable alloca for Git
  tree-diff: reuse base str(buf) memory on sub-tree recursion
  tree-diff: no need to call "full" diff_tree_sha1 from show_path()
  tree-diff: rework diff_tree interface to be sha1 based
  tree-diff: diff_tree() should now be static
  tree-diff: remove special-case diff-emitting code for empty-tree cases
  tree-diff: simplify tree_entry_pathcmp
  tree-diff: show_path prototype is not needed anymore
  tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
  tree-diff: move all action-taking code out of compare_tree_entry()
  tree-diff: don't assume compare_tree_entry() returns -1,0,1
  tree-diff: consolidate code for emitting diffs and recursion in one place
  tree-diff: show_tree() is not needed
  tree-diff: no need to pass match to skip_uninteresting()
  tree-diff: no need to manually verify that there is no mode change for a path
  combine-diff: move changed-paths scanning logic into its own function
  combine-diff: move show_log_first logic/action out of paths scanning
2014-06-03 12:06:40 -07:00
Nguyễn Thái Ngọc Duy
84c9dc2c5a commit: allow core.commentChar=auto for character auto selection
When core.commentChar is "auto", the comment char starts with '#' as
in default but if it's already in the prepared message, find another
char in a small subset. This should stop surprises because git strips
some lines unexpectedly.

Note that git is not smart enough to recognize '#' as the comment char
in custom templates and convert it if the final comment char is
different. It thinks '#' lines in custom templates as part of the
commit message. So don't use this with custom templates.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-19 13:37:25 -07:00
Jeff King
ff0a80af72 let clang use the constant-return error() macro
Commit e208f9c converted error() into a macro to make its
constant return value more apparent to calling code.  Commit
5ded807 prevents us using this macro with clang, since
clang's -Wunused-value is smart enough to realize that the
constant "-1" is useless in some contexts.

However, since the last commit puts the constant behind an
inline function call, this is enough to prevent the
-Wunused-value warning on both modern gcc and clang. So we
can now re-enable the macro when compiling with clang.

Tested with clang 3.3, 3.4, and 3.5.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-06 15:30:40 -07:00
Jeff King
87fe5df365 inline constant return from error() function
Commit e208f9c introduced a macro to turn error() calls
into:

  (error(), -1)

to make the constant return value more visible to the
calling code (and thus let the compiler make better
decisions about the code).

This works well for code like:

  return error(...);

but the "-1" is superfluous in code that just calls error()
without caring about the return value. In older versions of
gcc, that was fine, but gcc 4.9 complains with -Wunused-value.

We can work around this by encapsulating the constant return
value in a static inline function, as gcc specifically
avoids complaining about unused function returns unless the
function has been specifically marked with the
warn_unused_result attribute.

We also use the same trick for config_error_nonbool and
opterror, which learned the same error technique in a469a10.

Reported-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-06 15:30:38 -07:00
Jeff King
d105324655 pretty: make show_ident_date public
We use this function internally to format "Date" lines in
commit logs, but other parts of the code will want it, too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-02 14:13:00 -07:00
Yiannis Marangos
426ddeead6 read-cache.c: verify index file before we opportunistically update it
Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.

In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10 12:27:58 -07:00
Kirill Smelkov
72441af7c4 tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(T,P1...Pn) calculation scheme
-------------------------------

D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)

    D(T,Pj)		- diff between T..Pj
    D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn

We start from all trees, which are sorted, and compare their entries in
lock-step:

     T     P1       Pn
     -     -        -
    |t|   |p1|     |pn|
    |-|   |--| ... |--|      imin = argmin(p1...pn)
    | |   |  |     |  |
    |-|   |--|     |--|
    |.|   |. |     |. |
     .     .        .
     .     .        .

at any time there could be 3 cases:

    1)  t < p[imin];
    2)  t > p[imin];
    3)  t = p[imin].

Schematic deduction of what every case means, and what to do, follows:

1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓

2)  t > p[imin]

    2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
    2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓

3)  t = p[imin]

    3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
    3.2) pi = p[imin]  ->  investigate δ(t,pi)
     |
     |
     v

    3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->

                      ⎧δ(t,pi)  - if pi=p[imin]
             ->  D += ⎨
                      ⎩"+t"     - if pi>p[imin]

    in any case t↓  ∀ pi=p[imin]  pi↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-------------------------

    A     B
    -     -
   |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
   |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
   | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
   |-|   |-|
   |.|   |.|
    .     .
    .     .

~~~~~~~~

This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff

D(A,B)

is by definition the same as combined diff

D(A,[B]),

so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.

What we do is as follows:

1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
   a paths generator (new name diff_tree_paths()), with each generated path
   being `struct combine_diff_path` with info for path, new sha1,mode and for
   every parent which sha1,mode it was in it.

2) From that info, we can still generate usual diff queue with
   struct diff_filepairs, via "exporting" generated
   combine_diff_path, if we know we run for nparent=1 case.
   (see emit_diff() which is now named emit_diff_first_parent_only())

3) In order for diff_can_quit_early(), which checks

       DIFF_OPT_TST(opt, HAS_CHANGES))

   to work, that exporting have to be happening not in bulk, but
   incrementally, one diff path at a time.

   For such consumers, there is a new callback in diff_options
   introduced:

       ->pathchange(opt, struct combine_diff_path *)

   which, if set to !NULL, is called for every generated path.

   (see new compat ll_diff_tree_sha1() wrapper around new paths
    generator for setup)

4) The paths generation itself, is reworked from previous
   ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
   scheme" provided above:

   On the start we allocate [nparent] arrays in place what was
   earlier just for one parent tree.

   then we just generalize loops, and comparison according to the
   algorithm.

Some notes(*):

1) alloca(), for small arrays, is used for "runs not slower for
   nparent=1 case than before" goal - if we change it to xmalloc()/free()
   the timings get ~1% worse. For alloca() we use just-introduced
   xalloca/xalloca_free compatibility wrappers, so it should not be a
   portability problem.

2) For every parent tree, we need to keep a tag, whether entry from that
   parent equals to entry from minimal parent. For performance reasons I'm
   keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
   Not doing so, we'd need to alloca another [nparent] array, which hurts
   performance.

3) For emitted paths, memory could be reused, if we know the path was
   processed via callback and will not be needed later. We use efficient
   hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
   of potential additional slowdown.

4) goto(s) are used in several places, as the code executes a little bit
   faster with lowered register pressure.

Also

- we should now check for FIND_COPIES_HARDER not only when two entries
  names are the same, and their hashes are equal, but also for a case,
  when a path was removed from some of all parents having it.

  The reason is, if we don't, that path won't be emitted at all (see
  "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
  all paths - with diff or without - to be emitted, to be later analyzed
  for being copies sources.

  The new check is only necessary for nparent >1, as for nparent=1 case
  xmin_eqtotal always =1 =nparent, and a path is always added to diff as
  removal.

~~~~~~~~

Timings for

    # without -c, i.e. testing only nparent=1 case
    `git log --raw --no-abbrev --no-renames`

before and after the patch are as follows:

                navy.git        linux.git v3.10..v3.11

    before      0.611s          1.889s
    after       0.619s          1.907s
    slowdown    1.3%            0.9%

This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.

The output also stayed the same.

(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
    we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
   "do no harm for nparent=1 case" rule.

For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.

P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with

    `git log --reverse --raw --no-abbrev --no-renames -c`

to extract log of what was created/changed when, as a result building a
map

    {}  sha1    ->  in which commit (and date) a content was added

that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.

That case was my initial motivation for combined diffs speedup.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-07 14:40:46 -07:00
Junio C Hamano
b6de0c633e Merge branch 'nd/tag-version-sort'
Allow v1.9.0 sorted before v1.10.0 in "git tag --list" output.

* nd/tag-version-sort:
  tag: support --sort=<spec>
2014-03-21 12:47:39 -07:00
Junio C Hamano
8aac6c97e8 Merge branch 'jk/commit-dates-parsing-fix' into maint
Codepaths that parse timestamps in commit objects have been
tightened.

* jk/commit-dates-parsing-fix:
  show_ident_date: fix tz range check
  log: do not segfault on gmtime errors
  log: handle integer overflow in timestamps
  date: check date overflow against time_t
  fsck: report integer overflow in author timestamps
  t4212: test bogus timestamps with git-log
2014-03-18 14:04:01 -07:00
Junio C Hamano
6d011b8e3f Merge branch 'bk/refresh-missing-ok-in-merge-recursive' into maint
"merge-recursive" was broken in 1.7.7 era and stopped working in an
empty (temporary) working tree, when there are renames involved.
This has been corrected.

* bk/refresh-missing-ok-in-merge-recursive:
  merge-recursive.c: tolerate missing files while refreshing index
  read-cache.c: extend make_cache_entry refresh flag with options
  read-cache.c: refactor --ignore-missing implementation
  t3030-merge-recursive: test known breakage with empty work tree
2014-03-18 14:02:38 -07:00
Junio C Hamano
3e30cb0fbf Merge branch 'mh/replace-refs-variable-rename'
* mh/replace-refs-variable-rename:
  Document some functions defined in object.c
  Add docstrings for lookup_replace_object() and do_lookup_replace_object()
  rename read_replace_refs to check_replace_refs
2014-03-14 14:27:06 -07:00
Junio C Hamano
060be00621 Merge branch 'mh/object-code-cleanup'
* mh/object-code-cleanup:
  sha1_file.c: document a bunch of functions defined in the file
  sha1_file_name(): declare to return a const string
  find_pack_entry(): document last_found_pack
  replace_object: use struct members instead of an array
2014-03-14 14:26:29 -07:00
Junio C Hamano
3c83b080e4 Merge branch 'jk/commit-dates-parsing-fix'
Tighten codepaths that parse timestamps in commit objects.

* jk/commit-dates-parsing-fix:
  show_ident_date: fix tz range check
  log: do not segfault on gmtime errors
  log: handle integer overflow in timestamps
  date: check date overflow against time_t
  fsck: report integer overflow in author timestamps
  t4212: test bogus timestamps with git-log
2014-03-14 14:25:44 -07:00
Junio C Hamano
08f36302b5 Merge branch 'ks/config-file-stdin'
"git config" learned to read from the standard input when "-" is
given as the value to its "--file" parameter (attempting an
operation to update the configuration in the standard input of
course is rejected).

* ks/config-file-stdin:
  config: teach "git config --file -" to read from the standard input
  config: change git_config_with_options() interface
  builtin/config.c: rename check_blob_write() -> check_write()
  config: disallow relative include paths from blobs
2014-03-14 14:24:40 -07:00
Junio C Hamano
053a6b1807 Merge branch 'jn/add-2.0-u-A-sans-pathspec'
"git add -u" and "git add -A" without any pathspec is a tree-wide
operation now, even when they are run in a subdirectory of the
working tree.
2014-03-07 15:14:02 -08:00
Junio C Hamano
4c4ac4db2c Merge branch 'nd/daemonize-gc'
Allow running "gc --auto" in the background.

* nd/daemonize-gc:
  gc: config option for running --auto in background
  daemon: move daemonize() to libgit.a
2014-03-05 15:06:39 -08:00