Commit Graph

1372 Commits

Author SHA1 Message Date
Junio C Hamano
3bef3c12d6 Merge branch 'jk/decimal-width-for-uintmax' into maint
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.

* jk/decimal-width-for-uintmax:
  decimal_width: avoid integer overflow
2015-03-05 13:12:59 -08:00
Junio C Hamano
fd9de868c3 Merge branch 'mh/refs-have-new'
Simplify the ref transaction API around how "the ref should be
pointing at this object" is specified.

* mh/refs-have-new:
  refs.h: remove duplication in function docstrings
  update_ref(): improve documentation
  ref_transaction_verify(): new function to check a reference's value
  ref_transaction_delete(): check that old_sha1 is not null_sha1
  ref_transaction_create(): check that new_sha1 is valid
  commit: avoid race when creating orphan commits
  commit: add tests of commit races
  ref_transaction_delete(): remove "have_old" parameter
  ref_transaction_update(): remove "have_old" parameter
  struct ref_update: move "have_old" into "flags"
  refs.c: change some "flags" to "unsigned int"
  refs: remove the gap in the REF_* constant values
  refs: move REF_DELETING to refs.c
2015-03-05 12:45:39 -08:00
Junio C Hamano
7cf6232e2c Merge branch 'jk/prune-mtime'
In v2.2.0, we broke "git prune" that runs in a repository that
borrows from an alternate object store.

* jk/prune-mtime:
  sha1_file: fix iterating loose alternate objects
  for_each_loose_file_in_objdir: take an optional strbuf path
2015-02-22 12:28:28 -08:00
Junio C Hamano
ca00db08da Merge branch 'jk/decimal-width-for-uintmax'
We didn't format an integer that wouldn't fit in "int" but in
"uintmax_t" correctly.

* jk/decimal-width-for-uintmax:
  decimal_width: avoid integer overflow
2015-02-18 11:45:02 -08:00
Michael Haggerty
fec14ec38c refs.c: change some "flags" to "unsigned int"
Change the following functions' "flags" arguments from "int" to
"unsigned int":

 * ref_transaction_update()
 * ref_transaction_create()
 * ref_transaction_delete()
 * update_ref()
 * delete_ref()
 * lock_ref_sha1_basic()

Also change the "flags" member in "struct ref_update" to unsigned.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-17 11:22:29 -08:00
Jeff King
e6f875e052 for_each_loose_file_in_objdir: take an optional strbuf path
We feed a root "objdir" path to this iterator function,
which then copies the result into a strbuf, so that it can
repeatedly append the object sub-directories to it. Let's
make it easy for callers to just pass us a strbuf in the
first place.

We leave the original interface as a convenience for callers
who want to just pass a const string like the result of
get_object_directory().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-09 14:14:53 -08:00
Jeff King
d306f3d351 decimal_width: avoid integer overflow
The decimal_width function originally appeared in blame.c as
"lineno_width", and was designed for calculating the
print-width of small-ish integer values (line numbers in
text files). In ec7ff5b, it was made into a reusable
function, and in dc801e7, we started using it to align
diffstats.

Binary files in a diffstat show byte counts rather than line
numbers, meaning they can be quite large (e.g., consider
adding or removing a 2GB file). decimal_width is not up to
the challenge for two reasons:

  1. It takes the value as an "int", whereas large files may
     easily surpass this. The value may be truncated, in
     which case we will produce an incorrect value.

  2. It counts "up" by repeatedly multiplying another
     integer by 10 until it surpasses the value.  This can
     cause an infinite loop when the value is close to the
     largest representable integer.

     For example, consider using a 32-bit signed integer,
     and a value of 2,140,000,000 (just shy of 2^31-1).
     We will count up and eventually see that 1,000,000,000
     is smaller than our value. The next step would be to
     multiply by 10 and see that 10,000,000,000 is too
     large, ending the loop. But we can't represent that
     value, and we have signed overflow.

     This is technically undefined behavior, but a common
     behavior is to lose the high bits, in which case our
     iterator will certainly be less than the number. So
     we'll keep multiplying, overflow again, and so on.

This patch changes the argument to a uintmax_t (the same
type we use to store the diffstat information for binary
filese), and counts "down" by repeatedly dividing our value
by 10.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-02-05 12:38:35 -08:00
Junio C Hamano
2f17ecbd8d Merge branch 'dm/compat-s-ifmt-for-zos'
Long overdue departure from the assumption that S_IFMT is shared by
everybody made in 2005.

* dm/compat-s-ifmt-for-zos:
  compat: convert modes to use portable file type values
2014-12-22 12:27:16 -08:00
Junio C Hamano
77933f4449 Sync with v2.1.4
* maint-2.1:
  Git 2.1.4
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:46:57 -08:00
Junio C Hamano
58f1d950e3 Sync with v2.0.5
* maint-2.0:
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:42:28 -08:00
Junio C Hamano
5e519fb8b0 Sync with v1.9.5
* maint-1.9:
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:28:54 -08:00
Junio C Hamano
6898b79721 Sync with v1.8.5.6
* maint-1.8.5:
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:20:31 -08:00
Johannes Schindelin
2b4c6efc82 read-cache: optionally disallow NTFS .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for NTFS
and FAT32; let's use it in verify_path().

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on NTFS nor FAT32.
     In practice this probably doesn't matter, though, as
     the restricted names are rather obscure and almost
     certainly would never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectNTFS config
option. Though this is expected to be most useful on Windows,
we allow it to be set everywhere, as NTFS may be mounted on
other platforms. The variable does default to on for Windows,
though.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Johannes Schindelin
1d1d69bc52 path: add is_ntfs_dotgit() helper
We do not allow paths with a ".git" component to be added to
the index, as that would mean repository contents could
overwrite our repository files. However, asking "is this
path the same as .git" is not as simple as strcmp() on some
filesystems.

On NTFS (and FAT32), there exist so-called "short names" for
backwards-compatibility: 8.3 compliant names that refer to the same files
as their long names. As ".git" is not an 8.3 compliant name, a short name
is generated automatically, typically "git~1".

Depending on the Windows version, any combination of trailing spaces and
periods are ignored, too, so that both "git~1." and ".git." still refer
to the Git directory. The reason is that 8.3 stores file names shorter
than 8 characters with trailing spaces. So literally, it does not matter
for the short name whether it is padded with spaces or whether it is
shorter than 8 characters, it is considered to be the exact same.

The period is the separator between file name and file extension, and
again, an empty extension consists just of spaces in 8.3 format. So
technically, we would need only take care of the equivalent of this
regex:
        (\.git {0,4}|git~1 {0,3})\. {0,3}

However, there are indications that at least some Windows versions might
be more lenient and accept arbitrary combinations of trailing spaces and
periods and strip them out. So we're playing it real safe here. Besides,
there can be little doubt about the intention behind using file names
matching even the more lenient pattern specified above, therefore we
should be fine with disallowing such patterns.

Extra care is taken to catch names such as '.\\.git\\booh' because the
backslash is marked as a directory separator only on Windows, and we want
to use this new helper function also in fsck on other platforms.

A big thank you goes to Ed Thomson and an unnamed Microsoft engineer for
the detailed analysis performed to come up with the corresponding fixes
for libgit2.

This commit adds a function to detect whether a given file name can refer
to the Git directory by mistake.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King
a42643aa8d read-cache: optionally disallow HFS+ .git variants
The point of disallowing ".git" in the index is that we
would never want to accidentally overwrite files in the
repository directory. But this means we need to respect the
filesystem's idea of when two paths are equal. The prior
commit added a helper to make such a comparison for HFS+;
let's use it in verify_path.

We make this check optional for two reasons:

  1. It restricts the set of allowable filenames, which is
     unnecessary for people who are not on HFS+. In practice
     this probably doesn't matter, though, as the restricted
     names are rather obscure and almost certainly would
     never come up in practice.

  2. It has a minor performance penalty for every path we
     insert into the index.

This patch ties the check to the core.protectHFS config
option. Though this is expected to be most useful on OS X,
we allow it to be set everywhere, as HFS+ may be mounted on
other platforms. The variable does default to on for OS X,
though.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:44 -08:00
David Michael
d543d9c0f4 compat: convert modes to use portable file type values
This adds simple wrapper functions around calls to stat(), fstat(),
and lstat() that translate the operating system's native file type
bits to those used by most operating systems.  It also rewrites the
S_IF* macros to the common values, so all file type processing is
performed using the translated modes.  This makes projects portable
across operating systems that use different file type definitions.

Only the file type bits may be affected by these compatibility
functions; the file permission bits are assumed to be 07777 and are
passed through unchanged.

Signed-off-by: David Michael <fedora.dm0@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-04 11:58:36 -08:00
Nguyễn Thái Ngọc Duy
77a6d84045 count-objects: report unused files in $GIT_DIR/worktrees/...
In linked checkouts, borrowed parts like config is taken from
$GIT_COMMON_DIR. $GIT_DIR/config is never used. Report them as
garbage.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy
316e53e68c wrapper.c: wrapper to open a file, fprintf then close
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:16 -08:00
Nguyễn Thái Ngọc Duy
31e26ebcb5 setup.c: support multi-checkout repo setup
The repo setup procedure is updated to detect $GIT_DIR/commondir and
set $GIT_COMMON_DIR properly.

The core.worktree is ignored when $GIT_COMMON_DIR is set. This is
because the config file is shared in multi-checkout setup, but
checkout directories _are_ different. Making core.worktree effective
in all checkouts mean it's back to a single checkout.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:15 -08:00
Nguyễn Thái Ngọc Duy
c7b3a3d2fe $GIT_COMMON_DIR: a new environment variable
This variable is intended to support multiple working directories
attached to a repository. Such a repository may have a main working
directory, created by either "git init" or "git clone" and one or more
linked working directories. These working directories and the main
repository share the same repository directory.

In linked working directories, $GIT_COMMON_DIR must be defined to point
to the real repository directory and $GIT_DIR points to an unused
subdirectory inside $GIT_COMMON_DIR. File locations inside the
repository are reorganized from the linked worktree view point:

 - worktree-specific such as HEAD, logs/HEAD, index, other top-level
   refs and unrecognized files are from $GIT_DIR.

 - the rest like objects, refs, info, hooks, packed-refs, shallow...
   are from $GIT_COMMON_DIR (except info/sparse-checkout, but that's
   a separate patch)

Scripts are supposed to retrieve paths in $GIT_DIR with "git rev-parse
--git-path", which will take care of "$GIT_DIR vs $GIT_COMMON_DIR"
business.

The redirection is done by git_path(), git_pathdup() and
strbuf_git_path(). The selected list of paths goes to $GIT_COMMON_DIR,
not the other way around in case a developer adds a new
worktree-specific file and it's accidentally promoted to be shared
across repositories (this includes unknown files added by third party
commands)

The list of known files that belong to $GIT_DIR are:

ADD_EDIT.patch BISECT_ANCESTORS_OK BISECT_EXPECTED_REV BISECT_LOG
BISECT_NAMES CHERRY_PICK_HEAD COMMIT_MSG FETCH_HEAD HEAD MERGE_HEAD
MERGE_MODE MERGE_RR NOTES_EDITMSG NOTES_MERGE_WORKTREE ORIG_HEAD
REVERT_HEAD SQUASH_MSG TAG_EDITMSG fast_import_crash_* logs/HEAD
next-index-* rebase-apply rebase-merge rsync-refs-* sequencer/*
shallow_*

Path mapping is NOT done for git_path_submodule(). Multi-checkouts are
not supported as submodules.

Helped-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:13 -08:00
Nguyễn Thái Ngọc Duy
557bd833bb git_path(): be aware of file relocation in $GIT_DIR
We allow the user to relocate certain paths out of $GIT_DIR via
environment variables, e.g. GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE and
GIT_GRAFT_FILE. Callers are not supposed to use git_path() or
git_pathdup() to get those paths. Instead they must use
get_object_directory(), get_index_file() and get_graft_file()
respectively. This is inconvenient and could be missed in review (for
example, there's git_path("objects/info/alternates") somewhere in
sha1_file.c).

This patch makes git_path() and git_pathdup() understand those
environment variables. So if you set GIT_OBJECT_DIRECTORY to /foo/bar,
git_path("objects/abc") should return /foo/bar/abc. The same is done
for the two remaining env variables.

"git rev-parse --git-path" is the wrapper for script use.

This patch kinda reverts a0279e1 (setup_git_env: use git_pathdup
instead of xmalloc + sprintf - 2014-06-19) because using git_pathdup
here would result in infinite recursion:

  setup_git_env() -> git_pathdup("objects") -> .. -> adjust_git_path()
  -> get_object_directory() -> oops, git_object_directory is NOT set
  yet -> setup_git_env()

I wanted to make git_pathdup_literal() that skips adjust_git_path().
But that won't work because later on when $GIT_COMMON_DIR is
introduced, git_pathdup_literal("objects") needs adjust_git_path() to
replace $GIT_DIR with $GIT_COMMON_DIR.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
1a83c240f2 git_snpath(): retire and replace with strbuf_git_path()
In the previous patch, git_snpath() is modified to allocate a new
strbuf buffer because vsnpath() needs that. But that makes it
awkward because git_snpath() receives a pre-allocated buffer from
outside and has to copy data back. Rename it to strbuf_git_path()
and make it receive strbuf directly.

Using git_path() in update_refs_for_switch() which used to call
git_snpath() is safe because that function and all of its callers do
not keep any pointer to the round-robin buffer pool allocated by
get_pathname().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Junio C Hamano
d70e331c0e Merge branch 'jk/prune-mtime'
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.

* jk/prune-mtime: (28 commits)
  drop add_object_array_with_mode
  revision: remove definition of unused 'add_object' function
  pack-objects: double-check options before discarding objects
  repack: pack objects mentioned by the index
  pack-objects: use argv_array
  reachable: use revision machinery's --indexed-objects code
  rev-list: add --indexed-objects option
  rev-list: document --reflog option
  t5516: test pushing a tag of an otherwise unreferenced blob
  traverse_commit_list: support pending blobs/trees with paths
  make add_object_array_with_context interface more sane
  write_sha1_file: freshen existing objects
  pack-objects: match prune logic for discarding objects
  pack-objects: refactor unpack-unreachable expiration check
  prune: keep objects reachable from recent objects
  sha1_file: add for_each iterators for loose and packed objects
  count-objects: use for_each_loose_file_in_objdir
  count-objects: do not use xsize_t when counting object size
  prune-packed: use for_each_loose_file_in_objdir
  reachable: mark index blobs as SEEN
  ...
2014-10-29 10:07:56 -07:00
Jeff King
660c889e46 sha1_file: add for_each iterators for loose and packed objects
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00
Jeff King
fe1b22686f foreach_alt_odb: propagate return value from callback
We check the return value of the callback and stop iterating
if it is non-zero. However, we do not make the non-zero
return value available to the caller, so they have no way of
knowing whether the operation succeeded or not (technically
they can keep their own error flag in the callback data, but
that is unlike our other for_each functions).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:35 -07:00
Ronnie Sahlberg
d0f810f0bc refs.c: allow listing and deleting badly named refs
We currently do not handle badly named refs well:

  $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
  $ git branch
    fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
  $ git branch -D master.....@\*@\\.
    error: branch 'master.....@*@\.' not found.

Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.

So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them.  Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.

Details:

In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").

In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.

Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.

Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially.  For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.

In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:26 -07:00
Jonathan Nieder
62a2d52514 branch -d: avoid repeated symref resolution
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":

 $ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
 $ git branch -d nonsense
 error: branch 'nonsense' not found.

Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:

 $ git update-ref -d refs/heads/nonsense
 error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links

Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.

Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:25 -07:00
Ronnie Sahlberg
7695d118e5 refs.c: change resolve_ref_unsafe reading argument to be a flags field
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading).  Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.

While at it, swap two of the arguments in the function to put output
arguments at the end.  As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.

Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:24 -07:00
Junio C Hamano
bd107e1052 Merge branch 'mh/lockfile'
The lockfile API and its users have been cleaned up.

* mh/lockfile: (38 commits)
  lockfile.h: extract new header file for the functions in lockfile.c
  hold_locked_index(): move from lockfile.c to read-cache.c
  hold_lock_file_for_append(): restore errno before returning
  get_locked_file_path(): new function
  lockfile.c: rename static functions
  lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
  commit_lock_file_to(): refactor a helper out of commit_lock_file()
  trim_last_path_component(): replace last_path_elm()
  resolve_symlink(): take a strbuf parameter
  resolve_symlink(): use a strbuf for internal scratch space
  lockfile: change lock_file::filename into a strbuf
  commit_lock_file(): use a strbuf to manage temporary space
  try_merge_strategy(): use a statically-allocated lock_file object
  try_merge_strategy(): remove redundant lock_file allocation
  struct lock_file: declare some fields volatile
  lockfile: avoid transitory invalid states
  git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
  dump_marks(): remove a redundant call to rollback_lock_file()
  api-lockfile: document edge cases
  commit_lock_file(): rollback lock file on failure to rename
  ...
2014-10-14 10:49:45 -07:00
Junio C Hamano
f0d8900175 Merge branch 'sp/stream-clean-filter'
When running a required clean filter, we do not have to mmap the
original before feeding the filter.  Instead, stream the file
contents directly to the filter and process its output.

* sp/stream-clean-filter:
  sha1_file: don't convert off_t to size_t too early to avoid potential die()
  convert: stream from fd to required clean filter to reduce used address space
  copy_fd(): do not close the input file descriptor
  mmap_limit: introduce GIT_MMAP_LIMIT to allow testing expected mmap size
  memory_limit: use git_env_ulong() to parse GIT_ALLOC_LIMIT
  config.c: add git_env_ulong() to parse environment variable
  convert: drop arguments other than 'path' from would_convert_to_git()
2014-10-08 13:05:32 -07:00
Michael Haggerty
697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Michael Haggerty
ec38b4e482 get_locked_file_path(): new function
Add a function to return the path of the file that is locked by a
lock_file object. This reduces the knowledge that callers have to have
about the lock_file layout.

Suggested-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:53:54 -07:00
Michael Haggerty
47ba4662bf lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
This makes it harder to misread the name as LOCK_NODE_REF.

Suggested-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:53:28 -07:00
Michael Haggerty
751bacedaa commit_lock_file_to(): refactor a helper out of commit_lock_file()
commit_locked_index(), when writing to an alternate index file,
duplicates (poorly) the code in commit_lock_file(). And anyway, it
shouldn't have to know so much about the internal workings of lockfile
objects. So extract a new function commit_lock_file_to() that does the
work common to the two functions, and call it from both
commit_lock_file() and commit_locked_index().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:52:06 -07:00
Michael Haggerty
cf6950d3bf lockfile: change lock_file::filename into a strbuf
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value.  (That will be fixed in a moment.)

Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file.  But lock_file objects are often reused.  By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.

Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:50:01 -07:00
Michael Haggerty
2091c5062c struct lock_file: declare some fields volatile
The function remove_lock_file_on_signal() is used as a signal handler.
It is not realistic to make the signal handler conform strictly to the
C standard, which is very restrictive about what a signal handler is
allowed to do.  But let's increase the likelihood that it will work:

The lock_file_list global variable and several fields from struct
lock_file are used by the signal handler.  Declare those values
"volatile" to (1) force the main process to write the values to RAM
promptly, and (2) prevent updates to these fields from being reordered
in a way that leaves an opportunity for a jump to the signal handler
while the object is in an inconsistent state.

We don't mark the filename field volatile because that would prevent
the use of strcpy(), and it is anyway unlikely that a compiler
re-orders a strcpy() call across other expressions.  So in practice it
should be possible to get away without "volatile" in the "filename"
case.

Suggested-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:49:00 -07:00
Michael Haggerty
707103fdfd lockfile: avoid transitory invalid states
Because remove_lock_file() can be called any time by the signal
handler, it is important that any lock_file objects that are in the
lock_file_list are always in a valid state.  And since lock_file
objects are often reused (but are never removed from lock_file_list),
that means we have to be careful whenever mutating a lock_file object
to always keep it in a well-defined state.

This was formerly not the case, because part of the state was encoded
by setting lk->filename to the empty string vs. a valid filename.  It
is wrong to assume that this string can be updated atomically; for
example, even

    strcpy(lk->filename, value)

is unsafe.  But the old code was even more reckless; for example,

    strcpy(lk->filename, path);
    if (!(flags & LOCK_NODEREF))
            resolve_symlink(lk->filename, max_path_len);
    strcat(lk->filename, ".lock");

During the call to resolve_symlink(), lk->filename contained the name
of the file that was being locked, not the name of the lockfile.  If a
signal were raised during that interval, then the signal handler would
have deleted the valuable file!

We could probably continue to use the filename field to encode the
state by being careful to write characters 1..N-1 of the filename
first, and then overwrite the NUL at filename[0] with the first
character of the filename, but that would be awkward and error-prone.

So, instead of using the filename field to determine whether the
lock_file object is active, add a new field "lock_file::active" for
this purpose.  Be careful to set this field only when filename really
contains the name of a file that should be deleted on cleanup.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:48:59 -07:00
Michael Haggerty
7108ad232f cache.h: define constants LOCK_SUFFIX and LOCK_SUFFIX_LEN
There are a few places that use these values, so define constants for
them.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:45:11 -07:00
Michael Haggerty
e197c21807 unable_to_lock_die(): rename function from unable_to_lock_index_die()
This function is used for other things besides the index, so rename it
accordingly.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:38:38 -07:00
Junio C Hamano
102edda4df Merge branch 'ta/config-add-to-empty-or-true-fix' into maint
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.

* ta/config-add-to-empty-or-true-fix:
  config: avoid a funny sentinel value "a^"
  make config --add behave correctly for empty and NULL values
2014-09-29 22:10:25 -07:00
Junio C Hamano
1c2ea2cdc0 Merge branch 'rs/realloc-array'
Code cleanup.

* rs/realloc-array:
  use REALLOC_ARRAY for changing the allocation size of arrays
  add macro REALLOC_ARRAY
2014-09-26 14:39:45 -07:00
Junio C Hamano
69a5bbbbfa Merge branch 'jk/write-packed-refs-via-stdio'
Optimize the code path to write out the packed-refs file, which
especially matters in a repository with a large number of refs.

* jk/write-packed-refs-via-stdio:
  refs: write packed_refs file using stdio
2014-09-26 14:39:42 -07:00
Junio C Hamano
4daf5c8643 Merge branch 'ta/config-add-to-empty-or-true-fix'
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.

* ta/config-add-to-empty-or-true-fix:
  config: avoid a funny sentinel value "a^"
  make config --add behave correctly for empty and NULL values
2014-09-19 11:38:40 -07:00
Junio C Hamano
9ff700ebac Merge branch 'jk/commit-author-parsing'
Code clean-up.

* jk/commit-author-parsing:
  determine_author_info(): copy getenv output
  determine_author_info(): reuse parsing functions
  date: use strbufs in date-formatting functions
  record_author_date(): use find_commit_header()
  record_author_date(): fix memory leak on malformed commit
  commit: provide a function to find a header in a buffer
2014-09-19 11:38:33 -07:00
Junio C Hamano
ceeacc501b Merge branch 'bb/date-iso-strict'
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable.  A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.

* bb/date-iso-strict:
  pretty: provide a strict ISO 8601 date format
2014-09-19 11:38:32 -07:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Jeff King
c1063be2a3 config: avoid a funny sentinel value "a^"
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 16:33:54 -07:00
Junio C Hamano
7f346e9d73 Merge branch 'ta/config-set-1'
Use the new caching config-set API in git_config() calls.

* ta/config-set-1:
  add tests for `git_config_get_string_const()`
  add a test for semantic errors in config files
  rewrite git_config() to use the config-set API
  config: add `git_die_config()` to the config-set API
  change `git_config()` return value to void
  add line number and file name info to `config_set`
  config.c: fix accuracy of line number in errors
  config.c: mark error and warnings strings for translation
2014-09-11 10:33:25 -07:00
Jeff King
9540ce5030 refs: write packed_refs file using stdio
We write each line of a new packed-refs file individually
using a write() syscall (and sometimes 2, if the ref is
peeled). Since each line is only about 50-100 bytes long,
this creates a lot of system call overhead.

We can instead open a stdio handle around our descriptor and
use fprintf to write to it. The extra buffering is not a
problem for us, because nobody will read our new packed-refs
file until we call commit_lock_file (by which point we have
flushed everything).

On a pathological repository with 8.5 million refs, this
dropped the time to run `git pack-refs` from 20s to 6s.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 10:58:32 -07:00
Junio C Hamano
56f214e071 Merge branch 'ta/config-set'
Add in-core caching layer to let us avoid reading the same
configuration files number of times.

* ta/config-set:
  test-config: add tests for the config_set API
  add `config_set` API for caching config-like files
2014-09-02 13:24:18 -07:00
Junio C Hamano
1d8a6f6929 Merge branch 'mm/config-edit-global'
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any.  This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.

* mm/config-edit-global:
  commit: advertise config --global --edit on guessed identity
  home_config_paths(): let the caller ignore xdg path
  config --global --edit: create a template file if needed
2014-09-02 13:23:20 -07:00
Junio C Hamano
c518279c0e Merge branch 'jc/reopen-lock-file'
There are cases where you lock and open to write a file, close it to
show the updated contents to external processes, and then have to
update the file again while still holding the lock, but the lockfile
API lacked support for such an access pattern.

* jc/reopen-lock-file:
  lockfile: allow reopening a closed but still locked file
2014-09-02 13:20:13 -07:00
Beat Bolli
466fb6742d pretty: provide a strict ISO 8601 date format
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.

The output from "--date=iso" deviates from ISO 8601 in these ways:

  - a space instead of the `T` date/time delimiter
  - a space between time and time zone
  - no colon between hours and minutes of the time zone

Add a strict ISO 8601 date format for displaying committer and
author dates.  Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.

See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.

Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-29 12:37:02 -07:00
Steffen Prohaska
23b0c4782e config.c: add git_env_ulong() to parse environment variable
The new function parses an integeral value that fits in unsigned
long in human readable form, i.e. possibly with unit suffix, e.g.
10k = 10240, etc., from an environment variable.  Parsing of
GIT_MMAP_LIMIT and GIT_ALLOC_LIMIT will use it in later patches.

Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-28 10:24:54 -07:00
Jeff King
c33ddc2e33 date: use strbufs in date-formatting functions
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).

Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:

  1. In test-date.c, where we don't care about efficiency.

  2. In determine_author_info, which is not performance
     critical (and where the use of a strbuf will help later
     refactoring).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-27 10:32:56 -07:00
Tanay Abhra
155ef25f12 rewrite git_config() to use the config-set API
Of all the functions in `git_config*()` family, `git_config()` has the
most invocations in the whole code base. Each `git_config()` invocation
causes config file rereads which can be avoided using the config-set API.

Use the config-set API to rewrite `git_config()` to use the config caching
layer to avoid config file rereads on each invocation during a git process
lifetime. First invocation constructs the cache, and after that for each
successive invocation, `git_config()` feeds values from the config cache
instead of rereading the configuration files.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:41:10 -07:00
Tanay Abhra
5a80e97c82 config: add git_die_config() to the config-set API
Add `git_die_config` that dies printing the line number and the file name
of the highest priority value for the configuration variable `key`. A custom
error message is also printed before dying, specified by the caller, which can
be skipped if `err` argument is set to NULL.

It has usage in non-callback based config value retrieval where we can
raise an error and die if there is a semantic error.
For example,

	if (!git_config_get_value(key, &value)){
		if (!strcmp(value, "foo"))
			git_config_die(key, "value: `%s` is illegal", value);
		else
			/* do work */
	}

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:40:25 -07:00
Tanay Abhra
aace438502 change git_config() return value to void
Currently `git_config()` returns an integer signifying an error code.
During rewrites of the function most of the code was shifted to
`git_config_with_options()`. `git_config_with_options()` normally
returns positive values if its `config_source` parameter is set as NULL,
as most errors are fatal, and non-fatal potential errors are guarded
by "if" statements that are entered only when no error is possible.

Still a negative value can be returned in case of race condition between
`access_or_die()` & `git_config_from_file()`. Also, all callers of
`git_config()` ignore the return value except for one case in branch.c.

Change `git_config()` return value to void and make it die if it receives
a negative value from `git_config_with_options()`.

Original-patch-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:40:17 -07:00
Tanay Abhra
3df8fd625f add line number and file name info to config_set
Store file name and line number for each key-value pair in the cache
during parsing of the configuration files.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 11:38:50 -07:00
Tanay Abhra
3c8687a73e add config_set API for caching config-like files
Currently `git_config()` uses a callback mechanism and file rereads for
config values. Due to this approach, it is not uncommon for the config
files to be parsed several times during the run of a git program, with
different callbacks picking out different variables useful to themselves.

Add a `config_set`, that can be used to construct an in-memory cache for
config-like files that the caller specifies (i.e., files like `.gitmodules`,
`~/.gitconfig` etc.). Add two external functions `git_configset_get_value`
and `git_configset_get_value_multi` for querying from the config sets.
`git_configset_get_value` follows `last one wins` semantic (i.e. if there
are multiple matches for the queried key in the files of the configset the
value returned will be the last entry in `value_list`).
`git_configset_get_value_multi` returns a list of values sorted in order of
increasing priority (i.e. last match will be at the end of the list). Add
type specific query functions like `git_configset_get_bool` and similar.

Add a default `config_set`, `the_config_set` to cache all key-value pairs
read from usual config files (repo specific .git/config, user wide
~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set`
is populated using `git_config()`.

Add two external functions `git_config_get_value` and
`git_config_get_value_multi` for querying in a non-callback manner from
`the_config_set`. Also, add type specific query functions that are
implemented as a thin wrapper around the `config_set` API.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-29 14:29:56 -07:00
Jeff King
5de7f500c1 alloc: factor out commit index
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-28 10:14:33 -07:00
Matthieu Moy
9830534e40 config --global --edit: create a template file if needed
When the user has no ~/.gitconfig file, git config --global --edit used
to launch an editor on an nonexistant file name.

Instead, create a file with a default content before launching the
editor. The template contains only commented-out entries, to save a few
keystrokes for the user. If the values are guessed properly, the user
will only have to uncomment the entries.

Advanced users teaching newbies can create a minimalistic configuration
faster for newbies. Beginners reading a tutorial advising to run "git
config --global --edit" as a first step will be slightly more guided for
their first contact with Git.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-25 12:23:06 -07:00
Junio C Hamano
10b944b37b Merge branch 'jk/alloc-commit-id'
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.

* jk/alloc-commit-id:
  diff-tree: avoid lookup_unknown_object
  object_as_type: set commit index
  alloc: factor out commit index
  add object_as_type helper for casting objects
  parse_object_buffer: do not set object type
  move setting of object->type to alloc_* functions
  alloc: write out allocator definitions
  alloc.c: remove the alloc_raw_commit_node() function
2014-07-22 10:59:25 -07:00
Junio C Hamano
9f2de9c121 Merge branch 'kb/perf-trace'
* kb/perf-trace:
  api-trace.txt: add trace API documentation
  progress: simplify performance measurement by using getnanotime()
  wt-status: simplify performance measurement by using getnanotime()
  git: add performance tracing for git's main() function to debug scripts
  trace: add trace_performance facility to debug performance issues
  trace: add high resolution timer function to debug performance issues
  trace: add 'file:line' to all trace output
  trace: move code around, in preparation to file:line output
  trace: add current timestamp to all trace output
  trace: disable additional trace output for unit tests
  trace: add infrastructure to augment trace output with additional info
  sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
  Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
  trace: improve trace performance
  trace: remove redundant printf format attribute
  trace: consistently name the format parameter
  trace: move trace declarations from cache.h to new trace.h
2014-07-22 10:59:19 -07:00
Junio C Hamano
19a249ba83 Merge branch 'rs/ref-transaction-0'
Early part of the "ref transaction" topic.

* rs/ref-transaction-0:
  refs.c: change ref_transaction_update() to do error checking and return status
  refs.c: remove the onerr argument to ref_transaction_commit
  update-ref: use err argument to get error from ref_transaction_commit
  refs.c: make update_ref_write update a strbuf on failure
  refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
  refs.c: log_ref_write should try to return meaningful errno
  refs.c: make resolve_ref_unsafe set errno to something meaningful on error
  refs.c: commit_packed_refs to return a meaningful errno on failure
  refs.c: make remove_empty_directories always set errno to something sane
  refs.c: verify_lock should set errno to something meaningful
  refs.c: make sure log_ref_setup returns a meaningful errno
  refs.c: add an err argument to repack_without_refs
  lockfile.c: make lock_file return a meaningful errno on failurei
  lockfile.c: add a new public function unable_to_lock_message
  refs.c: add a strbuf argument to ref_transaction_commit for error logging
  refs.c: allow passing NULL to ref_transaction_free
  refs.c: constify the sha arguments for ref_transaction_create|delete|update
  refs.c: ref_transaction_commit should not free the transaction
  refs.c: remove ref_transaction_rollback
2014-07-21 11:18:37 -07:00
Junio C Hamano
c9831bb09d Merge branch 'kb/path-max-must-go'
* kb/path-max-must-go:
  cache.h: rename cache_def_free to cache_def_clear
2014-07-16 11:32:33 -07:00
Junio C Hamano
788cef81d4 Merge branch 'nd/split-index'
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.

* nd/split-index: (32 commits)
  t1700: new tests for split-index mode
  t2104: make sure split index mode is off for the version test
  read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
  read-tree: note about dropping split-index mode or index version
  read-tree: force split-index mode off on --index-output
  rev-parse: add --shared-index-path to get shared index path
  update-index --split-index: do not split if $GIT_DIR is read only
  update-index: new options to enable/disable split index mode
  split-index: strip pathname of on-disk replaced entries
  split-index: do not invalidate cache-tree at read time
  split-index: the reading part
  split-index: the writing part
  read-cache: mark updated entries for split index
  read-cache: save deleted entries in split index
  read-cache: mark new entries for split index
  read-cache: split-index mode
  read-cache: save index SHA-1 after reading
  entry.c: update cache_changed if refresh_cache is set in checkout_entry()
  cache-tree: mark istate->cache_changed on prime_cache_tree()
  cache-tree: mark istate->cache_changed on cache tree update
  ...
2014-07-16 11:25:40 -07:00
Junio C Hamano
93dcaea226 lockfile: allow reopening a closed but still locked file
In some code paths (e.g. giving "add -i" to prepare the contents to
be committed interactively inside "commit -p") where a caller takes
a lock, writes the new content, give chance for others to use it
while still holding the lock, and then releases the lock when all is
done.  As an extension, allow the caller to re-update an already
closed file while still holding the lock (i.e. not yet committed) by
re-opening the file, to be followed by updating the contents and
then by the usual close_lock_file() or commit_lock_file().

This is necessary if we want to add code to rebuild the cache-tree
and write the resulting index out after "add -i" returns the control
to "commit -p", for example.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-14 13:05:37 -07:00
Ronnie Sahlberg
76d70dc0c6 refs.c: make resolve_ref_unsafe set errno to something meaningful on error
Making errno when returning from resolve_ref_unsafe() meaningful,
which should fix

 * a bug in lock_ref_sha1_basic, where it assumes EISDIR
   means it failed due to a directory being in the way

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
2014-07-14 11:54:42 -07:00
Ronnie Sahlberg
6af926e8bc lockfile.c: add a new public function unable_to_lock_message
Introducing a new unable_to_lock_message helper, which has nicer
semantics than unable_to_lock_error and cleans up lockfile.c a little.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
2014-07-14 11:54:40 -07:00
Jeff King
94d5a22cf6 alloc: factor out commit index
We keep a static counter to set the commit index on newly
allocated objects. However, since we also need to set the
index on any_objects which are converted to commits, let's
make the counter available as a public function.

While we're moving it, let's make sure the counter is
allocated as an unsigned integer to match the index field in
"struct commit".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 18:59:05 -07:00
Karsten Blees
2a60839150 cache.h: rename cache_def_free to cache_def_clear
Rename cache_def_free to cache_def_clear as it doesn't free the struct
cache_def, but just clears its content.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-13 10:12:37 -07:00
Junio C Hamano
11def366e5 Merge branch 'kb/path-max-must-go'
* kb/path-max-must-go:
  symlinks: remove PATH_MAX limitation
2014-07-10 11:27:47 -07:00
Karsten Blees
e7c7305300 symlinks: remove PATH_MAX limitation
'git checkout' fails if a directory is longer than PATH_MAX, because the
lstat_cache in symlinks.c checks if the leading directory exists using
PATH_MAX-bounded string operations.

Remove the limitation by using strbuf instead.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 11:22:42 -07:00
Junio C Hamano
1881d2b88c Merge branch 'ym/fix-opportunistic-index-update-race' into maint
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running.  Detect such a race and avoid overwriting the index.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-25 11:49:48 -07:00
Jeremiah Mahler
ccdd4a0f3c cleanup duplicate name_compare() functions
We often represent our strings as a counted string, i.e. a pair of
the pointer to the beginning of the string and its length, and the
string may not be NUL terminated to that length.

To compare a pair of such counted strings, unpack-trees.c and
read-cache.c implement their own name_compare() functions
identically.  In addition, the cache_name_compare() function in
read-cache.c is nearly identical.  The only difference is when one
string is the prefix of the other string, in which case
name_compare() returns -1/+1 to show which one is longer, and
cache_name_compare() returns the difference of the lengths to show
the same information.

Unify these three functions by using the implementation from
cache_name_compare().  This does not make any difference to the
existing and future callers, as they must be paying attention only
to the sign of the returned value (and not the magnitude) because
the original implementations of these two functions return values
returned by memcmp(3) when the one string is not a prefix of the
other string, and the only thing memcmp(3) guarantees its callers is
the sign of the returned value, not the magnitude.

Signed-off-by: Jeremiah Mahler <jmmahler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:12:14 -07:00
Karsten Blees
5991a55c54 trace: move trace declarations from cache.h to new trace.h
Also include direct dependencies (strbuf.h and git-compat-util.h for
__attribute__) so that trace.h can be used independently of cache.h, e.g.
in test programs.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-17 09:37:47 -07:00
Junio C Hamano
b83163643b Merge branch 'sk/windows-unc-path'
* sk/windows-unc-path:
  Windows: allow using UNC path for git repository
2014-06-16 10:07:03 -07:00
Nguyễn Thái Ngọc Duy
3e52f70b15 t1700: new tests for split-index mode
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:42 -07:00
Nguyễn Thái Ngọc Duy
c18b80a0e8 update-index: new options to enable/disable split index mode
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
b3c96fb158 split-index: strip pathname of on-disk replaced entries
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
ce7c614bce split-index: do not invalidate cache-tree at read time
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:41 -07:00
Nguyễn Thái Ngọc Duy
078a58e825 read-cache: mark updated entries for split index
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:40 -07:00
Nguyễn Thái Ngọc Duy
5fc2fc8fa2 read-cache: split-index mode
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)

This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.

Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.

The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
e93021b20a read-cache: save index SHA-1 after reading
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
d4a2024aef entry.c: update cache_changed if refresh_cache is set in checkout_entry()
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
a5400efe29 cache-tree: mark istate->cache_changed on cache tree invalidation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:39 -07:00
Nguyễn Thái Ngọc Duy
6c306a34ee resolve-undo: be specific what part of the index has changed
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
e636a7b4d0 read-cache: be specific what part of the index has changed
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
ce51bf09f8 read-cache: store in-memory flags in the first 12 bits of ce_flags
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
626f35c893 read-cache: relocate and unexport commit_locked_index()
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:38 -07:00
Nguyễn Thái Ngọc Duy
03b8664772 read-cache: new API write_locked_index instead of write_index/write_cache
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 11:49:10 -07:00
Cezary Zawadka
c2369bdf7f Windows: allow using UNC path for git repository
[efl: moved MinGW-specific part to compat/]
[jes: fixed compilation on non-Windows]

Eric Sunshine fixed mingw_offset_1st_component() to return
consistently "foo" for UNC "//machine/share/foo", cf

http://groups.google.com/group/msysgit/browse_thread/thread/c0af578549b5dda0

Author: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Cezary Zawadka <czawadka@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-10 13:30:04 -07:00
Junio C Hamano
1e2600dd6a Merge branch 'nd/status-auto-comment-char'
* nd/status-auto-comment-char:
  commit: allow core.commentChar=auto for character auto selection
  config: be strict on core.commentChar
2014-06-06 11:36:10 -07:00
Junio C Hamano
610a14f643 Merge branch 'jk/squelch-compiler-warning-from-funny-error-macro'
* jk/squelch-compiler-warning-from-funny-error-macro:
  let clang use the constant-return error() macro
  inline constant return from error() function
2014-06-06 11:21:36 -07:00
Junio C Hamano
e1857af923 Merge branch 'jk/commit-date-approxidate'
* jk/commit-date-approxidate:
  commit: accept more date formats for "--date"
  commit: print "Date" line when the user has set date
  pretty: make show_ident_date public
  commit: use split_ident_line to compare author/committer
2014-06-03 12:06:46 -07:00
Junio C Hamano
9af098c29b Merge branch 'ym/fix-opportunistic-index-update-race'
Read-only operations such as "git status" that internally refreshes
the index write out the refreshed index to the disk to optimize
future accesses to the working tree, but this could race with a
"read-write" operation that modify the index while it is running.
Detect such a race and avoid overwriting the index.

Duy raised a good point that we may need to do the same for the
normal writeout codepath, not just the "opportunistic" update
codepath.  While that is true, nobody sane would be running two
simultaneous operations that are clearly write-oriented competing
with each other against the same index file.  So in that sense that
can be done as a less urgent follow-up for this topic.

* ym/fix-opportunistic-index-update-race:
  read-cache.c: verify index file before we opportunistically update it
  wrapper.c: add xpread() similar to xread()
2014-06-03 12:06:41 -07:00
Junio C Hamano
8eaf517835 Merge branch 'ks/tree-diff-nway'
Instead of running N pair-wise diff-trees when inspecting a
N-parent merge, find the set of paths that were touched by walking
N+1 trees in parallel.  These set of paths can then be turned into
N pair-wise diff-tree results to be processed through rename
detections and such.  And N=2 case nicely degenerates to the usual
2-way diff-tree, which is very nice.

* ks/tree-diff-nway:
  mingw: activate alloca
  combine-diff: speed it up, by using multiparent diff tree-walker directly
  tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
  Portable alloca for Git
  tree-diff: reuse base str(buf) memory on sub-tree recursion
  tree-diff: no need to call "full" diff_tree_sha1 from show_path()
  tree-diff: rework diff_tree interface to be sha1 based
  tree-diff: diff_tree() should now be static
  tree-diff: remove special-case diff-emitting code for empty-tree cases
  tree-diff: simplify tree_entry_pathcmp
  tree-diff: show_path prototype is not needed anymore
  tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
  tree-diff: move all action-taking code out of compare_tree_entry()
  tree-diff: don't assume compare_tree_entry() returns -1,0,1
  tree-diff: consolidate code for emitting diffs and recursion in one place
  tree-diff: show_tree() is not needed
  tree-diff: no need to pass match to skip_uninteresting()
  tree-diff: no need to manually verify that there is no mode change for a path
  combine-diff: move changed-paths scanning logic into its own function
  combine-diff: move show_log_first logic/action out of paths scanning
2014-06-03 12:06:40 -07:00
Nguyễn Thái Ngọc Duy
84c9dc2c5a commit: allow core.commentChar=auto for character auto selection
When core.commentChar is "auto", the comment char starts with '#' as
in default but if it's already in the prepared message, find another
char in a small subset. This should stop surprises because git strips
some lines unexpectedly.

Note that git is not smart enough to recognize '#' as the comment char
in custom templates and convert it if the final comment char is
different. It thinks '#' lines in custom templates as part of the
commit message. So don't use this with custom templates.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-19 13:37:25 -07:00
Jeff King
ff0a80af72 let clang use the constant-return error() macro
Commit e208f9c converted error() into a macro to make its
constant return value more apparent to calling code.  Commit
5ded807 prevents us using this macro with clang, since
clang's -Wunused-value is smart enough to realize that the
constant "-1" is useless in some contexts.

However, since the last commit puts the constant behind an
inline function call, this is enough to prevent the
-Wunused-value warning on both modern gcc and clang. So we
can now re-enable the macro when compiling with clang.

Tested with clang 3.3, 3.4, and 3.5.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-06 15:30:40 -07:00
Jeff King
87fe5df365 inline constant return from error() function
Commit e208f9c introduced a macro to turn error() calls
into:

  (error(), -1)

to make the constant return value more visible to the
calling code (and thus let the compiler make better
decisions about the code).

This works well for code like:

  return error(...);

but the "-1" is superfluous in code that just calls error()
without caring about the return value. In older versions of
gcc, that was fine, but gcc 4.9 complains with -Wunused-value.

We can work around this by encapsulating the constant return
value in a static inline function, as gcc specifically
avoids complaining about unused function returns unless the
function has been specifically marked with the
warn_unused_result attribute.

We also use the same trick for config_error_nonbool and
opterror, which learned the same error technique in a469a10.

Reported-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-06 15:30:38 -07:00
Jeff King
d105324655 pretty: make show_ident_date public
We use this function internally to format "Date" lines in
commit logs, but other parts of the code will want it, too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-02 14:13:00 -07:00
Yiannis Marangos
426ddeead6 read-cache.c: verify index file before we opportunistically update it
Before we proceed to opportunistically update the index (often done
by an otherwise read-only operation like "git status" and "git diff"
that internally refreshes the index), we must verify that the
current index file is the same as the one that we read earlier
before we took the lock on it, in order to avoid a possible race.

In the example below git-status does "opportunistic update" and
git-rebase updates the index, but the race can happen in general.

  1. process A calls git-rebase (or does anything that uses the index)

  2. process A applies 1st commit

  3. process B calls git-status (or does anything that updates the index)

  4. process B reads index

  5. process A applies 2nd commit

  6. process B takes the lock, then overwrites process A's changes.

  7. process A applies 3rd commit

As an end result the 3rd commit will have a revert of the 2nd commit.
When process B takes the lock, it needs to make sure that the index
hasn't changed since step 4.

Signed-off-by: Yiannis Marangos <yiannis.marangos@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-10 12:27:58 -07:00
Kirill Smelkov
72441af7c4 tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.

In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.

For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .

There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .

That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.

That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).

That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:

D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn)      (w.r.t. paths)

where

D(A,P1...Pn) is combined diff between commit A, and parents Pi

and

D(A,Pi) is usual two-tree diff Pi..A

So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.

And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.

The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.

Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:

D(T,P1...Pn) calculation scheme
-------------------------------

D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn)	(regarding resulting paths set)

    D(T,Pj)		- diff between T..Pj
    D(T,P1...Pn)	- combined diff from T to parents P1,...,Pn

We start from all trees, which are sorted, and compare their entries in
lock-step:

     T     P1       Pn
     -     -        -
    |t|   |p1|     |pn|
    |-|   |--| ... |--|      imin = argmin(p1...pn)
    | |   |  |     |  |
    |-|   |--|     |--|
    |.|   |. |     |. |
     .     .        .
     .     .        .

at any time there could be 3 cases:

    1)  t < p[imin];
    2)  t > p[imin];
    3)  t = p[imin].

Schematic deduction of what every case means, and what to do, follows:

1)  t < p[imin]  ->  ∀j t ∉ Pj  ->  "+t" ∈ D(T,Pj)  ->  D += "+t";  t↓

2)  t > p[imin]

    2.1) ∃j: pj > p[imin]  ->  "-p[imin]" ∉ D(T,Pj)  ->  D += ø;  ∀ pi=p[imin]  pi↓
    2.2) ∀i  pi = p[imin]  ->  pi ∉ T  ->  "-pi" ∈ D(T,Pi)  ->  D += "-p[imin]";  ∀i pi↓

3)  t = p[imin]

    3.1) ∃j: pj > p[imin]  ->  "+t" ∈ D(T,Pj)  ->  only pi=p[imin] remains to investigate
    3.2) pi = p[imin]  ->  investigate δ(t,pi)
     |
     |
     v

    3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø  ->

                      ⎧δ(t,pi)  - if pi=p[imin]
             ->  D += ⎨
                      ⎩"+t"     - if pi>p[imin]

    in any case t↓  ∀ pi=p[imin]  pi↓

~

For comparison, here is how diff_tree() works:

D(A,B) calculation scheme
-------------------------

    A     B
    -     -
   |a|   |b|    a < b   ->  a ∉ B   ->   D(A,B) +=  +a    a↓
   |-|   |-|    a > b   ->  b ∉ A   ->   D(A,B) +=  -b    b↓
   | |   | |    a = b   ->  investigate δ(a,b)            a↓ b↓
   |-|   |-|
   |.|   |.|
    .     .
    .     .

~~~~~~~~

This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff

D(A,B)

is by definition the same as combined diff

D(A,[B]),

so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.

What we do is as follows:

1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
   a paths generator (new name diff_tree_paths()), with each generated path
   being `struct combine_diff_path` with info for path, new sha1,mode and for
   every parent which sha1,mode it was in it.

2) From that info, we can still generate usual diff queue with
   struct diff_filepairs, via "exporting" generated
   combine_diff_path, if we know we run for nparent=1 case.
   (see emit_diff() which is now named emit_diff_first_parent_only())

3) In order for diff_can_quit_early(), which checks

       DIFF_OPT_TST(opt, HAS_CHANGES))

   to work, that exporting have to be happening not in bulk, but
   incrementally, one diff path at a time.

   For such consumers, there is a new callback in diff_options
   introduced:

       ->pathchange(opt, struct combine_diff_path *)

   which, if set to !NULL, is called for every generated path.

   (see new compat ll_diff_tree_sha1() wrapper around new paths
    generator for setup)

4) The paths generation itself, is reworked from previous
   ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
   scheme" provided above:

   On the start we allocate [nparent] arrays in place what was
   earlier just for one parent tree.

   then we just generalize loops, and comparison according to the
   algorithm.

Some notes(*):

1) alloca(), for small arrays, is used for "runs not slower for
   nparent=1 case than before" goal - if we change it to xmalloc()/free()
   the timings get ~1% worse. For alloca() we use just-introduced
   xalloca/xalloca_free compatibility wrappers, so it should not be a
   portability problem.

2) For every parent tree, we need to keep a tag, whether entry from that
   parent equals to entry from minimal parent. For performance reasons I'm
   keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
   Not doing so, we'd need to alloca another [nparent] array, which hurts
   performance.

3) For emitted paths, memory could be reused, if we know the path was
   processed via callback and will not be needed later. We use efficient
   hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
   of potential additional slowdown.

4) goto(s) are used in several places, as the code executes a little bit
   faster with lowered register pressure.

Also

- we should now check for FIND_COPIES_HARDER not only when two entries
  names are the same, and their hashes are equal, but also for a case,
  when a path was removed from some of all parents having it.

  The reason is, if we don't, that path won't be emitted at all (see
  "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
  all paths - with diff or without - to be emitted, to be later analyzed
  for being copies sources.

  The new check is only necessary for nparent >1, as for nparent=1 case
  xmin_eqtotal always =1 =nparent, and a path is always added to diff as
  removal.

~~~~~~~~

Timings for

    # without -c, i.e. testing only nparent=1 case
    `git log --raw --no-abbrev --no-renames`

before and after the patch are as follows:

                navy.git        linux.git v3.10..v3.11

    before      0.611s          1.889s
    after       0.619s          1.907s
    slowdown    1.3%            0.9%

This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.

The output also stayed the same.

(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
    we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
   "do no harm for nparent=1 case" rule.

For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.

P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with

    `git log --reverse --raw --no-abbrev --no-renames -c`

to extract log of what was created/changed when, as a result building a
map

    {}  sha1    ->  in which commit (and date) a content was added

that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.

That case was my initial motivation for combined diffs speedup.

Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-07 14:40:46 -07:00
Junio C Hamano
b6de0c633e Merge branch 'nd/tag-version-sort'
Allow v1.9.0 sorted before v1.10.0 in "git tag --list" output.

* nd/tag-version-sort:
  tag: support --sort=<spec>
2014-03-21 12:47:39 -07:00
Junio C Hamano
8aac6c97e8 Merge branch 'jk/commit-dates-parsing-fix' into maint
Codepaths that parse timestamps in commit objects have been
tightened.

* jk/commit-dates-parsing-fix:
  show_ident_date: fix tz range check
  log: do not segfault on gmtime errors
  log: handle integer overflow in timestamps
  date: check date overflow against time_t
  fsck: report integer overflow in author timestamps
  t4212: test bogus timestamps with git-log
2014-03-18 14:04:01 -07:00
Junio C Hamano
6d011b8e3f Merge branch 'bk/refresh-missing-ok-in-merge-recursive' into maint
"merge-recursive" was broken in 1.7.7 era and stopped working in an
empty (temporary) working tree, when there are renames involved.
This has been corrected.

* bk/refresh-missing-ok-in-merge-recursive:
  merge-recursive.c: tolerate missing files while refreshing index
  read-cache.c: extend make_cache_entry refresh flag with options
  read-cache.c: refactor --ignore-missing implementation
  t3030-merge-recursive: test known breakage with empty work tree
2014-03-18 14:02:38 -07:00
Junio C Hamano
3e30cb0fbf Merge branch 'mh/replace-refs-variable-rename'
* mh/replace-refs-variable-rename:
  Document some functions defined in object.c
  Add docstrings for lookup_replace_object() and do_lookup_replace_object()
  rename read_replace_refs to check_replace_refs
2014-03-14 14:27:06 -07:00
Junio C Hamano
060be00621 Merge branch 'mh/object-code-cleanup'
* mh/object-code-cleanup:
  sha1_file.c: document a bunch of functions defined in the file
  sha1_file_name(): declare to return a const string
  find_pack_entry(): document last_found_pack
  replace_object: use struct members instead of an array
2014-03-14 14:26:29 -07:00
Junio C Hamano
3c83b080e4 Merge branch 'jk/commit-dates-parsing-fix'
Tighten codepaths that parse timestamps in commit objects.

* jk/commit-dates-parsing-fix:
  show_ident_date: fix tz range check
  log: do not segfault on gmtime errors
  log: handle integer overflow in timestamps
  date: check date overflow against time_t
  fsck: report integer overflow in author timestamps
  t4212: test bogus timestamps with git-log
2014-03-14 14:25:44 -07:00
Junio C Hamano
08f36302b5 Merge branch 'ks/config-file-stdin'
"git config" learned to read from the standard input when "-" is
given as the value to its "--file" parameter (attempting an
operation to update the configuration in the standard input of
course is rejected).

* ks/config-file-stdin:
  config: teach "git config --file -" to read from the standard input
  config: change git_config_with_options() interface
  builtin/config.c: rename check_blob_write() -> check_write()
  config: disallow relative include paths from blobs
2014-03-14 14:24:40 -07:00
Junio C Hamano
053a6b1807 Merge branch 'jn/add-2.0-u-A-sans-pathspec'
"git add -u" and "git add -A" without any pathspec is a tree-wide
operation now, even when they are run in a subdirectory of the
working tree.
2014-03-07 15:14:02 -08:00
Junio C Hamano
4c4ac4db2c Merge branch 'nd/daemonize-gc'
Allow running "gc --auto" in the background.

* nd/daemonize-gc:
  gc: config option for running --auto in background
  daemon: move daemonize() to libgit.a
2014-03-05 15:06:39 -08:00
Michael Haggerty
1f91e79cf6 Add docstrings for lookup_replace_object() and do_lookup_replace_object()
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-28 13:17:56 -08:00
Nguyễn Thái Ngọc Duy
9ef176b55c tag: support --sort=<spec>
--sort=version:refname (or --sort=v:refname for short) sorts tags as
if they are versions. --sort=-refname reverses the order (with or
without ":version").

versioncmp() is copied from string/strverscmp.c in glibc commit
ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding
style. The implementation is under LGPL-2.1 and according to [1] I can
relicense it to GPLv2.

[1] http://www.gnu.org/licenses/gpl-faq.html#AllCompatibility

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-27 14:04:05 -08:00
Junio C Hamano
0f9e62e084 Merge branch 'jk/pack-bitmap'
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.

* jk/pack-bitmap: (26 commits)
  ewah: unconditionally ntohll ewah data
  ewah: support platforms that require aligned reads
  read-cache: use get_be32 instead of hand-rolled ntoh_l
  block-sha1: factor out get_be and put_be wrappers
  do not discard revindex when re-preparing packfiles
  pack-bitmap: implement optional name_hash cache
  t/perf: add tests for pack bitmaps
  t: add basic bitmap functionality tests
  count-objects: recognize .bitmap in garbage-checking
  repack: consider bitmaps when performing repacks
  repack: handle optional files created by pack-objects
  repack: turn exts array into array-of-struct
  repack: stop using magic number for ARRAY_SIZE(exts)
  pack-objects: implement bitmap writing
  rev-list: add bitmap mode to speed up object lists
  pack-objects: use bitmaps when packing objects
  pack-objects: split add_object_entry
  pack-bitmap: add support for bitmap indexes
  documentation: add documentation for the bitmap format
  ewah: compressed bitmap implementation
  ...
2014-02-27 14:01:48 -08:00
Junio C Hamano
8336832ad9 Merge branch 'nd/reset-intent-to-add'
* nd/reset-intent-to-add:
  reset: support "--mixed --intent-to-add" mode
2014-02-27 14:01:40 -08:00
Junio C Hamano
cbaeafc325 Merge branch 'nd/submodule-pathspec-ending-with-slash'
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.

* nd/submodule-pathspec-ending-with-slash:
  clean: use cache_name_is_other()
  clean: replace match_pathspec() with dir_path_match()
  pathspec: pass directory indicator to match_pathspec_item()
  match_pathspec: match pathspec "foo/" against directory "foo"
  dir.c: prepare match_pathspec_item for taking more flags
  pathspec: rename match_pathspec_depth() to match_pathspec()
  pathspec: convert some match_pathspec_depth() to dir_path_match()
  pathspec: convert some match_pathspec_depth() to ce_path_match()
2014-02-27 14:01:15 -08:00
Junio C Hamano
156d6ed922 Merge branch 'bk/refresh-missing-ok-in-merge-recursive'
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.

* bk/refresh-missing-ok-in-merge-recursive:
  merge-recursive.c: tolerate missing files while refreshing index
  read-cache.c: extend make_cache_entry refresh flag with options
  read-cache.c: refactor --ignore-missing implementation
  t3030-merge-recursive: test known breakage with empty work tree
2014-02-27 14:01:14 -08:00
Junio C Hamano
d637d1b9a8 Merge branch 'kb/fast-hashmap'
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.

* kb/fast-hashmap:
  name-hash: retire unused index_name_exists()
  hashmap.h: use 'unsigned int' for hash-codes everywhere
  test-hashmap.c: drop unnecessary #includes
  .gitignore: test-hashmap is a generated file
  read-cache.c: fix memory leaks caused by removed cache entries
  builtin/update-index.c: cleanup update_one
  fix 'git update-index --verbose --again' output
  remove old hash.[ch] implementation
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for directories
  diffcore-rename.c: use new hash map implementation
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: move code around to prepare for the next patch
  buitin/describe.c: use new hash map implementation
  add a hashtable implementation that supports O(1) removal
  submodule: don't access the .gitmodules cache entry after removing it
2014-02-27 14:01:09 -08:00
Michael Haggerty
d40d535b89 sha1_file.c: document a bunch of functions defined in the file
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 16:01:11 -08:00
Eric Sunshine
7b359ea6b3 name-hash: retire unused index_name_exists()
db5360f3f4 (name-hash: refactor polymorphic index_name_exists();
2013-09-17) split index_name_exists() into index_file_exists() and
index_dir_exists() but retained index_name_exists() as a thin wrapper
to avoid disturbing possible in-flight topics. Since this change
landed in 'master' some time ago and there are no in-flight topics
referencing index_name_exists(), retire it.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 15:26:33 -08:00
Nguyễn Thái Ngọc Duy
429bb40abd pathspec: convert some match_pathspec_depth() to ce_path_match()
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:36:52 -08:00
Brad King
257627268a read-cache.c: extend make_cache_entry refresh flag with options
Convert the make_cache_entry boolean 'refresh' argument to a more
general 'refresh_options' argument.  Pass the value through to the
underlying refresh_cache_ent call.  Add option CE_MATCH_REFRESH to
enable stat refresh.  Update call sites to use the new signature.

Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:31:17 -08:00
Brad King
2e2e7ec1ef read-cache.c: refactor --ignore-missing implementation
Move lstat ENOENT handling from refresh_index to refresh_cache_ent and
activate it with a new CE_MATCH_IGNORE_MISSING option.  This will allow
other call paths into refresh_cache_ent to use the feature.

Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 14:31:10 -08:00
Jeff King
7ca36d9398 date: check date overflow against time_t
When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions.  We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 10:12:58 -08:00
Michael Haggerty
30d6c6eabf sha1_file_name(): declare to return a const string
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.)  Change callers
accordingly, deleting a few superfluous temporary variables along the
way.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 09:10:22 -08:00
Michael Haggerty
afc711b8e1 rename read_replace_refs to check_replace_refs
The semantics of this flag was changed in commit

    e1111cef23 inline lookup_replace_object() calls

but wasn't renamed at the time to minimize code churn.  Rename it now,
and add a comment explaining its use.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-20 14:16:55 -08:00
Kirill A. Shutemov
3caec73b55 config: teach "git config --file -" to read from the standard input
The patch extends git config --file interface to allow read config from
stdin.

Editing stdin or setting value in stdin is an error.

Include by absolute path is allowed in stdin config, but not by relative
path.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-18 16:12:14 -08:00
Kirill A. Shutemov
c8985ce053 config: change git_config_with_options() interface
We're going to have more options for config source.

Let's alter git_config_with_options() interface to accept struct with
all source options.

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-18 16:12:13 -08:00
Nguyễn Thái Ngọc Duy
de0957ce2e daemon: move daemonize() to libgit.a
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-10 10:46:35 -08:00
Nguyễn Thái Ngọc Duy
b4b313f94a reset: support "--mixed --intent-to-add" mode
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
2014-02-05 16:44:51 -08:00
Junio C Hamano
d0956cfa8e Merge branch 'mh/safe-create-leading-directories'
Code clean-up and protection against concurrent write access to the
ref namespace.

* mh/safe-create-leading-directories:
  rename_tmp_log(): on SCLD_VANISHED, retry
  rename_tmp_log(): limit the number of remote_empty_directories() attempts
  rename_tmp_log(): handle a possible mkdir/rmdir race
  rename_ref(): extract function rename_tmp_log()
  remove_dir_recurse(): handle disappearing files and directories
  remove_dir_recurse(): tighten condition for removing unreadable dir
  lock_ref_sha1_basic(): if locking fails with ENOENT, retry
  lock_ref_sha1_basic(): on SCLD_VANISHED, retry
  safe_create_leading_directories(): add new error value SCLD_VANISHED
  cmd_init_db(): when creating directories, handle errors conservatively
  safe_create_leading_directories(): introduce enum for return values
  safe_create_leading_directories(): always restore slash at end of loop
  safe_create_leading_directories(): split on first of multiple slashes
  safe_create_leading_directories(): rename local variable
  safe_create_leading_directories(): add explicit "slash" pointer
  safe_create_leading_directories(): reduce scope of local variable
  safe_create_leading_directories(): fix format of "if" chaining
2014-01-27 10:45:33 -08:00
Junio C Hamano
9bb5287098 Merge branch 'mh/retire-ref-fetch-rules'
Code simplification.

* mh/retire-ref-fetch-rules:
  refname_match(): always use the rules in ref_rev_parse_rules
2014-01-27 10:44:07 -08:00
Junio C Hamano
92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Michael Haggerty
54457fe509 refname_match(): always use the rules in ref_rev_parse_rules
We used to use two separate rules for the normal ref resolution
dwimming and dwimming done to decide which remote ref to grab.  The
third parameter to refname_match() selected which rules to use.

When these two rules were harmonized in

    2011-11-04 dd621df9cd refs DWIMmery: use the same rule for both "git fetch" and others

, ref_fetch_rules was #defined to avoid potential breakages for
in-flight topics.

It is now safe to remove the backwards-compatibility code, so remove
refname_match()'s third parameter, make ref_rev_parse_rules private to
refs.c, and remove ref_fetch_rules entirely.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-14 13:58:06 -08:00
Junio C Hamano
b2132068c6 Merge branch 'jk/oi-delta-base'
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.

* jk/oi-delta-base:
  cat-file: provide %(deltabase) batch format
  sha1_object_info_extended: provide delta base sha1s
2014-01-10 10:33:11 -08:00
Michael Haggerty
18d37e860d safe_create_leading_directories(): add new error value SCLD_VANISHED
Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED.  This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared.  This probably indicates that another process deleted the
directory while we were working.  If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.

Why doesn't safe_create_leading_directories() do the retrying
internally?  Because an empty directory isn't really ever safe until
it holds a file.  So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06 09:34:22 -08:00
Michael Haggerty
0be0521b23 safe_create_leading_directories(): introduce enum for return values
Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum.  Add
a docstring.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-06 09:34:21 -08:00
Jeff King
5d642e7506 sha1_object_info_extended: provide delta base sha1s
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.

However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.

For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:53:32 -08:00
Christian Couder
de7b5d6218 sha1_object_info_extended(): add an "unsigned flags" parameter
This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Christian Couder
bf93eea0f6 sha1_file.c: add lookup_replace_object_extended() to pass flags
Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Christian Couder
ffe68cf9ac rename READ_SHA1_FILE_REPLACE flag to LOOKUP_REPLACE_OBJECT
The READ_SHA1_FILE_REPLACE flag is more related to using the
lookup_replace_object() function rather than the
read_sha1_file() function.

We also need such a flag to be used with sha1_object_info()
instead of read_sha1_file().

The name LOOKUP_REPLACE_OBJECT is therefore better for this
flag.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:53:48 -08:00
Nguyễn Thái Ngọc Duy
069c053222 add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
This may be needed when a hook is run after a new shallow pack is
received, but .git/shallow is not settled yet. A temporary shallow
file to plug all loose ends should be used instead. GIT_SHALLOW_FILE
is overriden by --shallow-file.

--shallow-file does not work in this case because the hook may spawn
many git subprocesses and the launch commands do not have
--shallow-file as it's a recent addition.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:17 -08:00
Nguyễn Thái Ngọc Duy
58babfffde shallow.c: the 8 steps to select new commits for .git/shallow
Suppose a fetch or push is requested between two shallow repositories
(with no history deepening or shortening). A pack that contains
necessary objects is transferred over together with .git/shallow of
the sender. The receiver has to determine whether it needs to update
.git/shallow if new refs needs new shallow comits.

The rule here is avoid updating .git/shallow by default. But we don't
want to waste the received pack. If the pack contains two refs, one
needs new shallow commits installed in .git/shallow and one does not,
we keep the latter and reject/warn about the former.

Even if .git/shallow update is allowed, we only add shallow commits
strictly necessary for the former ref (remember the sender can send
more shallow commits than necessary) and pay attention not to
accidentally cut the receiver history short (no history shortening is
asked for)

So the steps to figure out what ref need what new shallow commits are:

1. Split the sender shallow commit list into "ours" and "theirs" list
   by has_sha1_file. Those that exist in current repo in "ours", the
   remaining in "theirs".

2. Check the receiver .git/shallow, remove from "ours" the ones that
   also exist in .git/shallow.

3. Fetch the new pack. Either install or unpack it.

4. Do has_sha1_file on "theirs" list again. Drop the ones that fail
   has_sha1_file. Obviously the new pack does not need them.

5. If the pack is kept, remove from "ours" the ones that do not exist
   in the new pack.

6. Walk the new refs to answer the question "what shallow commits,
   both ours and theirs, are required in .git/shallow in order to add
   this ref?". Shallow commits not associated to any refs are removed
   from their respective list.

7. (*) Check reachability (from the current refs) of all remaining
   commits in "ours". Those reachable are removed. We do not want to
   cut any part of our (reachable) history. We only check up
   commits. True reachability test is done by
   check_everything_connected() at the end as usual.

8. Combine the final "ours" and "theirs" and add them all to
   .git/shallow. Install new refs. The case where some hook rejects
   some refs on a push is explained in more detail in the push
   patches.

Of these steps, #6 and #7 are expensive. Both require walking through
some commits, or in the worst case all commits. And we rather avoid
them in at least common case, where the transferred pack does not
contain any shallow commits that the sender advertises. Let's look at
each scenario:

1) the sender has longer history than the receiver

   All shallow commits from the sender will be put into "theirs" list
   at step 1 because none of them exists in current repo. In the
   common case, "theirs" becomes empty at step 4 and exit early.

2) the sender has shorter history than the receiver

   All shallow commits from the sender are likely in "ours" list at
   step 1. In the common case, if the new pack is kept, we could empty
   "ours" and exit early at step 5.

   If the pack is not kept, we hit the expensive step 6 then exit
   after "ours" is emptied. There'll be only a handful of objects to
   walk in fast-forward case. If it's forced update, we may need to
   walk to the bottom.

3) the sender has same .git/shallow as the receiver

   This is similar to case 2 except that "ours" should be emptied at
   step 2 and exit early.

A fetch after "clone --depth=X" is case 1. A fetch after "clone" (from
a shallow repo) is case 3. Luckily they're cheap for the common case.

A push from "clone --depth=X" falls into case 2, which is expensive.
Some more work may be done at the sender/client side to avoid more
work on the server side: if the transferred pack does not contain any
shallow commits, send-pack should not send any shallow commits to the
receive-pack, effectively turning it into a normal push and avoid all
steps.

This patch implements all steps except #3, already handled by
fetch-pack and receive-pack, #6 and #7, which has their own patch due
to their size.

(*) in previous versions step 7 was put before step 3. I reorder it so
    that the common case that keeps the pack does not need to walk
    commits at all. In future if we implement faster commit
    reachability check (maybe with the help of pack bitmaps or commit
    cache), step 7 could become cheap and be moved up before 6 again.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:16 -08:00
Karsten Blees
efc684245b remove old hash.[ch] implementation
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:25 -08:00
Karsten Blees
419a597f64 name-hash.c: remove cache entries instead of marking them CE_UNHASHED
The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.

The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.

Keep the CE_HASHED flag to prevent adding entries twice.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:24 -08:00
Karsten Blees
8b013788a1 name-hash.c: use new hash map implementation for cache entries
Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:04:24 -08:00