Commit Graph

130 Commits

Author SHA1 Message Date
Junio C Hamano
54dc783766 Merge branch 'bd/maint-unpack-trees-parawalk-fix' into maint
* bd/maint-unpack-trees-parawalk-fix:
  unpack-trees: Make index lookahead less pessimal
2010-06-28 16:18:02 -07:00
Junio C Hamano
21919d396a Merge branch 'pc/remove-warn' into maint
* pc/remove-warn:
  Remove a redundant errno test in a usage of remove_path
  Introduce remove_or_warn function
  Implement the rmdir_or_warn function
  Generalise the unlink_or_warn function
2010-06-22 08:30:38 -07:00
Brian Downing
e53e6b4433 unpack-trees: Make index lookahead less pessimal
When traversing trees with an index, the current index pointer
(o->cache_bottom) occasionally has to be temporarily advanced forwards to
match the traversal order of the tree, which is not the same as the sort
order of the index.  The existing algorithm that did this (introduced in
730f72840c) would get "stuck" when the
cache_bottom was popped and then repeatedly check the same index entries
over and over.  This represents a serious performance regression for
large repositories compared to the old "broken" traversal order.

This commit makes a simple change to mitigate this.  Whenever
find_cache_pos sees that the current pos is also the cache_bottom, and
it has already been unpacked, it advances the cache_bottom as well as
the current pos.  This prevents the above "sticking" behavior without
dramatically changing the algorithm.

In addition, this commit moves the unpacked check above the
ce_in_traverse_path() check.  The simple bitmask check is cheaper, and
in the case described above will be firing quite a bit to advance the
cache_bottom after a tree pop.
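
The effect of the two changes above can be pictured with a toy model. This is only a sketch: the Entry and Walker classes and the path_checks counter are invented for illustration, standing in for the index, o->cache_bottom and calls to ce_in_traverse_path(); it is Python, not the C code in unpack-trees.c.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    unpacked: bool = False  # stands in for the CE_UNPACKED flag


class Walker:
    def __init__(self, entries):
        self.entries = entries
        self.cache_bottom = 0
        self.path_checks = 0  # counts the "expensive" path comparisons

    def in_traverse_path(self, entry, prefix):
        self.path_checks += 1
        return entry.name.startswith(prefix)

    def find_cache_pos(self, prefix):
        pos = self.cache_bottom
        while pos < len(self.entries):
            entry = self.entries[pos]
            # Cheap flag test first; it fires often after a tree pop.
            if entry.unpacked:
                # An unpacked entry at the bottom can never match again,
                # so move the bottom forward instead of rechecking it.
                if pos == self.cache_bottom:
                    self.cache_bottom += 1
                pos += 1
                continue
            if self.in_traverse_path(entry, prefix):
                return pos
            pos += 1
        return -1
```

With two already-unpacked entries in front of the wanted one, a single lookup leaves cache_bottom past them, so later lookups do not revisit them.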

This yields considerable performance improvements for large trees.
The following are the number of function calls for "git diff HEAD" on
the Linux kernel tree, with 33,307 files:

   Symbol               Calls Before   Calls After
   -------------------  ------------   -----------
   unpack_callback            35,332        35,332
   find_cache_pos             37,357        37,357
   ce_in_traverse_path     4,979,473        37,357
   do_compare_entry        6,828,181       251,925
   df_name_compare         6,828,181       251,925

And on a repository of 187,456 files:

   Symbol               Calls Before   Calls After
   -------------------  ------------   -----------
   unpack_callback           197,958       197,958
   find_cache_pos            208,460       208,460
   ce_in_traverse_path    37,308,336       208,460
   do_compare_entry      156,950,469     2,690,626
   df_name_compare       156,950,469     2,690,626

On the latter repository, user time for "git diff HEAD" was reduced from
5.58 to 0.42 seconds.  This is compared to 0.30 seconds before the
traversal order fix was implemented.

Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-06-18 08:06:18 -07:00
Clemens Buchacher
aecda37c66 do not overwrite files marked "assume unchanged"
A merge will fail gracefully if it needs to update files marked
"assume unchanged", but other similar commands will not. In
particular, checkout and rebase will silently overwrite changes to
such files.

This is a regression introduced in commit 1dcafcc0 (verify_uptodate():
add ce_uptodate(ce) test), which avoids lstat's during a merge, if the
index entry is up-to-date. If the CE_VALID flag is set, however, we
cannot trust CE_UPTODATE.
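
The rule in the last sentence can be sketched as a small predicate. The bit values and the can_skip_lstat name below are illustrative assumptions, not taken from cache.h; the point is only the order of the two tests.

```python
CE_VALID = 0x8000       # "assume unchanged" bit (value is illustrative)
CE_UPTODATE = 0x40000   # in-core "stat info verified" bit (value is illustrative)


def can_skip_lstat(ce_flags):
    """Return True only when the cached up-to-date state can be trusted."""
    if ce_flags & CE_VALID:
        # CE_UPTODATE may have been turned on merely because of
        # "assume unchanged", so it proves nothing about the work tree.
        return False
    return bool(ce_flags & CE_UPTODATE)
```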

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-05-01 12:00:44 -07:00
Peter Collingbourne
80d706afed Introduce remove_or_warn function
This patch introduces the remove_or_warn function which is a
generalised version of the {unlink,rmdir}_or_warn functions.  It takes
an additional parameter indicating the mode of the file to be removed.

The patch also modifies certain functions to use remove_or_warn
where appropriate, and adds a test case for a bug fixed by the use
of remove_or_warn.
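
A rough model of the dispatch, written in Python rather than C. It assumes the mode parameter is used to tell submodule (gitlink) entries from everything else, which the message above does not spell out; the warning text is made up.

```python
import os
import sys

S_IFMT = 0o170000
S_IFGITLINK = 0o160000  # mode git records for submodule entries


def remove_or_warn(mode, path):
    """Use rmdir for gitlink entries and unlink otherwise; warn on failure."""
    try:
        if (mode & S_IFMT) == S_IFGITLINK:
            os.rmdir(path)   # only succeeds on an empty directory
        else:
            os.unlink(path)
        return 0
    except OSError as err:
        print(f"warning: unable to remove {path}: {err.strerror}",
              file=sys.stderr)
        return -1
```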

Signed-off-by: Peter Collingbourne <peter@pcc.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-03-28 09:53:04 -07:00
Junio C Hamano
026680f881 Merge branch 'jc/fix-tree-walk'
* jc/fix-tree-walk:
  read-tree --debug-unpack
  unpack-trees.c: look ahead in the index
  unpack-trees.c: prepare for looking ahead in the index
  Aggressive three-way merge: fix D/F case
  traverse_trees(): handle D/F conflict case sanely
  more D/F conflict tests
  tests: move convenience regexp to match object names to test-lib.sh

Conflicts:
	builtin-read-tree.c
	unpack-trees.c
	unpack-trees.h
2010-01-24 17:35:58 -08:00
Junio C Hamano
26b9f5cc99 Merge branch 'pc/uninteresting-submodule-disappear-upon-switch-branches'
* pc/uninteresting-submodule-disappear-upon-switch-branches:
  Remove empty directories when checking out a commit with fewer submodules
2010-01-18 18:12:57 -08:00
Junio C Hamano
dc96c5ee70 Merge branch 'cc/reset-more'
* cc/reset-more:
  t7111: check that reset options work as described in the tables
  Documentation: reset: add some missing tables
  Fix bit assignment for CE_CONFLICTED
  "reset --merge": fix unmerged case
  reset: use "unpack_trees()" directly instead of "git read-tree"
  reset: add a few tests for "git reset --merge"
  Documentation: reset: add some tables to describe the different options
  reset: improve mixed reset error message when in a bare repo
2010-01-13 11:58:56 -08:00
Junio C Hamano
73d66323ac Merge branch 'nd/sparse'
* nd/sparse: (25 commits)
  t7002: test for not using external grep on skip-worktree paths
  t7002: set test prerequisite "external-grep" if supported
  grep: do not do external grep on skip-worktree entries
  commit: correctly respect skip-worktree bit
  ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
  tests: rename duplicate t1009
  sparse checkout: inhibit empty worktree
  Add tests for sparse checkout
  read-tree: add --no-sparse-checkout to disable sparse checkout support
  unpack-trees(): ignore worktree check outside checkout area
  unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
  unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
  unpack-trees.c: generalize verify_* functions
  unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
  Introduce "sparse checkout"
  dir.c: export excluded_1() and add_excludes_from_file_1()
  excluded_1(): support exclude files in index
  unpack-trees(): carry skip-worktree bit over in merged_entry()
  Read .gitignore from index if it is skip-worktree
  Avoid writing to buffer in add_excludes_from_file_1()
  ...

Conflicts:
	.gitignore
	Documentation/config.txt
	Documentation/git-update-index.txt
	Makefile
	entry.c
	t/t7002-grep.sh
2010-01-13 11:58:34 -08:00
Peter Collingbourne
c5e558a80a Remove empty directories when checking out a commit with fewer submodules
Change the unlink_entry function to use rmdir to remove submodule
directories.  Currently we try to use unlink, which will never succeed.

Of course rmdir will only succeed for empty (i.e. not checked out)
submodule directories.  Behaviour if a submodule is checked out stays
essentially the same: print a warning message and keep the submodule
directory.

Signed-off-by: Peter Collingbourne <peter@pcc.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-11 19:50:51 -08:00
Junio C Hamano
ba655da537 read-tree --debug-unpack
A debugging patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 15:00:14 -08:00
Junio C Hamano
730f72840c unpack-trees.c: look ahead in the index
This makes the traversal of index be in sync with the tree traversal.
When unpack_callback() is fed a set of tree entries from trees, it
inspects the name of the entry and checks if an index entry with
the same name could be hiding behind the current index entry, and

 (1) if the name appears in the index as a leaf node, it is also
     fed to the n_way_merge() callback function;

 (2) if the name is a directory in the index, i.e. there are entries
     that are underneath it, then nothing is fed to the n_way_merge()
     callback function;

 (3) otherwise, if the name comes before the first eligible entry in the
     index, the index entry is first unpacked alone.

When traverse_trees_recursive() descends into a subdirectory, the
cache_bottom pointer is moved to walk index entries within that directory.

All of these are omitted for diff-index, which does not even want to be
fed an index entry and a tree entry with D/F conflicts.

This fixes 3-way read-tree and exposes a bug in other parts of the system
in t6035, test #5.  The test prepares these three trees:

 O = HEAD^
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/b/c/d
    100644 blob e69de29bb2    a/x

 A = HEAD
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/b/c/d
    100644 blob 587be6b4c3f93f93c489c0111bba5596147a26cb    a/x

 B = master
    120000 blob a36b77384451ea1de7bd340ffca868249626bc52    a/b
    100644 blob e69de29bb2    a/b-2/c/d
    100644 blob e69de29bb2    a/x

With a clean index that matches HEAD, running

    git read-tree -m -u --aggressive $O $A $B

now yields

    120000 a36b77384451ea1de7bd340ffca868249626bc52 3       a/b
    100644 e69de29bb2 0       a/b-2/c/d
    100644 e69de29bb2 1       a/b/c/d
    100644 e69de29bb2 2       a/b/c/d
    100644 587be6b4c3f93f93c489c0111bba5596147a26cb 0       a/x

which is correct.  "master" created "a/b" symlink that did not exist,
and removed "a/b/c/d" while HEAD did not touch either path.

Before this series, read-tree did not notice the situation and resolved
addition of "a/b" and removal of "a/b/c/d" independently.  If A = HEAD had
another path "a/b/c/e" added, this merge should conflict but instead it
silently resolved "a/b" and then immediately overwrote it to add
"a/b/c/e", which was quite bogus.

Tests in t1012 start to work with this.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 15:00:14 -08:00
Junio C Hamano
da165f470e unpack-trees.c: prepare for looking ahead in the index
This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.

A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees look like this:

   O        A       B
   t                t
   t-i      t-i     t-i
   t-j      t-j
            t/1
            t/2

The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead.  After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".

The side that walked index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed.  When this happens, the pointer o->pos still points at
"t-i" that is the first entry.  We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and matches entries found there, and later come back
to process "t-i".

While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that have already been
processed.  o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.
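
The ordering mismatch behind all this can be reproduced in a few lines. The sketch below is Python, not git code, and the two helper names are invented: index entries sort bytewise by full path, while entries inside a tree compare a directory as if its name ended in "/".

```python
def index_order(paths):
    """Index entries are sorted bytewise by their full path."""
    return sorted(paths, key=lambda p: p.encode())


def tree_order(entries):
    """entries: (name, is_dir) pairs; a directory compares as name + '/'."""
    return sorted(entries,
                  key=lambda e: (e[0] + "/" if e[1] else e[0]).encode())


# Tree A from the example: "t" is a directory, so it sorts after "t-i"
# and "t-j" ('-' is 0x2d, '/' is 0x2f).
tree_a = tree_order([("t", True), ("t-i", False), ("t-j", False)])

# Tree B: "t" is a blob, so it sorts first.
tree_b = tree_order([("t", False), ("t-i", False)])

# The index for A holds the flattened paths.
index_a = index_order(["t/1", "t/2", "t-i", "t-j"])
```

Tree B hands out "t" first while the index for A starts at "t-i", which is why the walker has to look past "t-i" and "t-j" to find "t/1".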

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 14:59:54 -08:00
Junio C Hamano
cee2d6ae63 Aggressive three-way merge: fix D/F case
When the ancestor used to have a blob "P", your tree removed it, and the
tree you are merging with also removed it, the aggressive three-way cleanly
merges to remove that blob.  If the other tree added a new blob "P/Q"
while removing "P", it should also merge cleanly to remove "P" and create
"P/Q" (since neither the ancestor nor your tree could have had it, so it
is a typical "created in one").

The "aggressive" rule is not new anymore.  Reword the stale comment.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-03 23:25:13 -08:00
Junio C Hamano
e11d7b5969 "reset --merge": fix unmerged case
Commit 9e8ecea (Add 'merge' mode to 'git reset', 2008-12-01) disallowed
"git reset --merge" when there were unmerged entries.  But its wish was
that unmerged entries be reset as if --hard (instead of --merge) had been
used.  This makes sense because all "mergy" operations make sure that
any path involved in the merge does not have local modifications before
starting, so resetting such a path away won't lose any information.

The previous commit changed the behavior of --merge to accept resetting
unmerged entries if they are reset to a different state than HEAD, but it
did not reset the changes in the work tree, leaving the conflict markers
in the resulting file in the work tree.

Fix it by doing three things:

 - Update the documentation to match the wish of original "reset --merge"
   better, namely, "An unmerged entry is a sign that the path didn't have
   any local modification and can be safely reset to whatever the new
   HEAD records";

 - Update read_index_unmerged(), which reads the index file into the cache
   while dropping any higher-stage entries down to stage #0, not to copy
   the object name from the higher stage entry.  The code used to take the
   object name from a stage entry ("base" if you happened to have
   stage #1, or "ours" if both sides added, etc.), which essentially meant
   that you are getting random results depending on what the merge did.

   The _only_ reason we want to keep a previously unmerged entry in the
   index at stage #0 is so that we don't forget the fact that we have a
   corresponding file in the work tree in order to be able to remove it
   when the tree we are resetting to does not have the path.  In order to
   differentiate such an entry from an ordinary cache entry, the cache entry
   added by read_index_unmerged() is marked as CE_CONFLICTED.

 - Update merged_entry() and deleted_entry() so that they pay attention to
   cache entries marked as CE_CONFLICTED.  They are previously unmerged
   entries, and the files in the work tree that correspond to them are
   reset away by oneway_merge() to the version from the tree we are
   resetting to.
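
The second item can be sketched as follows. This is a simplified Python model, not the C function: entries are plain tuples, the flag value is arbitrary, and leaving the object name as None stands in for "do not copy the object name from the higher stage entry".

```python
CE_CONFLICTED = 1 << 0  # illustrative bit value


def read_index_unmerged(entries):
    """Collapse unmerged paths to a single stage #0 placeholder.

    entries: (path, stage, oid) tuples sorted by path, then stage.
    """
    result = []
    for path, stage, oid in entries:
        if stage == 0:
            result.append({"path": path, "oid": oid, "flags": 0})
        elif not result or result[-1]["path"] != path:
            # First higher-stage entry for this path: remember only that
            # the path exists, not any particular side's object name.
            result.append({"path": path, "oid": None,
                           "flags": CE_CONFLICTED})
    return result
```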

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-03 16:01:05 -08:00
Nguyễn Thái Ngọc Duy
56cac48c35 ie_match_stat(): do not ignore skip-worktree bit with CE_MATCH_IGNORE_VALID
Previously the CE_MATCH_IGNORE_VALID flag was used by both the valid and
skip-worktree bits. While the two bits have similar behaviour, sharing
this flag means "git update-index --really-refresh" will ignore
skip-worktree when it should not. Instead, another flag is
introduced to ignore the skip-worktree bit; CE_MATCH_IGNORE_VALID now
applies only to the valid bit.
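
A sketch of the split, in Python. The message does not name the new flag; CE_MATCH_IGNORE_SKIP_WORKTREE and both numeric values below are assumptions made for the example, and entries are modelled as plain dicts.

```python
CE_MATCH_IGNORE_VALID = 0o1
CE_MATCH_IGNORE_SKIP_WORKTREE = 0o4  # assumed name and value


def assume_unchanged(entry, options):
    """Return True when the stat comparison may be skipped for entry."""
    ignore_valid = options & CE_MATCH_IGNORE_VALID
    ignore_skip_worktree = options & CE_MATCH_IGNORE_SKIP_WORKTREE
    if not ignore_skip_worktree and entry["skip_worktree"]:
        return True
    if not ignore_valid and entry["valid"]:
        return True
    return False
```

With separate flags, a caller that passes only CE_MATCH_IGNORE_VALID still leaves skip-worktree entries alone.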

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-12-14 14:03:58 -08:00
Junio C Hamano
39add7a36f Merge branch 'jc/fix-tree-walk' (early part)
* 'jc/fix-tree-walk' (early part):
  unpack_callback(): use unpack_failed() consistently
  unpack-trees: typofix
  diff-lib.c: fix misleading comments on oneway_diff()
2009-11-20 23:55:50 -08:00
Felipe Contreras
a75d7b5409 Use 'fast-forward' all over the place
It's a compound word.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-24 23:50:28 -07:00
Junio C Hamano
353c5eeb5c unpack_callback(): use unpack_failed() consistently
When unpack_index_entry() failed, consistently call unpack_failed(),
instead of silently returning -1.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-11 16:40:43 -07:00
Junio C Hamano
6caa7b5553 unpack-trees: typofix
I am not good at subject-verb concordance.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-10-11 16:40:43 -07:00
Nguyễn Thái Ngọc Duy
9e1afb1675 sparse checkout: inhibit empty worktree
With the way sparse checkout works, users may empty their worktree
completely because of a non-matching or empty sparse-checkout
spec. I believe this is not desired. This patch makes Git refuse to
produce such a worktree.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:14:42 -07:00
Nguyễn Thái Ngọc Duy
f1f523eae9 unpack-trees(): ignore worktree check outside checkout area
verify_absent() and verify_uptodate() are used to ensure the worktree
is safe to update, after which CE_REMOVE or CE_UPDATE is set.
Finally, check_updates() relies on CE_REMOVE, CE_UPDATE and the
recently added CE_WT_REMOVE to update the working directory accordingly.

The entries that are checked may eventually be left out of the checkout
area (done later in apply_sparse_checkout()). We don't want to update
anything outside the checkout area. This patch teaches Git to assume
"good" and skip these checks when it is sure those entries will be
outside the checkout area, and to clear CE_REMOVE|CE_UPDATE that could
be set due to this assumption.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:14:41 -07:00
Nguyễn Thái Ngọc Duy
e800ec9d72 unpack_trees(): apply $GIT_DIR/info/sparse-checkout to the final index
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:14:41 -07:00
Nguyễn Thái Ngọc Duy
08aefc9e47 unpack-trees(): "enable" sparse checkout and load $GIT_DIR/info/sparse-checkout
This patch introduces core.sparseCheckout, which will control whether
sparse checkout support is enabled in unpack_trees().

It also loads the sparse-checkout file that will be used in the next patch.
I split it out so the next patch will be shorter and easier to read.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:14:41 -07:00
Nguyễn Thái Ngọc Duy
35a5aa79d0 unpack-trees.c: generalize verify_* functions
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:33 -07:00
Nguyễn Thái Ngọc Duy
e663db2f44 unpack-trees(): add CE_WT_REMOVE to remove on worktree alone
CE_REMOVE now removes both the worktree and index versions. Sparse
checkout must be able to remove the worktree version while keeping the
index intact when the checkout area is narrowed.
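
The difference between the two flags, as a toy model in Python. The bit values and the apply_removals helper are invented for the example; the index is a dict of path to flags and the worktree is a set of paths.

```python
CE_REMOVE = 1 << 0     # illustrative bit values
CE_WT_REMOVE = 1 << 1


def apply_removals(index, worktree):
    """Return (new_index, new_worktree) after honouring both flags."""
    new_index = {}
    new_worktree = set(worktree)
    for path, flags in index.items():
        if flags & (CE_REMOVE | CE_WT_REMOVE):
            new_worktree.discard(path)       # file goes away either way
        if not (flags & CE_REMOVE):
            # CE_WT_REMOVE alone keeps the index entry intact.
            new_index[path] = flags & ~CE_WT_REMOVE
    return new_index, new_worktree
```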

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:33 -07:00
Nguyễn Thái Ngọc Duy
32f54ca317 unpack-trees(): carry skip-worktree bit over in merged_entry()
In this code path, we would remove "old" and replace it with "merge".
"old" may have skip-worktree bit, so re-add it to "merge".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:33 -07:00
Nguyễn Thái Ngọc Duy
5203083694 Teach Git to respect skip-worktree bit (writing part)
This part is mainly to remove CE_VALID shortcuts (and as a
consequence, ce_uptodate() shortcuts as it may be turned on by
CE_VALID) in writing code path if skip-worktree is used. Various tests
are added to avoid future breakages.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-23 17:13:32 -07:00
Junio C Hamano
58b1ef2f0f Merge branch 'maint'
* maint:
  checkout -f: deal with a D/F conflict entry correctly
  sha1_name.c: avoid unnecessary strbuf_release
  refs.c: release file descriptor on error return
2009-07-18 16:57:47 -07:00
Junio C Hamano
78d3b06e0f checkout -f: deal with a D/F conflict entry correctly
When we switch branches with "checkout -f", unpack_trees() feeds two
cache_entries to oneway_merge() function in its src[] array argument.  The
zeroth entry comes from the current index, and the first entry represents
what the merge result should be, taken from the tree recorded in the
commit we are switching to.

When we have a blob (either regular file or a symlink) in the index and in
the work tree at path "foo", and the switched-to tree has "foo/bar",
i.e. "foo" becomes a directory, src[0] is obviously that blob currently
registered at "foo".  Even though we do not have anything at "foo" in the
switched-to tree, src[1] is _not_ NULL in this case.

The unpack_trees() machinery places a special marker df_conflict_entry
to signal that no blob exists at "foo", but it will become a directory
that may have something underneath it (namely "foo/bar"), so a usual 3-way
merge can notice the situation.

But oneway_merge() codepath failed to notice this and passed the special
marker directly to merged_entry().  This happens to remove the "foo" in
the end because the df_conflict_entry does not have any name (hence the
"error" message) and its addition in add_index_entry() is rejected, but it
is wrong.
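
The situation can be modelled with a sentinel object. The sketch below is Python and heavily simplified; the tuple return values are made up, and treating the marker the same as "nothing at this path" is one plausible reading of what the correct behaviour should be.

```python
DF_CONFLICT = object()  # sentinel: "no blob here, the path becomes a directory"


def oneway_merge(old, new):
    """Decide what happens to one path on 'checkout -f'.

    old: entry from the current index (src[0]), or None.
    new: entry from the switched-to tree (src[1]), None, or DF_CONFLICT.
    """
    if new is None or new is DF_CONFLICT:
        # The marker is not a real entry and must never be merged in.
        return ("delete", old)
    if old is not None and old == new:
        return ("keep", old)
    return ("update", new)
```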

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-18 16:57:30 -07:00
Linus Torvalds
05c1da2f5e Fix extraneous lstat's in 'git checkout -f'
In our 'oneway_merge()' we always do an 'lstat()' to see if we might
need to mark the entry for updating.

But we really shouldn't need to do that when the cache entry is already
marked as being ce_uptodate(), and this makes us do unnecessary lstat()
calls if we have index preloading enabled.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-14 15:17:01 -07:00
Brandon Casey
0039ba7e5e unpack-trees.c: work around run-time array initialization flaw on IRIX 6.5
The c99 MIPSpro Compiler version 7.4.4m on IRIX 6.5 does not properly
initialize run-time initialized arrays.  An array which is initialized with
fewer elements than the length of the array should have the uninitialized
elements initialized to zero.  This compiler only initializes the remaining
elements when the last element is a static parameter.  So work around it
by adding a "NULL" initialization parameter.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-10 23:50:29 -07:00
Linus Torvalds
dba2e2037f Simplify read_directory[_recursive]() arguments
Stop the insanity with separate 'path' and 'base' arguments that must
match.  We don't need that crazy interface any more, since we cleaned up
handling of 'path' in commit da4b3e8c28.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-09 01:11:28 -07:00
Linus Torvalds
2af202be3d Fix various sparse warnings in the git source code
There are a few remaining ones, but this fixes the trivial ones. It boils
down to two main issues that sparse complains about:

 - warning: Using plain integer as NULL pointer

   Sparse doesn't like you using '0' instead of 'NULL'. For various good
   reasons, not the least of which is just the visual confusion. A NULL
   pointer is not an integer, and that whole "0 works as NULL" is a
   historical accident and not very pretty.

   A few of these remain: zlib is a total mess, and Z_NULL is just a 0.
   I didn't touch those.

 - warning: symbol 'xyz' was not declared. Should it be static?

   Sparse wants to see declarations for any functions you export. A lack
   of a declaration tends to mean that you should either add one, or you
   should mark the function 'static' to show that it's in file scope.

   A few of these remain: I only did the ones that should obviously just
   be made static.

That 'wt_status_submodule_summary' one is debatable. It has a few related
flags (like 'wt_status_use_color') which _are_ declared, and are used by
builtin-commit.c. So maybe we'd like to export it at some point, but it's
not declared now, and not used outside of that file, so 'static' it is in
this patch.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-20 21:52:55 -07:00
Junio C Hamano
b65982b608 Optimize "diff-index --cached" using cache-tree
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway.  Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.
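
The shortcut can be sketched with nested dicts. This is a Python illustration under simplifying assumptions: trees are {"oid", "entries"} dicts, the cache-tree is a map from directory path to tree object name, and only paths present in the tree are compared (additions in the index are ignored for brevity).

```python
def diff_index_cached(tree, index, cache_tree, prefix=""):
    """Return (changed_paths, number_of_trees_descended_into)."""
    changed, descended = [], 1
    for name, item in sorted(tree["entries"].items()):
        path = prefix + name
        if isinstance(item, dict):  # a subtree
            if cache_tree.get(path + "/") == item["oid"]:
                continue  # whole subtree matches the index; skip it
            sub_changed, sub_descended = diff_index_cached(
                item, index, cache_tree, path + "/")
            changed += sub_changed
            descended += sub_descended
        elif index.get(path) != item:
            changed.append(path)
    return changed, descended
```

When only a top-level file differs, every subtree whose cache-tree entry is valid is skipped without being opened.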

As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:

    (without patch, hot cache)
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+9407minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+2446minor)pagefaults 0swaps

Cold cache numbers are very impressive, but it does not matter very much
in practice:

    (without patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time git diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
    247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps

    (with patch, cold cache)
    $ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
    $ /usr/bin/time ../git.git/git-diff --cached --raw
    :100644 100644 b57e1f5... e69de29... M  Makefile
    :100644 000000 8c86b72... 0000000... D  arch/x86/Makefile
    :000000 100644 0000000... e69de29... A  arche
    0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
    18440inputs+0outputs (79major+2369minor)pagefaults 0swaps

This of course helps "git status" as well.

    (without patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+10970minor)pagefaults 0swaps

    (with patch, hot cache)
    $ /usr/bin/time ../git.git/git-status >/dev/null
    0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+5336outputs (0major+3921minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-25 11:35:29 -07:00
Alex Riesen
691f1a28bf replace direct calls to unlink(2) with unlink_or_warn
This helps to notice when something's going wrong, especially on
systems which lock open files.

I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which is already printing something or is
  called from such a function
- it is in a static function, returning void and the function is only
  called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obviously cleaning up the lockfiles

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-29 18:37:41 -07:00
Junio C Hamano
66985e6629 unpack-trees: do not muck with attributes when we are not checking out
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-17 21:05:49 -07:00
Junio C Hamano
6ba8b079cb Merge branch 'jc/attributes-checkout'
* jc/attributes-checkout:
  Add a test for checking whether gitattributes is honored by checkout.
  Read attributes from the index that is being checked out
2009-03-26 00:27:33 -07:00
Junio C Hamano
7d4e3a72fb Merge branch 'jc/maint-1.6.0-read-tree-overlay'
* jc/maint-1.6.0-read-tree-overlay:
  read-tree A B C: do not create a bogus index and do not segfault
2009-03-17 18:58:55 -07:00
Junio C Hamano
06f33c1735 Read attributes from the index that is being checked out
Traditionally we used .gitattributes file from the work tree if it exists,
and otherwise read from the index as a fallback.  When switching to a
branch that has an updated .gitattributes file, and entries in it give
different attributes to other paths being checked out, we should instead
read from the .gitattributes in the index.

This breaks a use case of fixing incorrect entries in the .gitattributes
in the work tree (without adding it to the index) and checking other paths
out, though.

    $ edit .gitattributes ;# mark foo.dat as binary
    $ rm foo.dat
    $ git checkout foo.dat

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-13 22:51:43 -07:00
Junio C Hamano
aab3b9a1aa read-tree A B C: do not create a bogus index and do not segfault
"git read-tree A B C..." without the "-m" (merge) option is a way to read
these trees on top of each other to get an overlay of them.

An ancient commit ee6566e (Rewrite read-tree, 2005-09-05) passed the
ADD_CACHE_SKIP_DFCHECK flag when calling add_index_entry() to add the
paths obtained from these trees to the index, but it is an incorrect use
of the flag.  The flag is meant to be used by callers who know the
addition of the entry does not introduce a D/F conflict to the index in
order to avoid the overhead of checking.

This bug resulted in a bogus index that records both "x" and "x/z" as a
blob after reading three trees that have paths ("x"), ("x", "y"), and
("x/z", "y") respectively.  34110cd (Make 'unpack_trees()' have a separate
source and destination index, 2008-03-06) refactored the callsites of
add_index_entry() incorrectly and added more codepaths that use this flag
when it shouldn't be used.

Also, 0190457 (Move 'unpack_trees()' over to 'traverse_trees()' interface,
2008-03-05) introduced a bug to call add_index_entry() for the tree that
does not have the path in it, passing NULL as a cache entry.  This caused
reading multiple trees, one of which has path "x" but another doesn't, to
segfault.
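
The bogus index from the example above can be reproduced with a toy model. This Python sketch treats the index as a set of paths and assumes a conflicting entry may simply be replaced when the D/F check runs; the function name mirrors add_index_entry() but the body is invented.

```python
def add_index_entry(index, path, skip_dfcheck=False):
    """Add path to index (a set), optionally skipping the D/F check."""
    if not skip_dfcheck:
        for other in list(index):
            # "x" conflicts with "x/z" in either direction.
            if other.startswith(path + "/") or path.startswith(other + "/"):
                index.discard(other)
    index.add(path)


def overlay(trees, skip_dfcheck):
    index = set()
    for tree in trees:
        for path in tree:
            add_index_entry(index, path, skip_dfcheck)
    return index


trees = [["x"], ["x", "y"], ["x/z", "y"]]
bogus = overlay(trees, skip_dfcheck=True)   # keeps both "x" and "x/z"
sane = overlay(trees, skip_dfcheck=False)   # "x" gives way to "x/z"
```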

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-12 17:06:07 -07:00
Kjetil Barvik
c06ff4908b Record ns-timestamps if possible, but do not use it without USE_NSEC
Traditionally, the lack of USE_NSEC meant "do not record nor use the
nanosecond resolution part of the file timestamps".  To avoid problems on
filesystems that lose the ns part when the metadata is flushed to the disk
and then later read back in, disabling USE_NSEC has been a good idea in
general.

If you are on a filesystem without such an issue, it does not hurt to read
and store them in the cached stat data in the index entries even if your
git is compiled without USE_NSEC.  The index left with such a version of
git can be read by git compiled with USE_NSEC and it can make use of the
nanosecond part to optimize the check to see if the path on the filesystem
has been modified since we last looked at it.
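
In other words, the nanosecond part is always stored but only compared when asked for. A minimal Python sketch, with timestamps as (seconds, nanoseconds) pairs and use_nsec standing in for the USE_NSEC build option:

```python
def mtime_matches(cached, current, use_nsec):
    """Compare a cached timestamp against the one from the filesystem."""
    if cached[0] != current[0]:
        return False
    if use_nsec and cached[1] != current[1]:
        return False
    return True
```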

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-07 20:25:16 -08:00
Kjetil Barvik
1dcafcc0e6 verify_uptodate(): add ce_uptodate(ce) test
If, inside verify_uptodate(), we can already tell from the ce entry that
it is up to date by testing it with ce_uptodate(ce), there is no
need to call lstat(2) and ie_match_stat() afterwards.

And, reading from the commit log message from:

    commit eadb583134
    Author: Junio C Hamano <gitster@pobox.com>
    Date:   Fri Jan 18 23:45:24 2008 -0800

    Avoid running lstat(2) on the same cache entry.

this also seems to be correct usage of the ce_uptodate() macro
introduced by that patch.

This will avoid lots of lstat(2) calls in some cases, for example
when running the 'git checkout' command.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19 21:39:51 -08:00
Kjetil Barvik
fba2f38a2c make USE_NSEC work as expected
Since the ext4 filesystem has been declared stable as of Linux v2.6.28,
and ext4 supports nanosecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.

This will make racy git situations less likely to happen.  For 'git
checkout' this means it will be less likely that we have to open the
file, read its contents into RAM, and check whether it is really
modified or not.  The result should be a little less CPU time used, fewer
page faults and a slightly faster program, at least for 'git checkout'.
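
As a rough illustration of why the nanosecond part helps, the sketch
below compares a file's mtime against the time the index was written.
The names sketch_time and sketch_is_racy are hypothetical and are not
taken from the git sources; the comparison only follows the idea
described in Documentation/technical/racy-git.txt.

```c
/* Hypothetical timestamp pair; not git's own cache_time structure. */
struct sketch_time {
	long sec;
	long nsec;
};

/*
 * Return 1 if the entry could have been modified at or after the
 * moment the index was written, in which case the cached stat data
 * cannot be trusted and the file contents must be compared.
 */
int sketch_is_racy(struct sketch_time entry_mtime,
		   struct sketch_time index_mtime, int use_nsec)
{
	if (entry_mtime.sec != index_mtime.sec)
		return entry_mtime.sec > index_mtime.sec;
	if (!use_nsec)
		return 1;	/* same second: cannot tell which came first */
	return entry_mtime.nsec >= index_mtime.nsec;
}
```

With whole-second resolution, every file touched in the same second as
the index write has to be treated as possibly modified.  With the
nanosecond part, only files touched at or after the write are.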

Since the number of possible racy git situations would increase as
disks get faster, this patch will become more and more helpful as time
goes by.  For a fast Solid State Disk, this patch should be helpful.

Note that, when file operations start to take less than 1 nanosecond,
one would again start to get more racy git situations.

For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19 21:39:48 -08:00
Kjetil Barvik
36419c8ee4 check_updates(): effective removal of cache entries marked CE_REMOVE
Below is oprofile output from GIT command 'git checkout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):

CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
                         mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
                           unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
  samples|      %|  samples|      %|
------------------------------------
   409247 100.000    342878 100.000 git
        CPU_CLK_UNHALT...|INST_RETIRED:2...|
          samples|      %|  samples|      %|
        ------------------------------------
           260476 63.6476    257843 75.1996 libz.so.1.2.3
           100876 24.6492     64378 18.7758 kernel-2.6.28.4_2.vmlinux
            30850  7.5382      7874  2.2964 libc-2.9.so
            14775  3.6103      8390  2.4469 git
             2020  0.4936      4325  1.2614 libcrypto.so.0.9.8
              191  0.0467        32  0.0093 libpthread-2.9.so
               58  0.0142        36  0.0105 ld-2.9.so
                1 2.4e-04         0       0 libldap-2.3.so.0.2.31

Detail list of the top 20 function entries (libz counted in one blob):

CPU_CLK_UNHALTED  INST_RETIRED_ANY_P
samples  %        samples  %        image name               symbol name
260476   63.6862  257843   75.2725  libz.so.1.2.3            /lib/libz.so.1.2.3
16587     4.0555  3636      1.0615  libc-2.9.so              memcpy
7710      1.8851  277       0.0809  libc-2.9.so              memmove
3679      0.8995  1108      0.3235  kernel-2.6.28.4_2.vmlinux d_validate
3546      0.8670  2607      0.7611  kernel-2.6.28.4_2.vmlinux __getblk
3174      0.7760  1813      0.5293  libc-2.9.so              _int_malloc
2396      0.5858  3681      1.0746  kernel-2.6.28.4_2.vmlinux copy_to_user
2270      0.5550  2528      0.7380  kernel-2.6.28.4_2.vmlinux __link_path_walk
2205      0.5391  1797      0.5246  kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty
2103      0.5142  1203      0.3512  kernel-2.6.28.4_2.vmlinux find_first_zero_bit
2077      0.5078  997       0.2911  kernel-2.6.28.4_2.vmlinux do_get_write_access
2070      0.5061  514       0.1501  git                      cache_name_compare
2043      0.4995  1501      0.4382  kernel-2.6.28.4_2.vmlinux rcu_irq_exit
2022      0.4944  1732      0.5056  kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc
2020      0.4939  4325      1.2626  libcrypto.so.0.9.8       /usr/lib/libcrypto.so.0.9.8
1965      0.4804  1384      0.4040  git                      patch_delta
1708      0.4176  984       0.2873  kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period
1682      0.4112  727       0.2122  kernel-2.6.28.4_2.vmlinux sysfs_slab_alias
1659      0.4056  290       0.0847  git                      find_pack_entry_one
1480      0.3619  1307      0.3816  kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks

Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles
per instruction.  Compared to the total cycles spent inside the
source code of GIT for this command, all the memmove() calls
translate to (7710 * 100) / 14775 = 52.2% of that total.

Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:

        memmove(istate->cache + pos,
                istate->cache + pos + 1,
                (istate->cache_nr - pos) * sizeof(struct cache_entry *));

remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and each time it is called we move every cache_entry
pointer after the removed one one step to the left.

Since we have 28828 entries in the cache this time, and if we move
half of them on average each time, in total we move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if
each pointer is 8 bytes (64 bit).
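
The estimate can be reproduced with a few lines of C.  The function
name sketch_bytes_moved is hypothetical; it only restates the
arithmetic above under the same assumption that half of the entries
sit behind the removed one on average.

```c
/*
 * Estimated bytes moved by repeated single-entry removals:
 * removals * (entries / 2) * pointer size.
 */
unsigned long long sketch_bytes_moved(unsigned long long removals,
				      unsigned long long entries,
				      unsigned long long ptr_size)
{
	/* multiply first so that halving does not lose a remainder */
	return removals * entries / 2 * ptr_size;
}
```

Calling sketch_bytes_moved(4902, 28828, 4) gives the 32-bit figure
quoted above, and passing 8 as the pointer size doubles it.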

OK, it seems that the function check_updates() is called 28 times, so
the estimate above would have been more accurate if check_updates() had
been called only once, but the point is: we get lots of bytes moved.

To fix this, and use an O(N) algorithm instead, where N is the number
of cache_entries, we delete/remove all entries in one loop through all
entries.
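
A minimal sketch of such a single-pass removal follows.  The types
and the flag SKETCH_CE_REMOVE are hypothetical stand-ins, not the
definitions from cache.h, and this is not the code from the patch
itself; it only illustrates the O(N) compaction idea.

```c
#include <stddef.h>

/* Hypothetical stand-ins for git's cache entry and index state. */
#define SKETCH_CE_REMOVE 0x1u

struct sketch_entry {
	unsigned int flags;
	const char *name;
};

struct sketch_index {
	struct sketch_entry **cache;
	size_t cache_nr;
};

/*
 * Drop every entry flagged SKETCH_CE_REMOVE in one loop.  Surviving
 * pointers are copied down to the next free slot, so each pointer
 * moves at most once and the relative order is preserved.
 * Returns the number of entries removed.
 */
size_t sketch_remove_marked(struct sketch_index *istate)
{
	size_t i, j = 0, removed;

	for (i = 0; i < istate->cache_nr; i++) {
		if (istate->cache[i]->flags & SKETCH_CE_REMOVE)
			continue;	/* this slot gets overwritten later */
		istate->cache[j++] = istate->cache[i];
	}
	removed = istate->cache_nr - j;
	istate->cache_nr = j;
	return removed;
}
```

Compared with one memmove() per removed entry, the work no longer
depends on how many entries are marked, only on the size of the cache.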

In a retest, the new remove_marked_cache_entries() from the patch
below ended up with the following output line from oprofile:

46        0.0105  15        0.0041  git                      remove_marked_cache_entries

If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test.  And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-18 17:11:21 -08:00
Kjetil Barvik
7847892716 unlink_entry(): introduce schedule_dir_for_removal()
Currently inside unlink_entry(), if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time.  So if one directory containing 200 files is moved to
another location we get 199 failed calls to rmdir() and 1 successful
call.

To fix this and avoid some unnecessary calls to rmdir(), we schedule
each directory for removal and wait much longer before we do the real
call to rmdir().

Since the unlink_entry() function is called with alphabetically sorted
names, this new function ends up being very effective at avoiding
unnecessary calls to rmdir().  In some cases over 95% of all calls to
rmdir() are removed with this patch.
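
The deferral can be sketched roughly as below.  This is a simplified
illustration and not the patch's implementation: all sketch_* names
are hypothetical, and rmdir(2) is replaced by a counting callback so
the number of attempts is visible.  It assumes names arrive sorted,
so a directory is only tried once we have moved on from it.

```c
#include <string.h>

#define SKETCH_PATH_MAX 4096

/* Stand-in for rmdir(2); returns 0 on success. */
typedef int (*sketch_rmdir_fn)(const char *path);

struct sketch_removal {
	char path[SKETCH_PATH_MAX];	/* deepest directory still pending */
	size_t len;			/* 0 means nothing is pending */
	sketch_rmdir_fn do_rmdir;
};

/* Dry-run callback: counts the attempts and pretends they succeed. */
unsigned sketch_rmdir_calls;
int sketch_count_rmdir(const char *path)
{
	(void)path;
	sketch_rmdir_calls++;
	return 0;
}

/* Length of the directory part of name (0 if it has no '/'). */
static size_t sketch_dir_len(const char *name)
{
	const char *slash = strrchr(name, '/');
	return slash ? (size_t)(slash - name) : 0;
}

/* True if the pending directory is dir[0..dir_len) or one of its ancestors. */
static int sketch_covers(const struct sketch_removal *r,
			 const char *dir, size_t dir_len)
{
	if (r->len > dir_len || strncmp(r->path, dir, r->len))
		return 0;
	return r->len == dir_len || dir[r->len] == '/';
}

/* Remove pending directories, walking upwards, until one covers dir. */
static void sketch_flush_until(struct sketch_removal *r,
			       const char *dir, size_t dir_len)
{
	while (r->len && !sketch_covers(r, dir, dir_len)) {
		r->path[r->len] = '\0';
		if (r->do_rmdir(r->path)) {
			r->len = 0;	/* not empty, so its parents are not either */
			break;
		}
		r->len = sketch_dir_len(r->path);
	}
}

/* Call after each successful unlink() of name. */
void sketch_schedule_dir(struct sketch_removal *r, const char *name)
{
	size_t dir_len = sketch_dir_len(name);

	sketch_flush_until(r, name, dir_len);
	if (dir_len && dir_len < SKETCH_PATH_MAX) {
		memcpy(r->path, name, dir_len);
		r->len = dir_len;
	}
}

/* Call once at the end to try whatever is still pending. */
void sketch_finish(struct sketch_removal *r)
{
	sketch_flush_until(r, "", 0);
}
```

Scheduling 200 files from the same directory and then calling
sketch_finish() results in a single removal attempt for that
directory, instead of one attempt per file.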

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Kjetil Barvik
571998921d lstat_cache(): swap func(length, string) into func(string, length)
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.

Also, add a note about this fact into the coding guidelines.

Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-09 20:59:26 -08:00
Junio C Hamano
ddebfd1f27 Merge branch 'maint'
* maint:
  merge: fix out-of-bounds memory access
2009-01-31 17:42:26 -08:00
Junio C Hamano
6ac92294b3 Merge branch 'maint-1.6.0' into maint
* maint-1.6.0:
  merge: fix out-of-bounds memory access
2009-01-31 17:42:17 -08:00
René Scharfe
c7cddc1a2f merge: fix out-of-bounds memory access
The parameter n of unpack_callback() can have a value of up to
MAX_UNPACK_TREES.  The check at the top of unpack_trees() (its only
(indirect) caller) makes sure it cannot exceed this limit.

unpack_callback() passes it and the array src to unpack_nondirectories(),
which has this loop:

	for (i = 0; i < n; i++) {
		/* ... */
		src[i + o->merge] = o->df_conflict_entry;

o->merge can be 0 or 1, so unpack_nondirectories() potentially accesses
the array src at index MAX_UNPACK_TREES.  This patch makes it big enough.
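
The off-by-one can be illustrated with a small sketch.  The constant
SKETCH_MAX_TREES and the function sketch_fill are hypothetical and
only mimic the quoted loop; sizing the array with one extra slot is
shown as one way to make it big enough, not necessarily the exact
change in this patch.

```c
#define SKETCH_MAX_TREES 8	/* hypothetical stand-in for MAX_UNPACK_TREES */

/*
 * Store marker in n slots starting at offset merge (0 or 1), like the
 * loop quoted above.  Returns the number of slots the caller's array
 * must provide, which is n + merge.
 */
int sketch_fill(const void **src, int n, int merge, const void *marker)
{
	int i;

	for (i = 0; i < n; i++)
		src[i + merge] = marker;
	return n + merge;
}
```

With n equal to SKETCH_MAX_TREES and merge set to 1, the last write
lands at index SKETCH_MAX_TREES, so the caller needs an array declared
with SKETCH_MAX_TREES + 1 elements.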

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-01-31 10:39:55 -08:00