This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.
A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees look like this:
    O       A       B
    t               t
    t-i     t-i     t-i
    t-j     t-j
            t/1
            t/2
The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead. After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".
The side that walked the index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed. When this happens, the pointer o->pos still points at
"t-i" that is the first entry. We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and matches entries found there, and later come back
to process "t-i".
While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that have already been
processed. The o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.
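
The bookkeeping described above could be sketched roughly as follows. This
is only an illustration with simplified stand-in types: struct entry,
struct walk, FLAG_UNPACKED and mark_unpacked() are hypothetical names, not
the definitions used in unpack-trees.c.

```c
#include <stddef.h>

#define FLAG_UNPACKED 0x1u	/* hypothetical stand-in for CE_UNPACKED */

struct entry {
	const char *name;
	unsigned flags;
};

struct walk {
	struct entry *cache;
	size_t cache_nr;
	size_t cache_bottom;	/* first entry that may still need processing */
};

/*
 * Mark one entry as processed, then slide cache_bottom past the
 * contiguous run of already-processed entries at the bottom.
 */
static void mark_unpacked(struct walk *w, size_t pos)
{
	w->cache[pos].flags |= FLAG_UNPACKED;
	while (w->cache_bottom < w->cache_nr &&
	       (w->cache[w->cache_bottom].flags & FLAG_UNPACKED))
		w->cache_bottom++;
}
```

With entries "t-i", "t-j", "t/1", "t/2", marking "t/1" first leaves
cache_bottom at "t-i"; it only advances once "t-i" itself is marked.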
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the ancestor used to have a blob "P", your tree removed it, and the
tree you are merging with also removed it, the aggressive three-way cleanly
merges to remove that blob. If the other tree added a new blob "P/Q"
while removing "P", it should also merge cleanly to remove "P" and create
"P/Q" (since neither the ancestor nor your tree could have had it, so it
is a typical "created in one").
The "aggressive" rule is not new anymore. Reword the stale comment.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When unpack_index_entry() failed, consistently call unpack_failed(),
instead of silently returning -1.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we switch branches with "checkout -f", unpack_trees() feeds two
cache_entries to oneway_merge() function in its src[] array argument. The
zeroth entry comes from the current index, and the first entry represents
what the merge result should be, taken from the tree recorded in the
commit we are switching to.
When we have a blob (either regular file or a symlink) in the index and in
the work tree at path "foo", and the switched-to tree has "foo/bar",
i.e. "foo" becomes a directory, src[0] is obviously that blob currently
registered at "foo". Even though we do not have anything at "foo" in the
switched-to tree, src[1] is _not_ NULL in this case.
The unpack_trees() machinery places a special marker df_conflict_entry
to signal that no blob exists at "foo", but it will become a directory
that may have something underneath it (namely "foo/bar"), so a usual 3-way
merge can notice the situation.
But oneway_merge() codepath failed to notice this and passed the special
marker directly to merged_entry(). This happens to remove the "foo" in
the end because the df_conflict_entry does not have any name (hence the
"error" message) and its addition in add_index_entry() is rejected, but it
is wrong.
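
One plausible shape of the check, as a sketch only: treat the special
marker the same way as "nothing there" before deciding what to do. The
types and the oneway_action() helper below are hypothetical and do not
reproduce the actual oneway_merge() code.

```c
#include <stddef.h>

struct entry {
	const char *name;
};

enum action { ACTION_NONE, ACTION_DELETE, ACTION_MERGE };

/*
 * old:       entry from the current index (src[0]), may be NULL
 * new_entry: entry from the switched-to tree (src[1]), may be NULL
 * df_marker: the special "this path becomes a directory" marker
 */
static enum action oneway_action(const struct entry *old,
				 const struct entry *new_entry,
				 const struct entry *df_marker)
{
	/* The marker means "no blob here", exactly like NULL. */
	if (!new_entry || new_entry == df_marker)
		return old ? ACTION_DELETE : ACTION_NONE;
	return ACTION_MERGE;
}
```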
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
In our 'oneway_merge()' we always do an 'lstat()' to see if we might
need to mark the entry for updating.
But we really shouldn't need to do that when the cache entry is already
marked as being ce_uptodate(), and this makes us do unnecessary lstat()
calls if we have index preloading enabled.
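
The idea can be sketched like this, assuming a simplified entry type.
FLAG_UPTODATE stands in for whatever bit ce_uptodate() tests, and the
callback stands in for lstat() plus the stat comparison; all names here
are made up for the illustration.

```c
#include <stddef.h>

#define FLAG_UPTODATE 0x1u	/* hypothetical stand-in for the ce_uptodate() bit */

struct entry {
	const char *path;
	unsigned flags;
};

/* Stand-in for lstat() + stat comparison: nonzero means "differs". */
typedef int (*stat_check_fn)(const char *path, void *data);

/* Demo callback that only counts how often it was asked. */
static int counting_check(const char *path, void *data)
{
	(void)path;
	++*(unsigned *)data;
	return 0;
}

static int entry_is_dirty(const struct entry *e, stat_check_fn check,
			  void *data)
{
	if (e->flags & FLAG_UPTODATE)
		return 0;	/* already known clean: skip the syscall */
	return check(e->path, data);
}
```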
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The c99 MIPSpro Compiler version 7.4.4m on IRIX 6.5 does not properly
initialize run-time initialized arrays. An array which is initialized with
fewer elements than the length of the array should have the uninitialized
elements initialized to zero. This compiler only initializes the remaining
elements when the last element is a static parameter. So work around it
by adding a "NULL" initialization parameter.
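
A small sketch of the pattern, with a made-up array size and helper name.
On a conforming compiler the explicit trailing NULL changes nothing, since
the remaining elements are zero-initialized anyway; it is only there to
nudge a compiler that gets this wrong.

```c
#include <stddef.h>

#define NSRC 5	/* hypothetical array length */

/* Count the NULL slots of an array initialized from a run-time value. */
static size_t count_null_slots(const char *first)
{
	/* "first" is only known at run time; NULL is the added static element. */
	const char *src[NSRC] = { first, NULL };
	size_t i, n = 0;

	for (i = 0; i < NSRC; i++)
		if (!src[i])
			n++;
	return n;
}
```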
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Stop the insanity with separate 'path' and 'base' arguments that must
match. We don't need that crazy interface any more, since we cleaned up
handling of 'path' in commit da4b3e8c28.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a few remaining ones, but this fixes the trivial ones. It boils
down to two main issues that sparse complains about:
- warning: Using plain integer as NULL pointer
Sparse doesn't like you using '0' instead of 'NULL'. For various good
reasons, not the least of which is just the visual confusion. A NULL
pointer is not an integer, and that whole "0 works as NULL" is a
historical accident and not very pretty.
A few of these remain: zlib is a total mess, and Z_NULL is just a 0.
I didn't touch those.
- warning: symbol 'xyz' was not declared. Should it be static?
Sparse wants to see declarations for any functions you export. A lack
of a declaration tends to mean that you should either add one, or you
should mark the function 'static' to show that it's in file scope.
A few of these remain: I only did the ones that should obviously just
be made static.
That 'wt_status_submodule_summary' one is debatable. It has a few related
flags (like 'wt_status_use_color') which _are_ declared, and are used by
builtin-commit.c. So maybe we'd like to export it at some point, but it's
not declared now, and not used outside of that file, so 'static' it is in
this patch.
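
For illustration, both kinds of fix look roughly like this in a made-up
helper (first_nonempty() is not a function from the git sources):

```c
#include <stddef.h>

/*
 * Only used in this file, so it is marked 'static'; that is what quiets
 * "symbol was not declared. Should it be static?".
 */
static const char *first_nonempty(const char *a, const char *b)
{
	if (a != NULL && *a)	/* compare pointers against NULL, not 0 */
		return a;
	if (b != NULL && *b)
		return b;
	return NULL;		/* not "return 0;" */
}
```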
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When running "diff-index --cached" after making a change to only a small
portion of the index, there is no point unpacking unchanged subtrees into
the index recursively, only to find that all entries match anyway. Tweak
unpack_trees() logic that is used to read in the tree object to catch the
case where the tree entry we are looking at matches the index as a whole
by looking at the cache-tree.
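
As a rough sketch of the shortcut: if the cache-tree has a valid record for
a directory and its object name equals the tree entry being looked at, all
index entries under it can be skipped without descending. The structure
and function below are simplified, hypothetical stand-ins, not the real
cache-tree API.

```c
#include <string.h>

#define ID_LEN 20

struct cached_subtree {
	int entry_count;		/* negative means "invalidated" */
	unsigned char id[ID_LEN];	/* object name of the cached tree */
};

/*
 * Return how many index entries can be skipped because the whole
 * subtree matches, or 0 if we have to descend into it.
 */
static int entries_to_skip(const struct cached_subtree *ct,
			   const unsigned char *tree_id)
{
	if (!ct || ct->entry_count < 0)
		return 0;
	if (memcmp(ct->id, tree_id, ID_LEN))
		return 0;
	return ct->entry_count;
}
```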
As an exercise, after modifying a few paths in the kernel tree, here are
a few numbers on my Athlon 64X2 3800+:
(without patch, hot cache)
$ /usr/bin/time git diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.07user 0.02system 0:00.09elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+9407minor)pagefaults 0swaps
(with patch, hot cache)
$ /usr/bin/time ../git.git/git-diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.02user 0.00system 0:00.02elapsed 103%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+2446minor)pagefaults 0swaps
Cold cache numbers are very impressive, but it does not matter very much
in practice:
(without patch, cold cache)
$ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
$ /usr/bin/time git diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.06user 0.17system 0:10.26elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
247032inputs+0outputs (1172major+8237minor)pagefaults 0swaps
(with patch, cold cache)
$ su root sh -c 'echo 3 >/proc/sys/vm/drop_caches'
$ /usr/bin/time ../git.git/git-diff --cached --raw
:100644 100644 b57e1f5... e69de29... M Makefile
:100644 000000 8c86b72... 0000000... D arch/x86/Makefile
:000000 100644 0000000... e69de29... A arche
0.02user 0.01system 0:01.01elapsed 3%CPU (0avgtext+0avgdata 0maxresident)k
18440inputs+0outputs (79major+2369minor)pagefaults 0swaps
This of course helps "git status" as well.
(without patch, hot cache)
$ /usr/bin/time ../git.git/git-status >/dev/null
0.17user 0.18system 0:00.35elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+10970minor)pagefaults 0swaps
(with patch, hot cache)
$ /usr/bin/time ../git.git/git-status >/dev/null
0.10user 0.16system 0:00.27elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+3921minor)pagefaults 0swaps
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps to notice when something's going wrong, especially on
systems which lock open files.
I used the following criteria when selecting the code for replacement:
- it was already printing a warning for the unlink failures
- it is in a function which is already printing something or is
called from such a function
- it is in a static function, returning void and the function is only
called from a builtin main function (cmd_)
- it is in a function which handles emergency exit (signal handlers)
- it is in a function which is obviously cleaning up the lockfiles
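
A minimal sketch of such a replacement, assuming a POSIX system. The name
unlink_with_warning() and the message text are made up here; the helper
actually introduced by the patch may differ in name and behaviour.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: like unlink(2), but report failures on stderr. */
static int unlink_with_warning(const char *path)
{
	int rc = unlink(path);

	if (rc < 0) {
		int saved_errno = errno;

		fprintf(stderr, "warning: unable to unlink %s: %s\n",
			path, strerror(saved_errno));
		errno = saved_errno;	/* keep errno usable for the caller */
	}
	return rc;
}
```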
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/attributes-checkout:
Add a test for checking whether gitattributes is honored by checkout.
Read attributes from the index that is being checked out
Traditionally we used the .gitattributes file from the work tree if it exists,
and otherwise read from the index as a fallback. When switching to a
branch that has an updated .gitattributes file, and entries in it give
different attributes to other paths being checked out, we should instead
read from the .gitattributes in the index.
This breaks a use case of fixing incorrect entries in the .gitattributes
in the work tree (without adding it to the index) and checking other paths
out, though.
$ edit .gitattributes ;# mark foo.dat as binary
$ rm foo.dat
$ git checkout foo.dat
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git read-tree A B C..." without the "-m" (merge) option is a way to read
these trees on top of each other to get an overlay of them.
An ancient commit ee6566e (Rewrite read-tree, 2005-09-05) passed the
ADD_CACHE_SKIP_DFCHECK flag when calling add_index_entry() to add the
paths obtained from these trees to the index, but it is an incorrect use
of the flag. The flag is meant to be used by callers who know the
addition of the entry does not introduce a D/F conflict to the index in
order to avoid the overhead of checking.
This bug resulted in a bogus index that records both "x" and "x/z" as a
blob after reading three trees that have paths ("x"), ("x", "y"), and
("x/z", "y") respectively. 34110cd (Make 'unpack_trees()' have a separate
source and destination index, 2008-03-06) refactored the callsites of
add_index_entry() incorrectly and added more codepaths that use this flag
when it shouldn't be used.
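
To make the D/F (directory/file) conflict concrete: "x" and "x/z" cannot
both be blobs in one index because "x" would have to be a file and a
directory at once. A simplified check might look like the sketch below;
the helper names are hypothetical and this is not how add_index_entry()
implements it.

```c
#include <string.h>

/* Is "dir" a leading directory of "path", as "x" is for "x/z"? */
static int is_leading_dir(const char *dir, const char *path)
{
	size_t len = strlen(dir);

	return !strncmp(dir, path, len) && path[len] == '/';
}

/* Two paths conflict if either one is a leading directory of the other. */
static int df_conflict(const char *a, const char *b)
{
	return is_leading_dir(a, b) || is_leading_dir(b, a);
}
```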
Also, 0190457 (Move 'unpack_trees()' over to 'traverse_trees()' interface,
2008-03-05) introduced a bug to call add_index_entry() for the tree that
does not have the path in it, passing NULL as a cache entry. This caused
reading multiple trees, one of which has path "x" but another doesn't, to
segfault.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Traditionally, the lack of USE_NSEC meant "do not record nor use the
nanosecond resolution part of the file timestamps". To avoid problems on
filesystems that lose the ns part when the metadata is flushed to the disk
and then later read back in, disabling USE_NSEC has been a good idea in
general.
If you are on a filesystem without such an issue, it does not hurt to read
and store them in the cached stat data in the index entries even if your
git is compiled without USE_NSEC. The index left with such a version of
git can be read by git compiled with USE_NSEC and it can make use of the
nanosecond part to optimize the check to see if the path on the filesystem
has been modified since we last looked at it.
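
The split between "always store" and "optionally compare" can be sketched
as follows. struct cache_time_sketch and time_changed() are illustrative
names only; the use_nsec argument plays the role of the compile-time
USE_NSEC switch.

```c
struct cache_time_sketch {
	unsigned int sec;
	unsigned int nsec;	/* always recorded, even if never compared */
};

/*
 * Return nonzero if the timestamps differ. The nanosecond part only
 * takes part in the comparison when use_nsec is set.
 */
static int time_changed(const struct cache_time_sketch *cached,
			const struct cache_time_sketch *on_disk,
			int use_nsec)
{
	if (cached->sec != on_disk->sec)
		return 1;
	if (use_nsec && cached->nsec != on_disk->nsec)
		return 1;
	return 0;
}
```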
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If, inside verify_uptodate(), we can already tell from the ce entry that
it is already uptodate by testing it with ce_uptodate(ce), there is no
need to call lstat(2) and ie_match_stat() afterwards.
And, reading from the commit log message from:
commit eadb583134
Author: Junio C Hamano <gitster@pobox.com>
Date: Fri Jan 18 23:45:24 2008 -0800
Avoid running lstat(2) on the same cache entry.
this also seems to be correct usage of the ce_uptodate() macro
introduced by that patch.
This will avoid lots of lstat(2) calls in some cases, for example
when running the 'git checkout' command.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the filesystem ext4 is now defined as stable in Linux v2.6.28,
and ext4 supports nanosecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.
This will make racy git situations less likely to happen. For 'git
checkout' this means it will be less likely that we have to open, read
the contents of the file into RAM, and check if file is really
modified or not. The result should be a little less CPU time used, fewer
pagefaults and a little faster program, at least for 'git checkout'.
Since the number of possible racy git situations would increase when
disks get faster, this patch would be more and more helpful as time
goes by. For a fast Solid State Disk, this patch should be helpful.
Note that, when file operations start to take less than 1 nanosecond,
one would again start to get more racy git situations.
For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Below is oprofile output from GIT command 'git checkout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):
CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
samples| %| samples| %|
------------------------------------
409247 100.000 342878 100.000 git
CPU_CLK_UNHALT...|INST_RETIRED:2...|
samples| %| samples| %|
------------------------------------
260476 63.6476 257843 75.1996 libz.so.1.2.3
100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux
30850 7.5382 7874 2.2964 libc-2.9.so
14775 3.6103 8390 2.4469 git
2020 0.4936 4325 1.2614 libcrypto.so.0.9.8
191 0.0467 32 0.0093 libpthread-2.9.so
58 0.0142 36 0.0105 ld-2.9.so
1 2.4e-04 0 0 libldap-2.3.so.0.2.31
Detail list of the top 20 function entries (libz counted in one blob):
CPU_CLK_UNHALTED INST_RETIRED_ANY_P
samples % samples % image name symbol name
260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3
16587 4.0555 3636 1.0615 libc-2.9.so memcpy
7710 1.8851 277 0.0809 libc-2.9.so memmove
3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate
3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk
3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc
2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user
2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk
2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty
2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit
2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access
2070 0.5061 514 0.1501 git cache_name_compare
2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit
2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc
2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8
1965 0.4804 1384 0.4040 git patch_delta
1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period
1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias
1659 0.4056 290 0.0847 git find_pack_entry_one
1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks
Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles
per instruction, and compared to the total cycles spent inside the
source code of GIT for this command, all the memmove() calls
translate to (7710 * 100) / 14775 = 52.2% of this.
Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:
memmove(istate->cache + pos,
istate->cache + pos + 1,
(istate->cache_nr - pos) * sizeof(struct cache_entry *));
remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and each time it is called we move every cache_entry pointer
(from the removed one onward) one step to the left.
Since we have 28828 entries in the cache this time, and if we on
average move half of them each time, we in total move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if
each pointer is 8 bytes (64 bit).
OK, it seems that the function check_updates() is called 28 times, so
the estimate above would have been more accurate if check_updates() had
been called only once, but the point is: we get lots of bytes moved.
To fix this, and use an O(N) algorithm instead, where N is the number
of cache_entries, we delete/remove all entries in one loop through all
entries.
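
The single-pass removal can be sketched as an in-place compaction: walk
the array once and copy down only the entries that are kept. The types and
the FLAG_REMOVE bit below are simplified stand-ins, so this shows the
technique rather than the exact code of the patch.

```c
#include <stddef.h>

#define FLAG_REMOVE 0x1u	/* hypothetical "delete me" marker bit */

struct entry {
	const char *name;
	unsigned flags;
};

/*
 * Drop every marked entry in one O(N) pass and return the new count.
 * Each surviving pointer is moved at most once, however many entries
 * are removed.
 */
static size_t remove_marked(struct entry **cache, size_t nr)
{
	size_t i, j = 0;

	for (i = 0; i < nr; i++)
		if (!(cache[i]->flags & FLAG_REMOVE))
			cache[j++] = cache[i];
	return j;
}
```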
From a retest, the new remove_marked_cache_entries() from the patch
below, ended up with the following output line from oprofile:
46 0.0105 15 0.0041 git remove_marked_cache_entries
If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test. And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside unlink_entry() if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time. So if one directory containing 200 files is moved to an
another location we get 199 failed calls to rmdir() and 1 successful
call.
To fix this and avoid some unnecessary calls to rmdir(), we schedule
each directory for removal and wait much longer before we do the real
call to rmdir().
Since the unlink_entry() function is called with alphabetically sorted
names, this new function ends up being very effective to avoid
unnecessary calls to rmdir(). In some cases over 95% of all calls to
rmdir() are removed with this patch.
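
A much simplified sketch of the scheduling idea: remember the directory of
the last unlinked file and only attempt its removal when a path in a
different directory shows up (or at the very end). The names below are
hypothetical, only one directory level is tracked, and the rmdir() call is
replaced by a counter so the effect is easy to see; the real patch is more
elaborate.

```c
#include <string.h>

struct rmdir_sched {
	char pending[4096];	/* directory scheduled for removal, "" if none */
	unsigned rmdir_calls;	/* how many times we would have called rmdir() */
};

/* Attempt the removal that was postponed so far. */
static void flush_pending(struct rmdir_sched *s)
{
	if (s->pending[0]) {
		s->rmdir_calls++;	/* a real version would call rmdir() here */
		s->pending[0] = '\0';
	}
}

/* Call after a successful unlink() of "path". */
static void note_unlinked(struct rmdir_sched *s, const char *path)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	if (strlen(s->pending) == len && !memcmp(s->pending, path, len))
		return;		/* same directory as before: keep waiting */
	flush_pending(s);
	if (len && len < sizeof(s->pending)) {
		memcpy(s->pending, path, len);
		s->pending[len] = '\0';
	}
}
```

Because the names arrive sorted, all files of one directory come in a row,
so a directory is attempted once instead of once per file.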
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.
Also, add a note about this fact into the coding guidelines.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parameter n of unpack_callback() can have a value of up to
MAX_UNPACK_TREES. The check at the top of unpack_trees() (its only
(indirect) caller) makes sure it cannot exceed this limit.
unpack_callback() passes it and the array src to unpack_nondirectories(),
which has this loop:
for (i = 0; i < n; i++) {
/* ... */
src[i + o->merge] = o->df_conflict_entry;
o->merge can be 0 or 1, so unpack_nondirectories() potentially accesses
the array src at index MAX_UNPACK_TREES. This patch makes it big enough.
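
The off-by-one can be reproduced in isolation. In the sketch below
MAX_TREES is a made-up stand-in for MAX_UNPACK_TREES and fill_src() mimics
only the indexing of the loop quoted above.

```c
#include <stddef.h>

#define MAX_TREES 8	/* hypothetical stand-in for MAX_UNPACK_TREES */

/*
 * Store "marker" at src[i + merge] for i in [0, n) and return the
 * highest index written, or -1 if nothing was written.
 */
static int fill_src(const char **src, int n, int merge, const char *marker)
{
	int i, last = -1;

	for (i = 0; i < n; i++) {
		src[i + merge] = marker;
		last = i + merge;
	}
	return last;
}
```

With n == MAX_TREES and merge == 1 the highest index written is MAX_TREES,
so the array needs MAX_TREES + 1 slots.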
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/lstat-cache:
lstat_cache(): introduce clear_lstat_cache() function
lstat_cache(): introduce invalidate_lstat_cache() function
lstat_cache(): introduce has_dirs_only_path() function
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): more cache effective symlink/directory detection
In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also be told to keep track
of non-existing directories.
So we update the lstat_cache() function to handle this new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.
We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the only caller, verify_absent, relies on the fact that o->pos
points to the next index entry anyways, there is no need to recompute
its position.
Furthermore, if a nondirectory entry were found, this would return too
early, because there could still be an untracked directory in the way.
This is currently not a problem, because verify_absent is only called
if the index does not have this entry.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 0cf73755 (unpack-trees.c: assume submodules are clean during
check-out) changed an argument to verify_absent from 'path' to 'ce',
which is however shadowed by a local variable of the same name.
The bug triggers if verify_absent is used on a tree entry, for which
the index contains one or more subsequent directories of the same
length. The affected subdirectories are removed from the index. The
testcase included in this commit bisects to 55218834 (checkout: do not
lose staged removal), which reveals the bug in this case, but is
otherwise unrelated.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit 203a2fe1 (Allow callers of unpack_trees() to handle failure)
changed the "die on error" behavior to "return failure code".
verify_absent did not handle errors returned by
verify_clean_subdirectory, however.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These were found using gcc 4.3.2-1ubuntu11 with the warning:
warning: format not a string literal and no format arguments
Incorporated suggestions from Brandon Casey <casey@nrlssc.navy.mil>.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most cache_entry structs are allocated by using the
cache_entry_size macro, which rounds the size of the struct
up to the nearest multiple of 8 bytes (presumably to avoid
memory fragmentation).
There is one exception: the special "conflict entry" is
allocated with an empty name, and so is explicitly given
just one extra byte to hold the NUL.
However, later code doesn't realize that this particular
struct has been allocated differently, and happily tries
reading and copying it based on the ce_size macro, which
assumes the 8-byte alignment.
This can lead to reading uninitialized data, though since
that data is simply padding, there shouldn't be any problem
as a result. Still, it makes sense to hold the padding
assumption so as not to surprise later maintainers.
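
The rounding can be illustrated with a toy structure. The layout and the
entry_alloc_size() helper are assumptions made for this sketch, not the
actual struct cache_entry or cache_entry_size definitions.

```c
#include <stddef.h>

struct entry {
	unsigned int mode;
	unsigned int flags;
	char name[1];	/* really variable length; hypothetical layout */
};

/*
 * Bytes to allocate for an entry whose name is "namelen" bytes long:
 * header + name + NUL, rounded up to a multiple of 8.
 */
static size_t entry_alloc_size(size_t namelen)
{
	return (offsetof(struct entry, name) + namelen + 8) & ~(size_t)7;
}
```

Allocating the empty-named conflict entry with entry_alloc_size(0) rather
than "header plus one byte" keeps it consistent with code that assumes the
padded size.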
This fixes valgrind errors in t1005, t3030, t4002, and
t4114.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logic to checkout a different commit implements the safety to never
lose user's local changes. For example, switching from a commit to
another commit, when you have changed a path that is different between
them, needs to merge your changes to the version from the switched-to
commit, which you may not necessarily be able to resolve easily. By
default, "git checkout" refused to switch branches, to give you a chance
to stash your local changes (or use "-m" to merge, accepting the risks of
getting conflicts).
This safety, however, had one deliberate hole since early June 2005. When
your local change was to remove a path (and optionally to stage that
removal), the command checked out the path from the switched-to commit
nevertheless.
This was to allow an initial checkout to happen smoothly (e.g. an initial
checkout is done by starting with an empty index and switching from the
commit at the HEAD to the same commit). We can tighten the rule slightly
to allow this special case to pass, without losing sight of removal
explicitly done by the user, by noticing if the index is truly empty when
the operation begins.
For historical background, see:
http://thread.gmane.org/gmane.comp.version-control.git/4641/focus=4646
This case is marked as *0* in the message, which both Linus and I said "it
feels somewhat wrong but otherwise we cannot start from an empty index".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.
The resulting in-core index is Ok for most use, but read_cache() does not
recognize it as such. The function is meant to be no-op if you already
have loaded the index, until you call discard_cache().
This changes the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.
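
In sketch form, with a cut-down structure and a fake loader standing in
for reading the on-disk index (all names here are hypothetical):

```c
#include <stddef.h>

struct index_state_sketch {
	size_t cache_nr;
	unsigned initialized : 1;	/* the new "already loaded" bit */
};

/* Stand-in for reading the on-disk index; pretends it holds 42 entries. */
static size_t fake_loader(void)
{
	return 42;
}

/* No-op if the in-core index was already loaded or built by hand. */
static size_t read_index_once(struct index_state_sketch *istate,
			      size_t (*loader)(void))
{
	if (istate->initialized)
		return istate->cache_nr;
	istate->cache_nr = loader();
	istate->initialized = 1;
	return istate->cache_nr;
}
```

A handcrafted index that happens to be empty is still left alone, because
the decision no longer depends on what the index contains.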
A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion. But there are higher level APIs that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.
An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick. In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.
The original version of this patch marked the index by pointing an
otherwise wasted malloc'ed memory with o->result.alloc, but this version
uses Linus's idea to use a new "initialized" bit, which is conceptually
much cleaner.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of uniformly returning -1 on any error, this teaches
unpack_trees() to return -2 when the merge itself is Ok but worktree
refuses to get updated.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The plumbing output is sacred as it is an API. We _could_ change it if it
is broken in such a way that it cannot convey necessary information fully,
but we just do not _reword_ for the sake of rewording. If somebody does
not like it, s/he is complaining too late. S/he should have been here in
early May 2005 and made the language used by the API closer to what humans
read. S/he wasn't here. Too bad, and it is too late.
And people who complain should look at a bigger picture. Look at what was
suggested by one of them and think for five seconds:
$ git checkout mytopic
-fatal: Entry 'frotz' not uptodate. Cannot merge.
+fatal: Entry 'frotz' has local changes. Cannot merge.
If you do not see something wrong with this output, your brain has already
been rotten with use of git for too long a time. Nobody asked us to
"merge" but why are we talking about "Cannot merge"?
This patch introduces a mechanism to allow Porcelains to specify messages
that are different from the ones given by the underlying plumbing
implementation of read-tree, so that we can reword the message Porcelains give
without disrupting the output from the plumbing.
$ git-checkout pu
error: You have local changes to 'Makefile'; cannot switch branches.
There are other places that ask unpack_trees() to n-way merge, detect
issues and let it issue error message on its own, but I did this as a
demonstration and replaced only one message.
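
The mechanism can be sketched as a small table of optional format strings
that falls back to the plumbing wording when a Porcelain does not override
it. struct error_msgs and format_not_uptodate() are illustrative names
only; the two message texts are the ones quoted above.

```c
#include <stdio.h>

/* Hypothetical per-caller message overrides; NULL means "use default". */
struct error_msgs {
	const char *not_uptodate;
};

static const char *default_not_uptodate =
	"Entry '%s' not uptodate. Cannot merge.";

/* Format the message for "path" into buf, honouring an override if set. */
static int format_not_uptodate(char *buf, size_t len,
			       const struct error_msgs *msgs,
			       const char *path)
{
	const char *fmt = (msgs && msgs->not_uptodate)
		? msgs->not_uptodate : default_not_uptodate;

	return snprintf(buf, len, fmt, path);
}
```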
Yes I know about C99 structure initializers. I'd love to use them but we
try to be nice to compilers without it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the base for making symlink detection in the middle of a pathname
saner and (much) more efficient.
Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.
The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.
This caches the last detected full directory and symlink entries, and
speeds up especially deep directory structures a lot by avoiding an
lstat() on all the directories leading up to each entry in the index.
[ This can - and should - probably be extended upon so that we eventually
never do a bare 'lstat()' on any path entries at *all* when checking the
index, but always check the full path carefully. Right now we do not
generally check the whole path for all our normal quick index
revalidation.
We should also make sure that we're careful about all the invalidation,
ie when we remove a link and replace it by a directory we should
invalidate the symlink cache if it matches (and vice versa for the
directory cache).
But regardless, the basic function needs to be sane to do that. The old
'has_symlink_leading_path()' was not capable enough - or indeed the code
readable enough - to really do that sanely. So I'm pushing this as not
just an optimization, but as a base for further work. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is immaterial on sane filesystems, but if you have a broken (aka
case-insensitive) filesystem, and the objective is to remove the file
'abc' and replace it with the file 'Abc', then we must make sure to do
the removal first.
Otherwise, you'd first update the file 'Abc' - which would just
overwrite the file 'abc' due to the broken case-insensitive filesystem -
and then remove file 'abc' - which would now brokenly remove the just
updated file 'Abc' on that broken filesystem.
By doing removals first, this won't happen.
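
The ordering can be sketched as two passes over the same list. The flag
bits and plan_updates() are hypothetical; the function only records the
order in which paths would be touched instead of touching the filesystem.

```c
#include <stddef.h>

#define FLAG_REMOVE 0x1u	/* hypothetical "remove from work tree" bit */
#define FLAG_UPDATE 0x2u	/* hypothetical "write to work tree" bit */

struct entry {
	const char *name;
	unsigned flags;
};

/*
 * Fill "order" with the names in the sequence they would be acted on:
 * all removals first, then all updates. Returns the number of actions.
 */
static size_t plan_updates(const struct entry *e, size_t nr,
			   const char **order)
{
	size_t i, n = 0;

	for (i = 0; i < nr; i++)
		if (e[i].flags & FLAG_REMOVE)
			order[n++] = e[i].name;
	for (i = 0; i < nr; i++)
		if (e[i].flags & FLAG_UPDATE)
			order[n++] = e[i].name;
	return n;
}
```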
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we find an unexpected file, see if that filename perhaps exists in a
case-insensitive way in the index, and whether the file matches that. If
so, ignore it as a known pre-existing file of a different name.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now nobody uses it, but "index_name_exists()" gets a flag so
you can enable it on a case-by-case basis.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows verify_absent() in unpack_trees() to use the hash chains
rather than looking it up using the binary search.
Perhaps more importantly, it's also going to be useful for the next phase,
where we actually start looking at the cache entry when we do
case-insensitive lookups and checking the result.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 34110cd4e3 ("Make 'unpack_trees()'
have a separate source and destination index") I introduced a really
stupid bug in that it would always add merged entries with the CE_UPDATE
flag set. That caused us to always re-write the file, even when it was
already up-to-date in the source index.
Not only is that really stupid from a performance angle, but more
importantly it's actively wrong: if we have dirty state in the tree when
we merge, overwriting it with the result of the merge will incorrectly
overwrite that dirty state.
This trivially fixes the problem - simply don't set the CE_UPDATE flag
when the merge result matches the old state.
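
The idea of the fix, as a rough sketch. The struct, the `SK_CE_UPDATE`
bit and both helpers are simplified, hypothetical stand-ins chosen for
this illustration; the real cache entry and the real merged-entry code
carry more state than this.

```c
#include <assert.h>
#include <string.h>

#define SK_CE_UPDATE 0x1u	/* hypothetical "rewrite the file" bit */

/* Hypothetical, pared-down cache entry. */
struct sk_cache_entry {
	unsigned mode;
	unsigned char sha1[20];
	unsigned flags;
};

/* Two entries describe the same content if mode and object name match. */
static int sk_same(const struct sk_cache_entry *a,
		   const struct sk_cache_entry *b)
{
	return a->mode == b->mode &&
	       !memcmp(a->sha1, b->sha1, sizeof(a->sha1));
}

/*
 * Mark a merge result. Only ask for the working tree file to be
 * rewritten when there was no old entry or the result differs from it.
 */
static void sk_mark_merged(struct sk_cache_entry *merge,
			   const struct sk_cache_entry *old)
{
	assert(merge != NULL);

	if (old && sk_same(old, merge))
		merge->flags &= ~SK_CE_UPDATE;	/* leave the file alone */
	else
		merge->flags |= SK_CE_UPDATE;	/* content really changed */
}
```

Leaving the flag clear in the "same" case is what keeps a dirty working
tree file from being overwritten by a merge that did not touch it.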
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
On Sat, 15 Mar 2008, SZEDER Gábor wrote:
>
> The testcase usually fails during the first 25 run, but sometimes it
> runs more than 100 times before failing.
Damn, this series has had more subtle issues than I ever expected.
'git stash' creates its saved working tree object with:
	# state of the working tree
	w_tree=$( (
		rm -f "$TMP-index" &&
		cp -p ${GIT_INDEX_FILE-"$GIT_DIR/index"} "$TMP-index" &&
		GIT_INDEX_FILE="$TMP-index" &&
		export GIT_INDEX_FILE &&
		git read-tree -m $i_tree &&
		git add -u &&
		git write-tree &&
		rm -f "$TMP-index"
	) ) ||
		die "Cannot save the current worktree state"
which creates a new index file with the updates, and writes the tree from
that.
We have this logic where we compare the timestamp of the index with the
timestamp of the files and we then write them out "smudged" if they are
the same, and it basically depends on the fact that the date on the index
file is compared with the date encoded in the stat information itself.
And what is going on is:
 - we create a new index file with that "cp". We are careful to preserve
   the timestamps by using "-p", so this one should be all ok.
 - then we *update* that index by resetting it to the tree with git
   read-tree, but now we do *not* preserve the timestamp on this new copy
   any more, even though we copy over all the timestamps on the files that
   are indexed from the stat information!
Now, we always had that problem when re-writing the index, but we had this
clever workaround in the writing part: if the source had racily clean
entries, then when we wrote those out (and thus can't depend on the index
file timestamp showing that they are racily clean any more!), we would
smudge them when writing.
IOW, we handle this issue by having write_index() do this:
	for (i = 0; i < entries; i++) {
		...
		if (is_racy_timestamp(istate, ce))
			ce_smudge_racily_clean_entry(ce);
		...
when writing out entries. And that all took care of it, because now when
we wrote the new index, we'd change the timestamp on the index, yes, but
we'd smudge the entries we wrote out, so the resulting index would
still show that file as not up-to-date.
But with commit 34110cd4e3 ("Make
'unpack_trees()' have a separate source and destination index"), this
logic no longer triggers, because we now write out the "result" index, and
that one never got its timestamp updated from the source index, so it had
lost all that "is_racy_timestamp()" information!
This trivial patch fixes it. It looks trivial, and it's a simple fix, but
boy did it take me way too much thinking and explaining to myself to
explain why there was a problem in the first place!
The trivial fix is to just copy the index timestamp from the source index
into the result index. But we only do this if we *have* a source index, of
course, and if we will even bother to use the result.
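
A rough sketch of why the timestamp has to travel with the entries. All
names here (`sk_index`, `sk_entry`, `sk_is_racy()`,
`sk_smudge_on_write()`, `sk_inherit_timestamp()`) are hypothetical, and
the racy test is assumed to be "the index has a timestamp and the entry's
mtime is not older than it"; the real is_racy_timestamp() and
ce_smudge_racily_clean_entry() do more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, pared-down index and entry. */
struct sk_entry {
	time_t mtime;
	int smudged;
};

struct sk_index {
	time_t timestamp;	/* 0 means "no timestamp known" */
	struct sk_entry *cache;
	size_t nr;
};

/* Assumed test: entry modified at or after the index was written. */
static int sk_is_racy(const struct sk_index *ix, const struct sk_entry *ce)
{
	return ix->timestamp && ix->timestamp <= ce->mtime;
}

/* Smudge racily clean entries before writing; return how many. */
static size_t sk_smudge_on_write(struct sk_index *ix)
{
	size_t i, smudged = 0;

	assert(ix != NULL);

	for (i = 0; i < ix->nr; i++) {
		if (sk_is_racy(ix, &ix->cache[i])) {
			ix->cache[i].smudged = 1;
			smudged++;
		}
	}
	return smudged;
}

/* The fix in miniature: carry the source timestamp into the result. */
static void sk_inherit_timestamp(struct sk_index *result,
				 const struct sk_index *src)
{
	if (src)
		result->timestamp = src->timestamp;
}
```

Without the copy, the result index has no timestamp, the racy test never
fires, and nothing is smudged on write; with it, the entry whose mtime
matches the source index timestamp is caught again.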
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
read-tree -m can read up to MAX_TREES trees, a limit that has been
arbitrarily set to 8 since August 2007 (4 are needed to deal with the
2 merge-base case).
However, the updated unpack_trees() code had an advertised limit of 4
(which it enforced). In reality the code was prepared to take only 3
trees and giving 4 caused it to stomp on its stack. Rename the MAX_TREES
constant to MAX_UNPACK_TREES, move it to the unpack-trees.h common header
file, and use it from both places to avoid future confusion.
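
A minimal sketch of the "one shared constant, checked before use"
pattern. The value 8 is an assumption carried over from the MAX_TREES
description above, and `sk_unpack` and `sk_add_tree()` are hypothetical
names for this illustration, not the real unpack_trees() interface.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value; in the real tree this would live in one shared header. */
#define MAX_UNPACK_TREES 8

/* Hypothetical holder for the trees handed to the unpacker. */
struct sk_unpack {
	const char *trees[MAX_UNPACK_TREES];
	size_t nr;
};

/*
 * Append a tree. Refuse (return -1) instead of writing past the end of
 * the fixed-size array when the limit has been reached.
 */
static int sk_add_tree(struct sk_unpack *u, const char *name)
{
	assert(u != NULL);

	if (u->nr >= MAX_UNPACK_TREES)
		return -1;
	u->trees[u->nr++] = name;
	return 0;
}
```

Because both the array size and the bounds check come from the same
macro, the advertised limit and the real capacity cannot drift apart the
way the 4-versus-3 mismatch did.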
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>