Commit Graph

279 Commits

Author SHA1 Message Date
Junio C Hamano
3b753148b6 Merge branch 'jk/maint-null-in-trees'
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-08-27 11:54:28 -07:00
Junio C Hamano
d0ae7e2e71 Merge branch 'nd/index-errno'
Assignments to errno before calling system functions that used to
matter in the old code were left behind after the code structure
changed sufficiently to make them useless.

* nd/index-errno:
  read_index_from: remove bogus errno assignments
2012-08-22 11:51:42 -07:00
Nguyễn Thái Ngọc Duy
57d84f8d93 read_index_from: remove bogus errno assignments
These assignments comes from the very first commit e83c516 (Initial
revision of "git", the information manager from hell - 2005-04-07).
Back then we did not die() when errors happened so correct errno was
required.

Since 5d1a5c0 ([PATCH] Better error reporting for "git status" -
2005-10-01), read_index_from() learned to die rather than just return
-1 and these assignments became irrelevant. Remove them.

While at it, move die_errno() next to xmmap() call because it's the
mmap's error code that we care about. Otherwise if close(fd); fails,
it could overwrite mmap's errno.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-06 10:01:21 -07:00
Jeff King
4337b5856f do not write null sha1s to on-disk index
We should never need to write the null sha1 into an index
entry (short of the 1 in 2^160 chance that somebody actually
has content that hashes to it). If we attempt to do so, it
is much more likely that it is a bug, since we use the null
sha1 as a sentinel value to mean "not valid".

The presence of null sha1s in the index (which can come
from, among other things, "update-index --cacheinfo", or by
reading a corrupted tree) can cause problems for later
readers, because they cannot distinguish the literal null
sha1 from its use a sentinel value.  For example, "git
diff-files" on such an entry would make it appear as if it
is stat-dirty, and until recently, the diff code assumed
such an entry meant that we should be diffing a working tree
file rather than a blob.

Ideally, we would stop such entries from entering even our
in-core index. However, we do sometimes legitimately add
entries with null sha1s in order to represent these sentinel
situations; simply forbidding them in add_index_entry breaks
a lot of the existing code. However, we can at least make
sure that our in-core sentinel representation never makes it
to disk.

To be thorough, we will test an attempt to add both a blob
and a submodule entry. In the former case, we might run into
problems anyway because we will be missing the blob object.
But in the latter case, we do not enforce connectivity
across gitlink entries, making this our only point of
enforcement. The current implementation does not care which
type of entry we are seeing, but testing both cases helps
future-proof the test suite in case that changes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:13:36 -07:00
Junio C Hamano
30ea575876 Merge branch 'tg/ce-namelen-field'
Split lower bits of ce_flags field and creates a new ce_namelen
field in the in-core index structure.

* tg/ce-namelen-field:
  Strip namelen out of ce_flags into a ce_namelen field
2012-07-23 20:55:21 -07:00
Junio C Hamano
8fc824f397 Merge branch 'tg/maint-cache-name-compare'
Even though the index can record pathnames longer than 1<<12 bytes,
in some places we were not comparing them in full, potentially
replacing index entries instead of adding.

* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-15 21:40:18 -07:00
Thomas Gummerer
b60e188c51 Strip namelen out of ce_flags into a ce_namelen field
Strip the name length from the ce_flags field and move it
into its own ce_namelen field in struct cache_entry. This
will both give us a tiny bit of a performance enhancement
when working with long pathnames and is a refactoring for
more readability of the code.

It enhances readability, by making it more clear what
is a flag, and where the length is stored and make it clear
which functions use stages in comparisions and which only
use the length.

It also makes CE_NAMEMASK private, so that users don't
mistakenly write the name length in the flags.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:42:45 -07:00
Junio C Hamano
01388518c3 Merge branch 'tg/maint-cache-name-compare' into tg/ce-namelen-field
* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-11 09:40:25 -07:00
Junio C Hamano
d5f53338ab cache_name_compare(): do not truncate while comparing paths
We failed to use ce_namelen() equivalent and instead only compared
up to the CE_NAMEMASK bytes by mistake.  Adding an overlong path
that shares the same common prefix as an existing entry in the index
did not add a new entry, but instead replaced the existing one, as
the result.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:25:56 -07:00
Thomas Gummerer
68c4f6a577 Replace strlen() with ce_namelen()
Replace strlen(ce->name) with ce_namelen() in a couple
of places which gives us some additional bits of
performance.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 19:49:34 -07:00
Junio C Hamano
d4a5d872c0 Merge branch 'jc/index-v4'
Trivially shrinks the on-disk size of the index file to save both I/O and
checksum overhead.

The topic should give a solid base to build on further updates, with the
code refactoring in its earlier parts, and the backward compatibility
mechanism in its later parts.

* jc/index-v4:
  index-v4: document the entry format
  unpack-trees: preserve the index file version of original
  update-index: upgrade/downgrade on-disk index version
  read-cache.c: write prefix-compressed names in the index
  read-cache.c: read prefix-compressed names in index on-disk version v4
  read-cache.c: move code to copy incore to ondisk cache to a helper function
  read-cache.c: move code to copy ondisk to incore cache to a helper function
  read-cache.c: report the header version we do not understand
  read-cache.c: make create_from_disk() report number of bytes it consumed
  read-cache.c: allow unaligned mapping of the index file
  cache.h: hide on-disk index details
  varint: make it available outside the context of pack
2012-05-02 13:51:13 -07:00
Junio C Hamano
9d227781b6 read-cache.c: write prefix-compressed names in the index
Teach the code to write the index in the v4 on-disk format.

Record the format version of the on-disk index we read from in the
index_state, and use the format when writing the new index out.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-04 09:57:49 -07:00
Junio C Hamano
6c9cd161d9 read-cache.c: read prefix-compressed names in index on-disk version v4
Because the entries are sorted by path, adjacent entries in the index tend
to share the leading components of them, and it makes sense to only store
the differences in later entries.  In the v4 on-disk format of the index,
each on-disk cache entry stores the number of bytes to be stripped from
the end of the previous name, and the bytes to append to the result, to
come up with its name.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
f136f7bfe8 read-cache.c: move code to copy incore to ondisk cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
3fc22b5331 read-cache.c: move code to copy ondisk to incore cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
0136bac9b8 read-cache.c: report the header version we do not understand
Instead of just saying "bad index version", report the value we read
from the disk.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano
936f53d055 read-cache.c: make create_from_disk() report number of bytes it consumed
The function is the one that is reading from the data stream. It only is
natural to make it responsible for reporting this number, not the caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano
d60c49c2d7 read-cache.c: allow unaligned mapping of the index file
Both the on-disk format v2 and v3 pads the "name" field to the multiple of
eight to make sure that various quantities in network long/short type can
be accessed with ntohl/ntohs without having to worry about alignment, but
this forces us to waste disk I/O bandwidth.

Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were
the regular ntohs()/ntohl() on a field that may not be aligned correctly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano
db3b313c84 cache.h: hide on-disk index details
The on-disk format of the index file is a detail whose implementation is
neatly encapsulated in read-cache.c; there is no need to expose it to the
general public that include the cache.h header file.

Also add a prominent mark to read-cache.c to delineate the parts that deal
with the index file I/O routines from the remainder of the file.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Jeff King
f8582cad8d make is_empty_blob_sha1 available everywhere
The read-cache implementation defines this static function,
but it is a generally useful concept in git. Let's give
the empty blob the same treatment as the empty tree,
providing both hex and binary forms of the sha1.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-23 13:52:13 -07:00
Junio C Hamano
3d1f148c33 refresh_index: do not show unmerged path that is outside pathspec
When running "git add --refresh <pathspec>", we incorrectly showed the
path that is unmerged even if it is outside the specified pathspec, even
though we did honor pathspec and refreshed only the paths that matched.

Note that this cange does not affect "git update-index --refresh"; for
hysterical raisins, it does not take a pathspec (it takes real paths) and
more importantly itss command line options are parsed and executed one by
one as they are encountered, so "git update-index --refresh foo" means
"first refresh the index, and then update the entry 'foo' by hashing the
contents in file 'foo'", not "refresh only entry 'foo'".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-17 10:11:05 -08:00
Junio C Hamano
ef87690b27 Merge branch 'rs/allocate-cache-entry-individually'
* rs/allocate-cache-entry-individually:
  cache.h: put single NUL at end of struct cache_entry
  read-cache.c: allocate index entries individually

Conflicts:
	read-cache.c
2011-12-09 13:36:56 -08:00
Jeff King
73b7eae60c refresh_index: make porcelain output more specific
If you have a deleted file and a porcelain refreshes the
cache, we print:

  Unstaged changes after reset:
  M	file

This is technically correct, in that the file is modified,
but it's friendlier to the user if we further differentiate
the case of a deleted file (especially because this output
looks a lot like "diff --name-status", which would also make
the distinction).

Similarly, we can distinguish typechanges ("T") and
intent-to-add files ("A"), both of which appear as just "M"
in the current output.

The plumbing output for all cases remains "needs update" for
historical compatibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:55:58 -08:00
Jeff King
4bd4e73093 refresh_index: rename format variables
When refreshing the index, for modified (or unmerged) files we will print
"needs update" (or "needs merge") for plumbing, or line similar to the
output from "diff --name-status" for porcelain.

The variables holding which type of message to show are named after the
plumbing messages. However, as we begin to differentiate more cases at the
porcelain level (with the plumbing message staying the same), that naming
scheme will become awkward.

Instead, name the variables after which case we found (modified or
unmerged), not what we will output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:55:05 -08:00
Jeff King
d05e697010 read-cache: let refresh_cache_ent pass up changed flags
This will enable refresh_cache to differentiate more cases
of modification (such as typechange) when telling the user
what isn't fresh.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-18 11:53:46 -08:00
René Scharfe
debed2a629 read-cache.c: allocate index entries individually
The code to estimate the in-memory size of the index based on its on-disk
representation is subtly wrong for certain architecture-dependent struct
layouts.  Instead of fixing it, replace the code to keep the index entries
in a single large block of memory and allocate each entry separately
instead.  This is both simpler and more flexible, as individual entries
can now be freed.  Actually using that added flexibility is left for a
later patch.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 15:25:59 -07:00
René Scharfe
8f41c07f90 read-cache.c: fix index memory allocation
estimate_cache_size() tries to guess how much memory is needed for the
in-memory representation of an index file.  It does that by using the
file size, the number of entries and the difference of the sizes of the
on-disk and in-memory structs -- without having to check the length of
the name of each entry, which varies for each entry, but their sums are
the same no matter the representation.

Except there can be a difference.  First of all, the size is really
calculated by ce_size and ondisk_ce_size based on offsetof(..., name),
not sizeof, which can be different.  And entries are padded with 1 to 8
NULs at the end (after the variable name) to make their total length a
multiple of eight.

So in order to allocate enough memory to hold the index, change the
delta calculation to be based on offsetof(..., name) and round up to
the next multiple of eight.

On a 32-bit Linux, this delta was used before:

	sizeof(struct cache_entry)        == 72
	sizeof(struct ondisk_cache_entry) == 64
	                                    ---
	                                      8

The actual difference for an entry with a filename length of one was,
however (find the definitions are in cache.h):

	offsetof(struct cache_entry, name)        == 72
	offsetof(struct ondisk_cache_entry, name) == 62

	ce_size        == (72 + 1 + 8) & ~7 == 80
	ondisk_ce_size == (62 + 1 + 8) & ~7 == 64
	                                      ---
	                                       16

So eight bytes less had been allocated for such entries.  The new
formula yields the correct delta:

	(72 - 62 + 7) & ~7 == 16

Reported-by: John Hsing <tsyj2007@gmail.com>
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-10-26 14:35:16 -07:00
Junio C Hamano
1952e102b7 Merge branch 'maint'
* maint:
  whitespace: have SP on both sides of an assignment "="
  update-ref: whitespace fix
2011-08-25 16:00:07 -07:00
Junio C Hamano
cd2b8ae983 whitespace: have SP on both sides of an assignment "="
I've deliberately excluded the borrowed code in compat/nedmalloc
directory.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-25 14:47:07 -07:00
Junio C Hamano
033c2dc436 Merge branch 'ef/maint-win-verify-path'
* ef/maint-win-verify-path:
  verify_dotfile(): do not assume '/' is the path seperator
  verify_path(): simplify check at the directory boundary
  verify_path: consider dos drive prefix
  real_path: do not assume '/' is the path seperator
  A Windows path starting with a backslash is absolute
2011-06-29 17:09:17 -07:00
Theo Niessink
e0f530ff8a verify_dotfile(): do not assume '/' is the path seperator
verify_dotfile() currently assumes that the path seperator is '/', but on
Windows it can also be '\\', so use is_dir_sep() instead.

Signed-off-by: Theo Niessink <theo@taletn.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-08 16:34:38 -07:00
Junio C Hamano
3bdf09c7f5 verify_path(): simplify check at the directory boundary
We simply want to say "At a directory boundary, be careful with a name
that begins with a dot, forbid a name that ends with the boundary
character or has duplicated bounadry characters".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-07 12:22:51 -07:00
Erik Faye-Lund
56948cb6aa verify_path: consider dos drive prefix
If someone manage to create a repo with a 'C:' entry in the
root-tree, files can be written outside of the working-dir. This
opens up a can-of-worms of exploits.

Fix it by explicitly checking for a dos drive prefix when verifying
a paht. While we're at it, make sure that paths beginning with '\' is
considered absolute as well.

Noticed-by: Theo Niessink <theo@taletn.com>
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-27 10:59:18 -07:00
Junio C Hamano
c4ce46fc7a index_fd(): turn write_object and format_check arguments into one flag
The "format_check" parameter tucked after the existing parameters is too
ugly an afterthought to live in any reasonable API.

Combine it with the other boolean parameter "write_object" into a single
"flags" parameter.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-05-09 11:58:19 -07:00
Junio C Hamano
44ec754dc7 Merge branch 'jc/index-update-if-able' into maint
* jc/index-update-if-able:
  update $GIT_INDEX_FILE when there are racily clean entries
  diff/status: refactor opportunistic index update
2011-04-03 12:33:05 -07:00
Junio C Hamano
149971badc Merge branch 'jc/index-update-if-able'
* jc/index-update-if-able:
  update $GIT_INDEX_FILE when there are racily clean entries
  diff/status: refactor opportunistic index update
2011-03-26 20:13:16 -07:00
Junio C Hamano
483fbe2b7c update $GIT_INDEX_FILE when there are racily clean entries
Traditional "opportunistic index update" done by read-only "diff" and
"status" was about updating cached lstat(2) information in the index for
the next round.  We missed another obvious optimization opportunity: when
there are racily clean entries that will cease to be racily clean by
updating $GIT_INDEX_FILE.  Detect that case and write $GIT_INDEX_FILE out
to give it a newer timestamp.

Noticed by Lasse Makholm by stracing "git status" in a fresh checkout and
counting the number of open(2) calls.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-21 14:49:46 -07:00
Junio C Hamano
ccdc4ec304 diff/status: refactor opportunistic index update
When we had to refresh the index internally before running diff or status,
we opportunistically updated the $GIT_INDEX_FILE so that later invocation
of git can use the lstat(2) we already did in this invocation.

Make them share a helper function to do so.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-21 12:43:10 -07:00
Junio C Hamano
fc7ae9c156 Merge branch 'nd/hash-object-sanity'
* nd/hash-object-sanity:
  Make hash-object more robust against malformed objects

Conflicts:
	cache.h
2011-02-27 21:58:30 -08:00
Junio C Hamano
d5c87a802d Merge branch 'nd/struct-pathspec'
* nd/struct-pathspec: (22 commits)
  t6004: add pathspec globbing test for log family
  t7810: overlapping pathspecs and depth limit
  grep: drop pathspec_matches() in favor of tree_entry_interesting()
  grep: use writable strbuf from caller for grep_tree()
  grep: use match_pathspec_depth() for cache/worktree grepping
  grep: convert to use struct pathspec
  Convert ce_path_match() to use match_pathspec_depth()
  Convert ce_path_match() to use struct pathspec
  struct rev_info: convert prune_data to struct pathspec
  pathspec: add match_pathspec_depth()
  tree_entry_interesting(): optimize wildcard matching when base is matched
  tree_entry_interesting(): support wildcard matching
  tree_entry_interesting(): fix depth limit with overlapping pathspecs
  tree_entry_interesting(): support depth limit
  tree_entry_interesting(): refactor into separate smaller functions
  diff-tree: convert base+baselen to writable strbuf
  glossary: define pathspec
  Move tree_entry_interesting() to tree-walk.c and export it
  tree_entry_interesting(): remove dependency on struct diff_options
  Convert struct diff_options to use struct pathspec
  ...
2011-02-27 21:17:36 -08:00
Jonathan Nieder
046613c546 update-index --refresh --porcelain: add missing const
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-22 16:51:21 -08:00
Nguyễn Thái Ngọc Duy
c879daa237 Make hash-object more robust against malformed objects
Commits, trees and tags have structure. Don't let users feed git
with malformed ones. Sooner or later git will die() when
encountering them.

Note that this patch does not check semantics. A tree that points
to non-existent objects is perfectly OK (and should be so, users
may choose to add commit first, then its associated tree for example).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-07 15:05:25 -08:00
Nguyễn Thái Ngọc Duy
898bbd9fb4 Convert ce_path_match() to use match_pathspec_depth()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Nguyễn Thái Ngọc Duy
eb9cb55b94 Convert ce_path_match() to use struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-02-03 14:08:30 -08:00
Junio C Hamano
5e738ae820 Merge branch 'jj/icase-directory'
* jj/icase-directory:
  Support case folding in git fast-import when core.ignorecase=true
  Support case folding for git add when core.ignorecase=true
  Add case insensitivity support when using git ls-files
  Add case insensitivity support for directories when using git status
  Case insensitivity support for .gitignore via core.ignorecase
  Add string comparison functions that respect the ignore_case variable.
  Makefile & configure: add a NO_FNMATCH_CASEFOLD flag
  Makefile & configure: add a NO_FNMATCH flag

Conflicts:
	Makefile
	config.mak.in
	configure.ac
	fast-import.c
2010-12-03 16:10:34 -08:00
Joshua Jensen
dc1ae70487 Support case folding for git add when core.ignorecase=true
When MyDir/ABC/filea.txt is added to Git, the disk directory MyDir/ABC/
is renamed to mydir/aBc/, and then mydir/aBc/fileb.txt is added, the
index will contain MyDir/ABC/filea.txt and mydir/aBc/fileb.txt. Although
the earlier portions of this patch series account for those differences
in case, this patch makes the pathing consistent by folding the case of
newly added files against the first file added with that path.

In read-cache.c's add_to_index(), the index_name_exists() support used
for git status's case insensitive directory lookups is used to find the
proper directory case according to what the user already checked in.
That is, MyDir/ABC/'s case is used to alter the stored path for
fileb.txt to MyDir/ABC/fileb.txt (instead of mydir/aBc/fileb.txt).

This is especially important when cloning a repository to a case
sensitive file system. MyDir/ABC/ and mydir/aBc/ exist in the same
directory on a Windows machine, but on Linux, the files exist in two
separate directories. The update to add_to_index(), in effect, treats a
Windows file system as case sensitive by making path case consistent.

Signed-off-by: Joshua Jensen <jjensen@workspacewhiz.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-06 11:19:59 -07:00
Jonathan Nieder
59efba64ac core: Stop leaking ondisk_cache_entrys
Noticed with valgrind.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-08-11 09:57:43 -07:00
Shawn O. Pearce
b659b49bb0 Correct spelling of 'REUC' extension
The new dircache extension CACHE_EXT_RESOLVE_UNDO, whose value is
0x52455543, is actually the ASCII sequence 'REUC', not the ASCII
sequence 'REUN'.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-02 09:54:34 -08:00
Junio C Hamano
125fd98434 Make ce_uptodate() trustworthy again
The rule has always been that a cache entry that is ce_uptodate(ce)
means that we already have checked the work tree entity and we know
there is no change in the work tree compared to the index, and nobody
should have to double check.  Note that false ce_uptodate(ce) does not
mean it is known to be dirty---it only means we don't know if it is
clean.

There are a few codepaths (refresh-index and preload-index are among
them) that mark a cache entry as up-to-date based solely on the return
value from ie_match_stat(); this function uses lstat() to see if the
work tree entity has been touched, and for a submodule entry, if its
HEAD points at the same commit as the commit recorded in the index of
the superproject (a submodule that is not even cloned is considered
clean).

A submodule is no longer considered unmodified merely because its HEAD
matches the index of the superproject these days, in order to prevent
people from forgetting to commit in the submodule and updating the
superproject index with the new submodule commit, before commiting the
state in the superproject.  However, the patch to do so didn't update
the codepath that marks cache entries up-to-date based on the updated
definition and instead worked it around by saying "we don't trust the
return value of ce_uptodate() for submodules."

This makes ce_uptodate() trustworthy again by not marking submodule
entries up-to-date.

The next step _could_ be to introduce a few "in-core" flag bits to
cache_entry structure to record "this entry is _known_ to be dirty",
call is_submodule_modified() from ie_match_stat(), and use these new
bits to avoid running this rather expensive check more than once, but
that can be a separate patch.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-24 00:15:29 -08:00
Linus Torvalds
fb7d3f32b2 Remove diff machinery dependency from read-cache
Exal Sibeaz pointed out that some git files are way too big, and that
add_files_to_cache() brings in all the diff machinery to any git binary
that needs the basic git SHA1 object operations from read-cache.c. Which
is pretty much all of them.

It's doubly silly, since add_files_to_cache() is only used by builtin
programs (add, checkout and commit), so it's fairly easily fixed by just
moving the thing to builtin-add.c, and avoiding the dependency entirely.

I initially argued to Exal that it would probably be best to try to depend
on smart compilers and linkers, but after spending some time trying to
make -ffunction-sections work and giving up, I think Exal was right, and
the fix is to just do some trivial cleanups like this.

This trivial cleanup results in pretty stunning file size differences.
The diff machinery really is mostly used by just the builtin programs, and
you have things like these trivial before-and-after numbers:

  -rwxr-xr-x 1 torvalds torvalds 1727420 2010-01-21 10:53 git-hash-object
  -rwxrwxr-x 1 torvalds torvalds  940265 2010-01-21 11:16 git-hash-object

Now, I'm not saying that 940kB is good either, but that's mostly all the
debug information - you can see the real code with 'size':

   text	   data	    bss	    dec	    hex	filename
 418675	   3920	 127408	 550003	  86473	git-hash-object (before)
 230650	   2288	 111728	 344666	  5425a	git-hash-object (after)

ie we have a nice 24% size reduction from this trivial cleanup.

It's not just that one file either. I get:

	[torvalds@nehalem git]$ du -s /home/torvalds/libexec/git-core
	45640	/home/torvalds/libexec/git-core (before)
	33508	/home/torvalds/libexec/git-core (after)

so we're talking 12MB of diskspace here.

(Of course, stripping all the binaries brings the 33MB down to 9MB, so the
whole debug information thing is still the bulk of it all, but that's a
separate issue entirely)

Now, I'm sure there are other things we should do, and changing our
compiler flags from -O2 to -Os would bring the text size down by an
additional almost 20%, but this thing Exal pointed out seems to be some
good low-hanging fruit.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-21 17:05:13 -08:00