Commit Graph

311 Commits

Author SHA1 Message Date
Junio C Hamano
d6a58b7773 Merge branch 'es/name-hash-no-trailing-slash-in-dirs'
Clean up the internal of the name-hash mechanism used to work
around case insensitivity on some filesystems to cleanly fix a
long-standing API glitch where the caller of cache_name_exists()
that ask about a directory with a counted string was required to
have '/' at one location past the end of the string.

* es/name-hash-no-trailing-slash-in-dirs:
  dir: revert work-around for retired dangerous behavior
  name-hash: stop storing trailing '/' on paths in index_state.dir_hash
  employ new explicit "exists in index?" API
  name-hash: refactor polymorphic index_name_exists()
2013-10-17 15:55:16 -07:00
Junio C Hamano
541dc4dfa0 Merge branch 'jk/write-broken-index-with-nul-sha1'
Earlier we started rejecting an attempt to add 0{40} object name to
the index and to tree objects, but it sometimes is necessary to
allow so to be able to use tools like filter-branch to correct such
broken tree objects.

* jk/write-broken-index-with-nul-sha1:
  write_index: optionally allow broken null sha1s
2013-09-17 11:40:27 -07:00
Eric Sunshine
d28eec2673 name-hash: stop storing trailing '/' on paths in index_state.dir_hash
When 5102c617 (Add case insensitivity support for directories when using
git status, 2010-10-03) added directories to the name-hash there was
only a single hash table in which both real cache entries and leading
directory prefixes were registered. To distinguish between the two types
of entries, directories were stored with a trailing '/'.

2092678c (name-hash.c: fix endless loop with core.ignorecase=true,
2013-02-28), however, moved directories to a separate hash table
(index_state.dir_hash) but retained the (now) redundant trailing '/',
thus callers continue to bear the burden of ensuring the slash's
presence before searching the index for a directory. Eliminate this
redundancy by storing paths in the dir-hash without the trailing '/'.

An important benefit of this change is that it eliminates undocumented
and dangerous behavior of dir.c:directory_exists_in_index_icase() in
which it assumes not only that it can validly access one character
beyond the end of its incoming directory argument, but also that that
character will unconditionally be a '/'. This perilous behavior was
"tolerated" because the string passed in by its lone caller always had a
'/' in that position, however, things broke [1] when 2eac2a4c (ls-files
-k: a directory only can be killed if the index has a non-directory,
2013-08-15) added a new caller which failed to respect the undocumented
assumption.

[1]: http://thread.gmane.org/gmane.comp.version-control.git/232727

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:08:07 -07:00
Eric Sunshine
ebbd7439b1 employ new explicit "exists in index?" API
Each caller of index_name_exists() knows whether it is looking for a
directory or a file, and can avoid the unnecessary indirection of
index_name_exists() by instead calling index_dir_exists() or
index_file_exists() directly.

Invoking the appropriate search function explicitly will allow a
subsequent patch to relieve callers of the artificial burden of having
to add a trailing '/' to the pathname given to index_dir_exists().

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-09-17 10:07:37 -07:00
Junio C Hamano
b0d974d6d9 Merge branch 'tg/index-struct-sizes'
The code that reads from a region that mmaps an on-disk index
assumed that "int"/"short" are always 32/16 bits.

* tg/index-struct-sizes:
  read-cache: use fixed width integer types
2013-09-09 14:50:38 -07:00
Junio C Hamano
b02f5aeda6 Merge branch 'jl/submodule-mv'
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.

* jl/submodule-mv: (53 commits)
  rm: delete .gitmodules entry of submodules removed from the work tree
  mv: update the path entry in .gitmodules for moved submodules
  submodule.c: add .gitmodules staging helper functions
  mv: move submodules using a gitfile
  mv: move submodules together with their work trees
  rm: do not set a variable twice without intermediate reading.
  t6131 - skip tests if on case-insensitive file system
  parse_pathspec: accept :(icase)path syntax
  pathspec: support :(glob) syntax
  pathspec: make --literal-pathspecs disable pathspec magic
  pathspec: support :(literal) syntax for noglob pathspec
  kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
  parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
  parse_pathspec: make sure the prefix part is wildcard-free
  rename field "raw" to "_raw" in struct pathspec
  tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
  remove match_pathspec() in favor of match_pathspec_depth()
  remove init_pathspec() in favor of parse_pathspec()
  remove diff_tree_{setup,release}_paths
  convert common_prefix() to use struct pathspec
  ...
2013-09-09 14:36:15 -07:00
Jeff King
83bd7437ca write_index: optionally allow broken null sha1s
Commit 4337b58 (do not write null sha1s to on-disk index,
2012-07-28) added a safety check preventing git from writing
null sha1s into the index. The intent was to catch errors in
other parts of the code that might let such an entry slip
into the index (or worse, a tree).

Some existing repositories may have invalid trees that
contain null sha1s already, though.  Until 4337b58, a common
way to clean this up would be to use git-filter-branch's
index-filter to repair such broken entries.  That now fails
when filter-branch tries to write out the index.

Introduce a GIT_ALLOW_NULL_SHA1 environment variable to
relax this check and make it easier to recover from such a
history.

It is tempting to not involve filter-branch in this commit
at all, and instead require the user to manually invoke

	GIT_ALLOW_NULL_SHA1=1 git filter-branch ...

to perform an index-filter on a history with trees with null
sha1s.  That would be slightly safer, but requires some
specialized knowledge from the user.  So let's set the
GIT_ALLOW_NULL_SHA1 variable automatically when checking out
the to-be-filtered trees.  Advice on using filter-branch to
remove such entries already exists on places like
stackoverflow, and this patch makes it Just Work again on
recent versions of git.

Further commands that touch the index will still notice and
fail, unless they actually remove the broken entries.  A
filter-branch whose filters do not touch the index at all
will not error out (since we complain of the null sha1 only
on writing, not when making a tree out of the index), but
this is acceptable, as we still print a loud warning, so the
problem is unlikely to go unnoticed.

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-28 20:54:43 -07:00
Thomas Gummerer
7800c1ebcc read-cache: use fixed width integer types
Use the fixed width integer types uint16_t and uint32_t for on-disk
structures; unsigned short and unsigned int do not have a guaranteed
size.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-20 12:29:42 -07:00
Ondřej Bílka
98e023dea4 many small typofixes
Signed-off-by: Ondřej Bílka <neleai@seznam.cz>
Reviewed-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-29 12:32:25 -07:00
Junio C Hamano
65ed8684c4 Merge branch 'rs/discard-index-discard-array' into maint
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-07-19 10:39:01 -07:00
Nguyễn Thái Ngọc Duy
9b2d61499b convert refresh_index to take struct pathspec
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-15 10:56:08 -07:00
Nguyễn Thái Ngọc Duy
9c5e6c802c Convert "struct cache_entry *" to "const ..." wherever possible
I attempted to make index_state->cache[] a "const struct cache_entry **"
to find out how existing entries in index are modified and where. The
question I have is what do we do if we really need to keep track of on-disk
changes in the index. The result is

 - diff-lib.c: setting CE_UPTODATE

 - name-hash.c: setting CE_HASHED

 - preload-index.c, read-cache.c, unpack-trees.c and
   builtin/update-index: obvious

 - entry.c: write_entry() may refresh the checked out entry via
   fill_stat_cache_info(). This causes "non-const struct cache_entry
   *" in builtin/apply.c, builtin/checkout-index.c and
   builtin/checkout.c

 - builtin/ls-files.c: --with-tree changes stagemask and may set
   CE_UPDATE

Of these, write_entry() and its call sites are probably most
interesting because it modifies on-disk info. But this is stat info
and can be retrieved via refresh, at least for porcelain
commands. Other just uses ce_flags for local purposes.

So, keeping track of "dirty" entries is just a matter of setting a
flag in index modification functions exposed by read-cache.c. Except
unpack-trees, the rest of the code base does not do anything funny
behind read-cache's back.

The actual patch is less valueable than the summary above. But if
anyone wants to re-identify the above sites. Applying this patch, then
this:

    diff --git a/cache.h b/cache.h
    index 430d021..1692891 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -267,7 +267,7 @@ static inline unsigned int canon_mode(unsigned int mode)
     #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)

     struct index_state {
    -	struct cache_entry **cache;
    +	const struct cache_entry **cache;
     	unsigned int version;
     	unsigned int cache_nr, cache_alloc, cache_changed;
     	struct string_list *resolve_undo;

will help quickly identify them without bogus warnings.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-09 09:12:48 -07:00
Junio C Hamano
ac5611a1cc Merge branch 'fc/do-not-use-the-index-in-add-to-index' into maint
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-07-03 15:40:38 -07:00
Junio C Hamano
079424a2cf Merge branch 'mh/ref-races'
"git pack-refs" that races with new ref creation or deletion have
been susceptible to lossage of refs under right conditions, which
has been tightened up.

* mh/ref-races:
  for_each_ref: load all loose refs before packed refs
  get_packed_ref_cache: reload packed-refs file when it changes
  add a stat_validity struct
  Extract a struct stat_data from cache_entry
  packed_ref_cache: increment refcount when locked
  do_for_each_entry(): increment the packed refs cache refcount
  refs: manage lifetime of packed refs cache via reference counting
  refs: implement simple transactions for the packed-refs file
  refs: wrap the packed refs cache in a level of indirection
  pack_refs(): split creation of packed refs and entry writing
  repack_without_ref(): split list curation and entry writing
2013-06-30 15:40:05 -07:00
Junio C Hamano
08bcd774f4 Merge branch 'rs/discard-index-discard-array'
* rs/discard-index-discard-array:
  read-cache: free cache in discard_index
  read-cache: add simple performance test
2013-06-20 16:02:30 -07:00
Michael Haggerty
3861253224 add a stat_validity struct
It can sometimes be useful to know whether a path in the
filesystem has been updated without going to the work of
opening and re-reading its content. We trust the stat()
information on disk already to handle index updates, and we
can use the same trick here.

This patch introduces a "stat_validity" struct which
encapsulates the concept of checking the stat-freshness of a
file. It is implemented on top of "struct stat_data" to
reuse the logic about which stat entries to trust for a
particular platform, but hides the complexity behind two
simple functions: check and update.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Michael Haggerty
c21d39d7c7 Extract a struct stat_data from cache_entry
Add public functions fill_stat_data() and match_stat_data() to work
with it.  This infrastructure will later be used to check the validity
of other types of file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-20 15:50:17 -07:00
Junio C Hamano
6bf2227b92 Merge branch 'fc/do-not-use-the-index-in-add-to-index'
* fc/do-not-use-the-index-in-add-to-index:
  read-cache: trivial style cleanups
  read-cache: fix wrong 'the_index' usage
2013-06-11 13:30:28 -07:00
René Scharfe
a0fc4db01d read-cache: free cache in discard_index
discard_cache doesn't have to free the array of cache entries, because
the next call of read_cache can simply reuse it, as they all operate on
the global variable the_index.

discard_index on the other hand does have to free it, because it can be
used e.g. with index_state variables on the stack, in which case a
missing free would cause an unrecoverable leak.  This patch releases the
memory and removes a comment that was relevant for discard_cache but has
become outdated.

Since discard_cache is just a wrapper around discard_index nowadays, we
lose the optimization that avoids reallocation of that array within
loops of read_cache and discard_cache.  That doesn't cause a performance
regression for me, however (HEAD = this patch, HEAD^ = master + p0002):

  Test           //              HEAD^             HEAD
  ---------------\\-----------------------------------------------------
  0002.1: read_ca// 1000 times   0.62(0.58+0.04)   0.61(0.58+0.02) -1.6%

Suggested-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-09 17:03:01 -07:00
Felipe Contreras
c4aa3167fe read-cache: trivial style cleanups
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:38 -07:00
Felipe Contreras
582eb8536b read-cache: fix wrong 'the_index' usage
We are dealing with the 'istate' index, not 'the_index'.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-03 10:10:25 -07:00
René Scharfe
21a6b9fa42 read-cache: mark cache_entry pointers const
ie_match_stat and ie_modified only derefence their struct cache_entry
pointers for reading.  Add const to the parameter declaration here and
do the same for the static helper function used by them, as it's the
same there as well.  This allows callers to pass in const pointers.

Signed-off-by: René Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-02 15:31:12 -07:00
Junio C Hamano
4b35b007a6 Merge branch 'lf/read-blob-data-from-index'
Reduce duplicated code between convert.c and attr.c.

* lf/read-blob-data-from-index:
  convert.c: remove duplicate code
  read_blob_data_from_index(): optionally return the size of blob data
  attr.c: extract read_index_data() as read_blob_data_from_index()
2013-04-21 18:39:45 -07:00
Lukas Fleischer
ff36682505 read_blob_data_from_index(): optionally return the size of blob data
This allows for optionally getting the size of the returned data and
will be used in a follow-up patch.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:51:47 -07:00
Lukas Fleischer
29fb37b272 attr.c: extract read_index_data() as read_blob_data_from_index()
Extract the read_index_data() function from attr.c and move it to
read-cache.c; rename it to read_blob_data_from_index() and update
the function signature of it to align better with index/cache API
functions.

This allows for reusing the function in convert.c later.

Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 09:49:11 -07:00
Junio C Hamano
c81e2c61b3 Merge branch 'kb/name-hash' into maint-1.8.1
* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-03 08:44:54 -07:00
Junio C Hamano
c044bed8f0 Merge branch 'kb/name-hash'
The code to keep track of what directory names are known to Git on
platforms with case insensitive filesystems can get confused upon
a hash collision between these pathnames and looped forever.

* kb/name-hash:
  name-hash.c: fix endless loop with core.ignorecase=true
2013-04-01 08:59:53 -07:00
Junio C Hamano
865e99b5fd Merge branch 'nd/doc-index-format'
Update the index format documentation to mention the v4 format.

* nd/doc-index-format:
  update-index: list supported idx versions and their features
  read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()
  index-format.txt: mention of v4 is missing in some places
2013-03-19 12:15:14 -07:00
Karsten Blees
2092678cd5 name-hash.c: fix endless loop with core.ignorecase=true
With core.ignorecase=true, name-hash.c builds a case insensitive index of
all tracked directories. Currently, the existing cache entry structures are
added multiple times to the same hashtable (with different name lengths and
hash codes). However, there's only one dir_next pointer, which gets
completely messed up in case of hash collisions. In the worst case, this
causes an endless loop if ce == ce->dir_next (see t7062).

Use a separate hashtable and separate structures for the directory index
so that each directory entry has its own next pointer. Use reference
counting to track which directory entry contains files.

There are only slight changes to the name-hash.c API:
- new free_name_hash() used by read_cache.c::discard_index()
- remove_name_hash() takes an additional index_state parameter
- index_name_exists() for a directory (trailing '/') may return a cache
  entry that has been removed (CE_UNHASHED). This is not a problem as the
  return value is only used to check if the directory exists (dir.c) or to
  normalize casing of directory names (read-cache.c).

Getting rid of cache_entry.dir_next reduces memory consumption, especially
with core.ignorecase=false (which doesn't use that member at all).

With core.ignorecase=true, building the directory index is slightly faster
as we add / check the parent directory first (instead of going through all
directory levels for each file in the index). E.g. with WebKit (~200k
files, ~7k dirs), time spent in lazy_init_name_hash is reduced from 176ms
to 130ms.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-27 23:29:04 -08:00
Nguyễn Thái Ngọc Duy
b82a7b5bbc read-cache.c: use INDEX_FORMAT_{LB,UB} in verify_hdr()
9d22778 (read-cache.c: write prefix-compressed names in the index -
2012-04-04) defined these. Interestingly, they were not used by
read-cache.c, or anywhere in that patch. They were used in
builtin/update-index.c later for checking supported index
versions. Use them here too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-22 12:48:41 -08:00
Robin Rosenberg
c08e4d5b5c Enable minimal stat checking
Specifically the fields uid, gid, ctime, ino and dev are set to zero
by JGit. Other implementations, eg. Git in cygwin are allegedly also
somewhat incompatible with Git For Windows and on *nix platforms
the resolution of the timestamps may differ.

Any stat checking by git will then need to check content, which may
be very slow, particularly on Windows. Since mtime and size
is typically enough we should allow the user to tell git to avoid
checking these fields if they are set to zero in the index.

This change introduces a core.checkstat config option where the
the user can select to check all fields (default), or just size
and the whole second part of mtime (minimal).

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-22 09:33:16 -08:00
Junio C Hamano
357e9c69c9 read-cache.c: mark a private file-scope symbol as static
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-15 22:58:21 -07:00
Junio C Hamano
3b753148b6 Merge branch 'jk/maint-null-in-trees'
We do not want a link to 0{40} object stored anywhere in our objects.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-08-27 11:54:28 -07:00
Junio C Hamano
d0ae7e2e71 Merge branch 'nd/index-errno'
Assignments to errno before calling system functions that used to
matter in the old code were left behind after the code structure
changed sufficiently to make them useless.

* nd/index-errno:
  read_index_from: remove bogus errno assignments
2012-08-22 11:51:42 -07:00
Nguyễn Thái Ngọc Duy
57d84f8d93 read_index_from: remove bogus errno assignments
These assignments comes from the very first commit e83c516 (Initial
revision of "git", the information manager from hell - 2005-04-07).
Back then we did not die() when errors happened so correct errno was
required.

Since 5d1a5c0 ([PATCH] Better error reporting for "git status" -
2005-10-01), read_index_from() learned to die rather than just return
-1 and these assignments became irrelevant. Remove them.

While at it, move die_errno() next to xmmap() call because it's the
mmap's error code that we care about. Otherwise if close(fd); fails,
it could overwrite mmap's errno.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-06 10:01:21 -07:00
Jeff King
4337b5856f do not write null sha1s to on-disk index
We should never need to write the null sha1 into an index
entry (short of the 1 in 2^160 chance that somebody actually
has content that hashes to it). If we attempt to do so, it
is much more likely that it is a bug, since we use the null
sha1 as a sentinel value to mean "not valid".

The presence of null sha1s in the index (which can come
from, among other things, "update-index --cacheinfo", or by
reading a corrupted tree) can cause problems for later
readers, because they cannot distinguish the literal null
sha1 from its use a sentinel value.  For example, "git
diff-files" on such an entry would make it appear as if it
is stat-dirty, and until recently, the diff code assumed
such an entry meant that we should be diffing a working tree
file rather than a blob.

Ideally, we would stop such entries from entering even our
in-core index. However, we do sometimes legitimately add
entries with null sha1s in order to represent these sentinel
situations; simply forbidding them in add_index_entry breaks
a lot of the existing code. However, we can at least make
sure that our in-core sentinel representation never makes it
to disk.

To be thorough, we will test an attempt to add both a blob
and a submodule entry. In the former case, we might run into
problems anyway because we will be missing the blob object.
But in the latter case, we do not enforce connectivity
across gitlink entries, making this our only point of
enforcement. The current implementation does not care which
type of entry we are seeing, but testing both cases helps
future-proof the test suite in case that changes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:13:36 -07:00
Junio C Hamano
30ea575876 Merge branch 'tg/ce-namelen-field'
Split lower bits of ce_flags field and creates a new ce_namelen
field in the in-core index structure.

* tg/ce-namelen-field:
  Strip namelen out of ce_flags into a ce_namelen field
2012-07-23 20:55:21 -07:00
Junio C Hamano
8fc824f397 Merge branch 'tg/maint-cache-name-compare'
Even though the index can record pathnames longer than 1<<12 bytes,
in some places we were not comparing them in full, potentially
replacing index entries instead of adding.

* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-15 21:40:18 -07:00
Thomas Gummerer
b60e188c51 Strip namelen out of ce_flags into a ce_namelen field
Strip the name length from the ce_flags field and move it
into its own ce_namelen field in struct cache_entry. This
will both give us a tiny bit of a performance enhancement
when working with long pathnames and is a refactoring for
more readability of the code.

It enhances readability, by making it more clear what
is a flag, and where the length is stored and make it clear
which functions use stages in comparisions and which only
use the length.

It also makes CE_NAMEMASK private, so that users don't
mistakenly write the name length in the flags.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:42:45 -07:00
Junio C Hamano
01388518c3 Merge branch 'tg/maint-cache-name-compare' into tg/ce-namelen-field
* tg/maint-cache-name-compare:
  cache_name_compare(): do not truncate while comparing paths
2012-07-11 09:40:25 -07:00
Junio C Hamano
d5f53338ab cache_name_compare(): do not truncate while comparing paths
We failed to use ce_namelen() equivalent and instead only compared
up to the CE_NAMEMASK bytes by mistake.  Adding an overlong path
that shares the same common prefix as an existing entry in the index
did not add a new entry, but instead replaced the existing one, as
the result.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-11 09:25:56 -07:00
Thomas Gummerer
68c4f6a577 Replace strlen() with ce_namelen()
Replace strlen(ce->name) with ce_namelen() in a couple
of places which gives us some additional bits of
performance.

Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 19:49:34 -07:00
Junio C Hamano
d4a5d872c0 Merge branch 'jc/index-v4'
Trivially shrinks the on-disk size of the index file to save both I/O and
checksum overhead.

The topic should give a solid base to build on further updates, with the
code refactoring in its earlier parts, and the backward compatibility
mechanism in its later parts.

* jc/index-v4:
  index-v4: document the entry format
  unpack-trees: preserve the index file version of original
  update-index: upgrade/downgrade on-disk index version
  read-cache.c: write prefix-compressed names in the index
  read-cache.c: read prefix-compressed names in index on-disk version v4
  read-cache.c: move code to copy incore to ondisk cache to a helper function
  read-cache.c: move code to copy ondisk to incore cache to a helper function
  read-cache.c: report the header version we do not understand
  read-cache.c: make create_from_disk() report number of bytes it consumed
  read-cache.c: allow unaligned mapping of the index file
  cache.h: hide on-disk index details
  varint: make it available outside the context of pack
2012-05-02 13:51:13 -07:00
Junio C Hamano
9d227781b6 read-cache.c: write prefix-compressed names in the index
Teach the code to write the index in the v4 on-disk format.

Record the format version of the on-disk index we read from in the
index_state, and use the format when writing the new index out.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-04 09:57:49 -07:00
Junio C Hamano
6c9cd161d9 read-cache.c: read prefix-compressed names in index on-disk version v4
Because the entries are sorted by path, adjacent entries in the index tend
to share the leading components of them, and it makes sense to only store
the differences in later entries.  In the v4 on-disk format of the index,
each on-disk cache entry stores the number of bytes to be stripped from
the end of the previous name, and the bytes to append to the result, to
come up with its name.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
f136f7bfe8 read-cache.c: move code to copy incore to ondisk cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
3fc22b5331 read-cache.c: move code to copy ondisk to incore cache to a helper function
This makes the change in a later patch look less scary.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:46 -07:00
Junio C Hamano
0136bac9b8 read-cache.c: report the header version we do not understand
Instead of just saying "bad index version", report the value we read
from the disk.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano
936f53d055 read-cache.c: make create_from_disk() report number of bytes it consumed
The function is the one that is reading from the data stream. It only is
natural to make it responsible for reporting this number, not the caller.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00
Junio C Hamano
d60c49c2d7 read-cache.c: allow unaligned mapping of the index file
Both the on-disk format v2 and v3 pads the "name" field to the multiple of
eight to make sure that various quantities in network long/short type can
be accessed with ntohl/ntohs without having to worry about alignment, but
this forces us to waste disk I/O bandwidth.

Introduce ntoh_s()/ntoh_l() macros that the callers can use as if they were
the regular ntohs()/ntohl() on a field that may not be aligned correctly.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-03 16:24:45 -07:00