If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know the positions of replaced entries via the replace bitmap in
"link" extension, so the "name" path does not have to be stored (it's
still in the shared index). With this, we also have a way to
distinguish additions vs replacements at load time and can catch
broken "link" extensions.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We are sure that after merge_base_index() is done. cache-tree can
still be used with the final index. So don't destroy cache tree.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This split-index mode is designed to keep write cost proportional to
the number of changes the user has made, not the size of the work
tree. (Read cost is another matter, to be dealt separately.)
This mode stores index info in a pair of $GIT_DIR/index and
$GIT_DIR/sharedindex.<SHA-1>. sharedindex is large and unchanged over
time while "index" is smaller and updated often. Format details are in
index-format.txt, although not everything is implemented in this
patch.
Shared indexes are not automatically removed, because it's unclear if
the shared index is needed by any (even temporary) indexes by just
looking at it. After a while you'll collect stale shared indexes. The
good news is one shared index is useable for long, until
$GIT_DIR/index becomes too big and sluggish that the new shared index
must be created.
The safest way to clean shared indexes is to turn off split index
mode, so shared files are all garbage, delete them all, then turn on
split index mode again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Also update SHA-1 after writing. If we do not do that, the second
read_index() will see "initialized" variable already set and not read
.git/index again, which is fine, except istate->sha1 now has a stale
value.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're running out of room for in-memory flags. But since b60e188
(Strip namelen out of ce_flags into a ce_namelen field - 2012-07-11),
we copy the namelen (first 12 bits) to ce_namelen field. So those bits
are free to use. Just make sure we do not accidentally write any
in-memory flags back.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is now only used by write_locked_index(). Move it to
read-cache.c (because read-cache.c will need to be aware of
alternate_index_output later) and unexport it.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mh/replace-refs-variable-rename:
Document some functions defined in object.c
Add docstrings for lookup_replace_object() and do_lookup_replace_object()
rename read_replace_refs to check_replace_refs
* mh/object-code-cleanup:
sha1_file.c: document a bunch of functions defined in the file
sha1_file_name(): declare to return a const string
find_pack_entry(): document last_found_pack
replace_object: use struct members instead of an array
Tighten codepaths that parse timestamps in commit objects.
* jk/commit-dates-parsing-fix:
show_ident_date: fix tz range check
log: do not segfault on gmtime errors
log: handle integer overflow in timestamps
date: check date overflow against time_t
fsck: report integer overflow in author timestamps
t4212: test bogus timestamps with git-log
"git config" learned to read from the standard input when "-" is
given as the value to its "--file" parameter (attempting an
operation to update the configuration in the standard input of
course is rejected).
* ks/config-file-stdin:
config: teach "git config --file -" to read from the standard input
config: change git_config_with_options() interface
builtin/config.c: rename check_blob_write() -> check_write()
config: disallow relative include paths from blobs
Allow running "gc --auto" in the background.
* nd/daemonize-gc:
gc: config option for running --auto in background
daemon: move daemonize() to libgit.a
--sort=version:refname (or --sort=v:refname for short) sorts tags as
if they are versions. --sort=-refname reverses the order (with or
without ":version").
versioncmp() is copied from string/strverscmp.c in glibc commit
ee9247c38a8def24a59eb5cfb7196a98bef8cfdc, reformatted to Git coding
style. The implementation is under LGPL-2.1 and according to [1] I can
relicense it to GPLv2.
[1] http://www.gnu.org/licenses/gpl-faq.html#AllCompatibility
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Borrow the bitmap index into packfiles from JGit to speed up
enumeration of objects involved in a commit range without having to
fully traverse the history.
* jk/pack-bitmap: (26 commits)
ewah: unconditionally ntohll ewah data
ewah: support platforms that require aligned reads
read-cache: use get_be32 instead of hand-rolled ntoh_l
block-sha1: factor out get_be and put_be wrappers
do not discard revindex when re-preparing packfiles
pack-bitmap: implement optional name_hash cache
t/perf: add tests for pack bitmaps
t: add basic bitmap functionality tests
count-objects: recognize .bitmap in garbage-checking
repack: consider bitmaps when performing repacks
repack: handle optional files created by pack-objects
repack: turn exts array into array-of-struct
repack: stop using magic number for ARRAY_SIZE(exts)
pack-objects: implement bitmap writing
rev-list: add bitmap mode to speed up object lists
pack-objects: use bitmaps when packing objects
pack-objects: split add_object_entry
pack-bitmap: add support for bitmap indexes
documentation: add documentation for the bitmap format
ewah: compressed bitmap implementation
...
Allow "git cmd path/", when the 'path' is where a submodule is
bound to the top-level working tree, to match 'path', despite the
extra and unnecessary trailing slash.
* nd/submodule-pathspec-ending-with-slash:
clean: use cache_name_is_other()
clean: replace match_pathspec() with dir_path_match()
pathspec: pass directory indicator to match_pathspec_item()
match_pathspec: match pathspec "foo/" against directory "foo"
dir.c: prepare match_pathspec_item for taking more flags
pathspec: rename match_pathspec_depth() to match_pathspec()
pathspec: convert some match_pathspec_depth() to dir_path_match()
pathspec: convert some match_pathspec_depth() to ce_path_match()
Allow "merge-recursive" to work in an empty (temporary) working
tree again when there are renames involved, correcting an old
regression in 1.7.7 era.
* bk/refresh-missing-ok-in-merge-recursive:
merge-recursive.c: tolerate missing files while refreshing index
read-cache.c: extend make_cache_entry refresh flag with options
read-cache.c: refactor --ignore-missing implementation
t3030-merge-recursive: test known breakage with empty work tree
Improvements to our hash table to get it to meet the needs of the
msysgit fscache project, with some nice performance improvements.
* kb/fast-hashmap:
name-hash: retire unused index_name_exists()
hashmap.h: use 'unsigned int' for hash-codes everywhere
test-hashmap.c: drop unnecessary #includes
.gitignore: test-hashmap is a generated file
read-cache.c: fix memory leaks caused by removed cache entries
builtin/update-index.c: cleanup update_one
fix 'git update-index --verbose --again' output
remove old hash.[ch] implementation
name-hash.c: remove cache entries instead of marking them CE_UNHASHED
name-hash.c: use new hash map implementation for cache entries
name-hash.c: remove unreferenced directory entries
name-hash.c: use new hash map implementation for directories
diffcore-rename.c: use new hash map implementation
diffcore-rename.c: simplify finding exact renames
diffcore-rename.c: move code around to prepare for the next patch
buitin/describe.c: use new hash map implementation
add a hashtable implementation that supports O(1) removal
submodule: don't access the .gitmodules cache entry after removing it
db5360f3f4 (name-hash: refactor polymorphic index_name_exists();
2013-09-17) split index_name_exists() into index_file_exists() and
index_dir_exists() but retained index_name_exists() as a thin wrapper
to avoid disturbing possible in-flight topics. Since this change
landed in 'master' some time ago and there are no in-flight topics
referencing index_name_exists(), retire it.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This helps reduce the number of match_pathspec_depth() call sites and
show how match_pathspec_depth() is used.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert the make_cache_entry boolean 'refresh' argument to a more
general 'refresh_options' argument. Pass the value through to the
underlying refresh_cache_ent call. Add option CE_MATCH_REFRESH to
enable stat refresh. Update call sites to use the new signature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move lstat ENOENT handling from refresh_index to refresh_cache_ent and
activate it with a new CE_MATCH_IGNORE_MISSING option. This will allow
other call paths into refresh_cache_ent to use the feature.
Signed-off-by: Brad King <brad.king@kitware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions. We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the return value of sha1_file_name() to (const char *).
(Callers have no business mucking about here.) Change callers
accordingly, deleting a few superfluous temporary variables along the
way.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The semantics of this flag was changed in commit
e1111cef23 inline lookup_replace_object() calls
but wasn't renamed at the time to minimize code churn. Rename it now,
and add a comment explaining its use.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The patch extends git config --file interface to allow read config from
stdin.
Editing stdin or setting value in stdin is an error.
Include by absolute path is allowed in stdin config, but not by relative
path.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We're going to have more options for config source.
Let's alter git_config_with_options() interface to accept struct with
all source options.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --mixed is used, entries could be removed from index if the
target ref does not have them. When "reset" is used in preparation for
commit spliting (in a dirty worktree), it could be hard to track what
files to be added back. The new option --intent-to-add simplifies it
by marking all removed files intent-to-add.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Code clean-up and protection against concurrent write access to the
ref namespace.
* mh/safe-create-leading-directories:
rename_tmp_log(): on SCLD_VANISHED, retry
rename_tmp_log(): limit the number of remote_empty_directories() attempts
rename_tmp_log(): handle a possible mkdir/rmdir race
rename_ref(): extract function rename_tmp_log()
remove_dir_recurse(): handle disappearing files and directories
remove_dir_recurse(): tighten condition for removing unreadable dir
lock_ref_sha1_basic(): if locking fails with ENOENT, retry
lock_ref_sha1_basic(): on SCLD_VANISHED, retry
safe_create_leading_directories(): add new error value SCLD_VANISHED
cmd_init_db(): when creating directories, handle errors conservatively
safe_create_leading_directories(): introduce enum for return values
safe_create_leading_directories(): always restore slash at end of loop
safe_create_leading_directories(): split on first of multiple slashes
safe_create_leading_directories(): rename local variable
safe_create_leading_directories(): add explicit "slash" pointer
safe_create_leading_directories(): reduce scope of local variable
safe_create_leading_directories(): fix format of "if" chaining
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).
* nd/shallow-clone: (31 commits)
t5537: fix incorrect expectation in test case 10
shallow: remove unused code
send-pack.c: mark a file-local function static
git-clone.txt: remove shallow clone limitations
prune: clean .git/shallow after pruning objects
clone: use git protocol for cloning shallow repo locally
send-pack: support pushing from a shallow clone via http
receive-pack: support pushing to a shallow clone via http
smart-http: support shallow fetch/clone
remote-curl: pass ref SHA-1 to fetch-pack as well
send-pack: support pushing to a shallow clone
receive-pack: allow pushes that update .git/shallow
connected.c: add new variant that runs with --shallow-file
add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
receive/send-pack: support pushing from a shallow clone
receive-pack: reorder some code in unpack()
fetch: add --update-shallow to accept refs that update .git/shallow
upload-pack: make sure deepening preserves shallow roots
fetch: support fetching from a shallow repository
clone: support remote shallow repository
...
We used to use two separate rules for the normal ref resolution
dwimming and dwimming done to decide which remote ref to grab. The
third parameter to refname_match() selected which rules to use.
When these two rules were harmonized in
2011-11-04 dd621df9cd refs DWIMmery: use the same rule for both "git fetch" and others
, ref_fetch_rules was #defined to avoid potential breakages for
in-flight topics.
It is now safe to remove the backwards-compatibility code, so remove
refname_match()'s third parameter, make ref_rev_parse_rules private to
refs.c, and remove ref_fetch_rules entirely.
Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.
* jk/oi-delta-base:
cat-file: provide %(deltabase) batch format
sha1_object_info_extended: provide delta base sha1s
Add a new possible error result that can be returned by
safe_create_leading_directories() and
safe_create_leading_directories_const(): SCLD_VANISHED. This value
indicates that a file or directory on the path existed at one point
(either it already existed or the function created it), but then it
disappeared. This probably indicates that another process deleted the
directory while we were working. If SCLD_VANISHED is returned, the
caller might want to retry the function call, as there is a chance
that a new attempt will succeed.
Why doesn't safe_create_leading_directories() do the retrying
internally? Because an empty directory isn't really ever safe until
it holds a file. So even if safe_create_leading_directories() were
absolutely sure that the directory existed before it returned, there
would be no guarantee that the directory still existed when the caller
tried to write something in it.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of returning magic integer values (which a couple of callers
go to the trouble of distinguishing), return values from an enum. Add
a docstring.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A caller of sha1_object_info_extended technically has enough
information to determine the base sha1 from the results of
the call. It knows the pack, offset, and delta type of the
object, which is sufficient to find the base.
However, the functions to do so are not publicly available,
and the code itself is intimate enough with the pack details
that it should be abstracted away. We could add a public
helper to allow callers to query the delta base separately,
but it is simpler and slightly more efficient to optionally
grab it along with the rest of the object_info data.
For cases where the object is not stored as a delta, we
write the null sha1 into the query field. A careful caller
could check "oi.whence == OI_PACKED && oi.u.packed.is_delta"
before looking at the base sha1, but using the null sha1
provides a simple alternative (and gives a better sanity
check for a non-careful caller than simply returning random
bytes).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This parameter is not used yet, but it will be used to tell
sha1_object_info_extended() if it should perform object
replacement or not.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently, there is only one caller to lookup_replace_object()
that can benefit from passing it some flags, but we expect
that there could be more.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>