* kb/checkout-optim:
Revert "lstat_cache(): print a warning if doing ping-pong between cache types"
checkout bugfix: use stat.mtime instead of stat.ctime in two places
Makefile: Set compiler switch for USE_NSEC
Create USE_ST_TIMESPEC and turn it on for Darwin
Not all systems use st_[cm]tim field for ns resolution file timestamp
Record ns-timestamps if possible, but do not use it without USE_NSEC
write_index(): update index_state->timestamp after flushing to disk
verify_uptodate(): add ce_uptodate(ce) test
make USE_NSEC work as expected
fix compile error when USE_NSEC is defined
check_updates(): effective removal of cache entries marked CE_REMOVE
lstat_cache(): print a warning if doing ping-pong between cache types
show_patch_diff(): remove a call to fstat()
write_entry(): use fstat() instead of lstat() when file is open
write_entry(): cleanup of some duplicated code
create_directories(): remove some memcpy() and strchr() calls
unlink_entry(): introduce schedule_dir_for_removal()
lstat_cache(): swap func(length, string) into func(string, length)
lstat_cache(): generalise longest_match_lstat_cache()
lstat_cache(): small cleanup and optimisation
When "git push" is not told what refspecs to push, it pushes all matching
branches to the current remote. For some workflows this default is not
useful, and surprises new users. Some have even found that this default
behaviour is too easy to trigger by accident with unwanted consequences.
Introduce a new configuration variable "push.default" that decides what
action git push should take if no refspecs are given or implied by the
command line arguments or the current remote configuration.
Possible values are:
'nothing' : Push nothing;
'matching' : Current default behaviour, push all branches that already
exist in the current remote;
'tracking' : Push the current branch to whatever it is tracking;
'current' : Push the current branch to a branch of the same name,
i.e. HEAD.
Signed-off-by: Finn Arne Gangstad <finnag@pvv.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-1.6.0-keep-pack:
is_kept_pack(): final clean-up
Simplify is_kept_pack()
Consolidate ignore_packed logic more
has_sha1_kept_pack(): take "struct rev_info"
has_sha1_pack(): refactor "pretend these packs do not exist" interface
git-repack: resist stray environment variable
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.
Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.
This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack(). The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Its "ignore_packed" parameter always comes from struct rev_info. This
patch makes the function take a pointer to the surrounding structure, so
that the refactoring in the next patch becomes easier to review.
There is an unfortunate header file dependency and the easiest workaround
is to temporarily move the function declaration from cache.h to
revision.h; this will be moved back to cache.h once the function loses
this "ignore_packed" parameter altogether in the later part of the
series.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the callers of this function except only one pass NULL to its last
parameter, ignore_packed.
Introduce has_sha1_kept_pack() function that has the function signature
and the semantics of this function, and convert the sole caller that does
not pass NULL to call this new function.
All other callers and has_sha1_pack() lose the ignore_packed parameter.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since it doesn't actually touch its argument, this makes
sense.
However, we still want to return a non-const version (which
requires a cast) so that this:
struct ref *a, *b;
a = find_ref_by_name(b);
works. Unfortunately, you can also silently strip the const
from a variable:
struct ref *a;
const struct ref *b;
a = find_ref_by_name(b);
This is a classic C const problem because there is no way to
say "return the type with the same constness that was passed
to us"; we provide the same semantics as standard library
functions like strchr.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Jay Soffian <jaysoffian@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since this timestamp is used to check for racy-clean files, it is
important to keep it uptodate.
For the 'git checkout' command without the '-q' option, this make a
huge difference. Before, each and every file which was updated, was
racy-clean after the call to unpack_trees() and write_index() but
before the GIT process ended.
And because of the call to show_local_changes() in builtin-checkout.c,
we ended up reading those files back into memory, doing a SHA1 to
check if the files was really different from the index. And, of
course, no file was different.
With this fix, 'git checkout' without the '-q' option should now be
almost as fast as with the '-q' option, but not quite, as we still do
some few lstat(2) calls more without the '-q' option.
Below is some average numbers for 10 checkout's to v2.6.27 and 10 to
v2.6.25 of the Linux kernel, to show the difference:
before (git version 1.6.2.rc1.256.g58a87):
7.860 user 2.427 sys 19.465 real 52.8% CPU faults: 0 major 95331 minor
after:
6.184 user 2.160 sys 17.619 real 47.4% CPU faults: 0 major 38994 minor
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Talking about --date, one thing I wanted for the 1234567890 date was to
get things in the raw format. Sure, you get them with --pretty=raw, but it
felt a bit sad that you couldn't just ask for the date in raw format.
So here's a throw-away patch (meaning: I won't be re-sending it, because I
really don't think it's a big deal) to add "--date=raw". It just prints
out the internal raw git format - seconds since epoch plus timezone (put
another way: 'date +"%s %z"' format)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Just saying that index.lock exists doesn't tell the user _what_ to do
to fix the problem. We should give an indication that it's normally
safe to delete index.lock after making sure git isn't running here.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function strip_path_suffix() will try to strip a given suffix from
a given path. The suffix must start at a directory boundary (i.e. "core"
is not a path suffix of "libexec/git-core", but "git-core" is).
Arbitrary runs of directory separators ("slashes") are assumed identical.
Example:
strip_path_suffix("C:\\msysgit/\\libexec\\git-core",
"libexec///git-core", &prefix)
will set prefix to "C:\\msysgit" and return 0.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the filesystem ext4 is now defined as stable in Linux v2.6.28,
and ext4 supports nanonsecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.
This will make racy git situations less likely to happen. For 'git
checkout' this means it will be less likely that we have to open, read
the contents of the file into RAM, and check if file is really
modified or not. The result sould be a litle less used CPU time, less
pagefaults and a litle faster program, at least for 'git checkout'.
Since the number of possible racy git situations would increase when
disks gets faster, this patch would be more and more helpfull as times
go by. For a fast Solid State Disk, this patch should be helpfull.
Note that, when file operations starts to take less than 1 nanosecond,
one would again start to get more racy git situations.
For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Below is oprofile output from GIT command 'git chekcout -q my-v2.6.25'
(move from tag v2.6.27 to tag v2.6.25 of the Linux kernel):
CPU: Core 2, speed 1999.95 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit
mask of 0x00 (Unhalted core cycles) count 20000
Counted INST_RETIRED_ANY_P events (number of instructions retired) with a
unit mask of 0x00 (No unit mask) count 20000
CPU_CLK_UNHALT...|INST_RETIRED:2...|
samples| %| samples| %|
------------------------------------
409247 100.000 342878 100.000 git
CPU_CLK_UNHALT...|INST_RETIRED:2...|
samples| %| samples| %|
------------------------------------
260476 63.6476 257843 75.1996 libz.so.1.2.3
100876 24.6492 64378 18.7758 kernel-2.6.28.4_2.vmlinux
30850 7.5382 7874 2.2964 libc-2.9.so
14775 3.6103 8390 2.4469 git
2020 0.4936 4325 1.2614 libcrypto.so.0.9.8
191 0.0467 32 0.0093 libpthread-2.9.so
58 0.0142 36 0.0105 ld-2.9.so
1 2.4e-04 0 0 libldap-2.3.so.0.2.31
Detail list of the top 20 function entries (libz counted in one blob):
CPU_CLK_UNHALTED INST_RETIRED_ANY_P
samples % samples % image name symbol name
260476 63.6862 257843 75.2725 libz.so.1.2.3 /lib/libz.so.1.2.3
16587 4.0555 3636 1.0615 libc-2.9.so memcpy
7710 1.8851 277 0.0809 libc-2.9.so memmove
3679 0.8995 1108 0.3235 kernel-2.6.28.4_2.vmlinux d_validate
3546 0.8670 2607 0.7611 kernel-2.6.28.4_2.vmlinux __getblk
3174 0.7760 1813 0.5293 libc-2.9.so _int_malloc
2396 0.5858 3681 1.0746 kernel-2.6.28.4_2.vmlinux copy_to_user
2270 0.5550 2528 0.7380 kernel-2.6.28.4_2.vmlinux __link_path_walk
2205 0.5391 1797 0.5246 kernel-2.6.28.4_2.vmlinux ext4_mark_iloc_dirty
2103 0.5142 1203 0.3512 kernel-2.6.28.4_2.vmlinux find_first_zero_bit
2077 0.5078 997 0.2911 kernel-2.6.28.4_2.vmlinux do_get_write_access
2070 0.5061 514 0.1501 git cache_name_compare
2043 0.4995 1501 0.4382 kernel-2.6.28.4_2.vmlinux rcu_irq_exit
2022 0.4944 1732 0.5056 kernel-2.6.28.4_2.vmlinux __ext4_get_inode_loc
2020 0.4939 4325 1.2626 libcrypto.so.0.9.8 /usr/lib/libcrypto.so.0.9.8
1965 0.4804 1384 0.4040 git patch_delta
1708 0.4176 984 0.2873 kernel-2.6.28.4_2.vmlinux rcu_sched_grace_period
1682 0.4112 727 0.2122 kernel-2.6.28.4_2.vmlinux sysfs_slab_alias
1659 0.4056 290 0.0847 git find_pack_entry_one
1480 0.3619 1307 0.3816 kernel-2.6.28.4_2.vmlinux ext4_writepage_trans_blocks
Notice the memmove line, where the CPU did 7710 / 277 = 27.8 cycles
per instruction, and compared to the total cycles spent inside the
source code of GIT for this command, all the memmove() calls
translates to (7710 * 100) / 14775 = 52.2% of this.
Retesting with a GIT program compiled for gcov usage, I found out that
the memmove() calls came from remove_index_entry_at() in read-cache.c,
where we have:
memmove(istate->cache + pos,
istate->cache + pos + 1,
(istate->cache_nr - pos) * sizeof(struct cache_entry *));
remove_index_entry_at() is called 4902 times from check_updates() in
unpack-trees.c, and each time called we move each cache_entry pointers
(from the removed one) one step to the left.
Since we have 28828 entries in the cache this time, and if we on
average move half of them each time, we in total move approximately
4902 * 0.5 * 28828 * 4 = 282 629 712 bytes, or twice this amount if
each pointer is 8 bytes (64 bit).
OK, is seems that the function check_updates() is called 28 times, so
the estimated guess above had been more correct if check_updates() had
been called only once, but the point is: we get lots of bytes moved.
To fix this, and use an O(N) algorithm instead, where N is the number
of cache_entries, we delete/remove all entries in one loop through all
entries.
From a retest, the new remove_marked_cache_entries() from the patch
below, ended up with the following output line from oprofile:
46 0.0105 15 0.0041 git remove_marked_cache_entries
If we can trust the numbers from oprofile in this case, we saved
approximately ((7710 - 46) * 20000) / (2 * 1000 * 1000 * 1000) = 0.077
seconds CPU time with this fix for this particular test. And notice
that now the CPU did only 46 / 15 = 3.1 cycles/instruction.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ms/mailmap:
Move mailmap documentation into separate file
Change current mailmap usage to do matching on both name and email of author/committer.
Add map_user() and clear_mailmap() to mailmap
Add find_insert_index, insert_at_index and clear_func functions to string_list
Add mailmap.file as configurational option for mailmap location
* js/maint-1.6.0-path-normalize:
Remove unused normalize_absolute_path()
Test and fix normalize_path_copy()
Fix GIT_CEILING_DIRECTORIES on Windows
Move sanitary_path_copy() to path.c and rename it to normalize_path_copy()
Make test-path-utils more robust against incorrect use
Otherwise we may reuse the same memory address for a totally
different "struct packed_git", and a previously cached object from
the prior occupant might be returned when trying to unpack an object
from the new pack.
Found-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently inside unlink_entry() if we get a successful removal of one
file with unlink(), we try to remove the leading directories each and
every time. So if one directory containing 200 files is moved to an
other location we get 199 failed calls to rmdir() and 1 successful
call.
To fix this and avoid some unnecessary calls to rmdir(), we schedule
each directory for removal and wait much longer before we do the real
call to rmdir().
Since the unlink_entry() function is called with alphabetically sorted
names, this new function end up being very effective to avoid
unnecessary calls to rmdir(). In some cases over 95% of all calls to
rmdir() is removed with this patch.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Swap function argument pair (length, string) into (string, length) to
conform with the commonly used order inside the GIT source code.
Also, add a note about this fact into the coding guidelines.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows us to augment the repo mailmap file, and to use
mailmap files elsewhere than the repository root. Meaning
that the entries in mailmap.file will override the entries
in "./.mailmap", should they match.
Signed-off-by: Marius Storm-Olsen <marius@trolltech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is now superseded by normalize_path_copy().
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function and normalize_absolute_path() do almost the same thing. The
former already works on Windows, but the latter crashes.
In subsequent changes we will remove normalize_absolute_path(). Here we
make the replacement function reusable. On the way we rename it to reflect
that it does some path normalization. Apart from that this is only moving
around code.
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* js/notes:
git-notes: fix printing of multi-line notes
notes: fix core.notesRef documentation
Add an expensive test for git-notes
Speed up git notes lookup
Add a script to edit/inspect notes
Introduce commit notes
Conflicts:
pretty.c
* kb/lstat-cache:
lstat_cache(): introduce clear_lstat_cache() function
lstat_cache(): introduce invalidate_lstat_cache() function
lstat_cache(): introduce has_dirs_only_path() function
lstat_cache(): introduce has_symlink_or_noent_leading_path() function
lstat_cache(): more cache effective symlink/directory detection
If you want to completely clear the contents of the lstat_cache(), then
call this new function.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases it could maybe be necessary to say to the cache that
"Hey, I deleted/changed the type of this pathname and if you currently
have it inside your cache, you should deleted it".
This patch introduce a function which support this.
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The create_directories() function in entry.c currently calls stat()
or lstat() for each path component of the pathname 'path' each and every
time. For the 'git checkout' command, this function is called on each
file for which we must do an update (ce->ce_flags & CE_UPDATE), so we get
lots and lots of calls.
To fix this, we make a new wrapper to the lstat_cache() function, and
call the wrapper function instead of the calls to the stat() or the
lstat() functions. Since the paths given to the create_directories()
function, is sorted alphabetically, the new wrapper would be very
cache effective in this situation.
To support it we must update the lstat_cache() function to be able to
say that "please test the complete length of 'name'", and also to give
it the length of a prefix, where the cache should use the stat()
function instead of the lstat() function to test each path component.
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In some cases, especially inside the unpack-trees.c file, and inside
the verify_absent() function, we can avoid some unnecessary calls to
lstat(), if the lstat_cache() function can also be told to keep track
of non-existing directories.
So we update the lstat_cache() function to handle this new fact,
introduce a new wrapper function, and the result is that we save lots
of lstat() calls for a removed directory which previously contained
lots of files, when we call this new wrapper of lstat_cache() instead
of the old one.
We do similar changes inside the unlink_entry() function, since if we
can already say that the leading directory component of a pathname
does not exist, it is not necessary to try to remove a pathname below
it!
Thanks to Junio C Hamano, Linus Torvalds and Rene Scharfe for valuable
comments to this patch!
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Implement a shortcut @{-N} for the N-th last branch checked out, that
works by parsing the reflog for the message added by previous
git-checkout invocations. We expand the @{-N} to the branch name, so
that you end up on an attached HEAD on that branch.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both versions have the same functionality. This removes any
redundancy.
This also adds makes two extensions to match_pathspec:
- If pathspec is NULL, return 1. This reflects the behavior of git
commands, for which no paths usually means "match all paths".
- If seen is NULL, do not use it.
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is only used from "sha1_file.c".
And as we want to add a "replace_object" hook in "read_sha1_file",
we must not let people bypass the hook using something other than
"read_sha1_file".
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
R. Tyler Ballance reported a mysterious transient repository corruption;
after much digging, it turns out that we were not catching and reporting
memory allocation errors from some calls we make to zlib.
This one _just_ wraps things; it doesn't do the "retry on low memory
error" part, at least not yet. It is an independent issue from the
reporting. Some of the errors are expected and passed back to the caller,
but we die when zlib reports it failed to allocate memory for now.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commit notes are blobs which are shown together with the commit
message. These blobs are taken from the notes ref, which you can
configure by the config variable core.notesRef, which in turn can
be overridden by the environment variable GIT_NOTES_REF.
The notes ref is a branch which contains "files" whose names are
the names of the corresponding commits (i.e. the SHA-1).
The rationale for putting this information into a ref is this: we
want to be able to fetch and possibly union-merge the notes,
maybe even look at the date when a note was introduced, and we
want to store them efficiently together with the other objects.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Especially on Windows where an opened file cannot be replaced, make
sure pack-objects always close packs it is about to replace. Even on
non Windows systems, this could save potential bad results if ever
objects were to be read from the new pack file using offset from the old
index.
This should fix t5303 on Windows.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Tested-by: Johannes Sixt <j6t@kdbg.org> (MinGW)
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* bc/maint-keep-pack:
repack: only unpack-unreachable if we are deleting redundant packs
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
This uses the extended index flag mechanism introduced earlier to mark
the entries added to the index via "git add -N" with CE_INTENT_TO_ADD.
The logic to detect an "intent to add" entry for the purpose of allowing
"git rm --cached $path" is tightened to check not just for a staged empty
blob, but with the CE_INTENT_TO_ADD bit. This protects an empty blob that
was explicitly added and then modified in the work tree from being dropped
with this sequence:
$ >empty
$ git add empty
$ echo "non empty" >empty
$ git rm --cached empty
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This can do the lstat() storm in parallel, giving potentially much
improved performance for cold-cache cases or things like NFS that have
weak metadata caching.
Just use "read_cache_preload()" instead of "read_cache()" to force an
optimistic preload of the index stat data. The function takes a
pathspec as its argument, allowing us to preload only the relevant
portion of the index.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* np/pack-safer:
t5303: fix printf format string for portability
t5303: work around printf breakage in dash
pack-objects: don't leak pack window reference when splitting packs
extend test coverage for latest pack corruption resilience improvements
pack-objects: allow "fixing" a corrupted pack without a full repack
make find_pack_revindex() aware of the nasty world
make check_object() resilient to pack corruptions
make packed_object_info() resilient to pack corruptions
make unpack_object_header() non fatal
better validation on delta base object offsets
close another possibility for propagating pack corruption
* bc/maint-keep-pack:
t7700: test that 'repack -a' packs alternate packed objects
pack-objects: extend --local to mean ignore non-local loose objects too
sha1_file.c: split has_loose_object() into local and non-local counterparts
t7700: demonstrate mishandling of loose objects in an alternate ODB
builtin-gc.c: use new pack_keep bitfield to detect .keep file existence
repack: do not fall back to incremental repacking with [-a|-A]
repack: don't repack local objects in packs with .keep file
pack-objects: new option --honor-pack-keep
packed_git: convert pack_local flag into a bitfield and add pack_keep
t7700: demonstrate mishandling of objects in packs with a .keep file
* maint:
Start 1.6.0.5 cycle
Fix pack.packSizeLimit and --max-pack-size handling
checkout: Fix "initial checkout" detection
Remove the period after the git-check-attr summary
Conflicts:
RelNotes
Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07)
tightened the rule to prevent switching branches from losing local
changes, so that staged removal of paths can be protected, while
attempting to keep a loophole to still allow a special case of switching
out of an un-checked-out state.
However, the loophole was made a bit too tight, and did not allow
switching from one branch (in an un-checked-out state) to check out
another branch.
The change to builtin-checkout.c in this commit loosens it to allow this,
by not insisting the original commit and the new commit to be the same.
It also introduces a new function, is_index_unborn (and an associated
macro, is_cache_unborn), to check if the repository is truly in an
un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE
did not exist when populating the in-core index structure. A few places
the earlier commit 5521883 added the check for the initial checkout
condition are updated to use this function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This can potentially be used in a few places, so let's make
it available to all parts of the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
pack_keep will be set when a pack file has an associated .keep file.
Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/maint-mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Fix potentially dangerous uses of mkpath and git_path
Fix mkpath abuse in dwim_ref and dwim_log of sha1_name.c
Add mksnpath which allows you to specify the output buffer
Conflicts:
builtin-revert.c
rerere.c
* mv/maint-branch-m-symref:
update-ref --no-deref -d: handle the case when the pointed ref is packed
git branch -m: forbid renaming of a symref
Fix git update-ref --no-deref -d.
rename_ref(): handle the case when the reflog of a ref does not exist
Fix git branch -m for symrefs.
* ar/mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Fix potentially dangerous uses of mkpath and git_path
Fix potentially dangerous uses of mkpath and git_path
Fix mkpath abuse in dwim_ref and dwim_log of sha1_name.c
Add mksnpath which allows you to specify the output buffer
Conflicts:
builtin-revert.c
* mv/maint-branch-m-symref:
update-ref --no-deref -d: handle the case when the pointed ref is packed
git branch -m: forbid renaming of a symref
Fix git update-ref --no-deref -d.
rename_ref(): handle the case when the reflog of a ref does not exist
Fix git branch -m for symrefs.
It is possible to have pack corruption in the object header. Currently
unpack_object_header() simply die() on them instead of letting the caller
deal with that gracefully.
So let's have unpack_object_header() return an error instead, and find
a better name for unpack_object_header_gently() in that context. All
callers of unpack_object_header() are ready for it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Abstract
--------
With index v2 we have a per object CRC to allow quick and safe reuse of
pack data when repacking. This, however, doesn't currently prevent a
stealth corruption from being propagated into a new pack when _not_
reusing pack data as demonstrated by the modification to t5302 included
here.
The Context
-----------
The Git database is all checksummed with SHA1 hashes. Any kind of
corruption can be confirmed by verifying this per object hash against
corresponding data. However this can be costly to perform systematically
and therefore this check is often not performed at run time when
accessing the object database.
First, the loose object format is entirely compressed with zlib which
already provide a CRC verification of its own when inflating data. Any
disk corruption would be caught already in this case.
Then, packed objects are also compressed with zlib but only for their
actual payload. The object headers and delta base references are not
deflated for obvious performance reasons, however this leave them
vulnerable to potentially undetected disk corruptions. Object types
are often validated against the expected type when they're requested,
and deflated size must always match the size recorded in the object header,
so those cases are pretty much covered as well.
Where corruptions could go unnoticed is in the delta base reference.
Of course, in the OBJ_REF_DELTA case, the odds for a SHA1 reference to
get corrupted so it actually matches the SHA1 of another object with the
same size (the delta header stores the expected size of the base object
to apply against) are virtually zero. In the OBJ_OFS_DELTA case, the
reference is a pack offset which would have to match the start boundary
of a different base object but still with the same size, and although this
is relatively much more "probable" than in the OBJ_REF_DELTA case, the
probability is also about zero in absolute terms. Still, the possibility
exists as demonstrated in t5302 and is certainly greater than a SHA1
collision, especially in the OBJ_OFS_DELTA case which is now the default
when repacking.
Again, repacking by reusing existing pack data is OK since the per object
CRC provided by index v2 guards against any such corruptions. What t5302
failed to test is a full repack in such case.
The Solution
------------
As unlikely as this kind of stealth corruption can be in practice, it
certainly isn't acceptable to propagate it into a freshly created pack.
But, because this is so unlikely, we don't want to pay the run time cost
associated with extra validation checks all the time either. Furthermore,
consequences of such corruption in anything but repacking should be rather
visible, and even if it could be quite unpleasant, it still has far less
severe consequences than actively creating bad packs.
So the best compromize is to check packed object CRC when unpacking
objects, and only during the compression/writing phase of a repack, and
only when not streaming the result. The cost of this is minimal (less
than 1% CPU time), and visible only with a full repack.
Someone with a stats background could provide an objective evaluation of
this, but I suspect that it's bad RAM that has more potential for data
corruptions at this point, even in those cases where this extra check
is not performed. Still, it is best to prevent a known hole for
corruption when recreating object data into a new pack.
What about the streamed pack case? Well, any client receiving a pack
must always consider that pack as untrusty and perform full validation
anyway, hence no such stealth corruption could be propagated to remote
repositoryes already. It is therefore worthless doing local validation
in that case.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/maint-mksnpath:
Use git_pathdup instead of xstrdup(git_path(...))
git_pathdup: returns xstrdup-ed copy of the formatted path
Fix potentially dangerous use of git_path in ref.c
Add git_snpath: a .git path formatting routine with output buffer
Conflicts:
builtin-revert.c
refs.c
rerere.c
The function's purpose is to replace git_path where the buffer of
formatted path may not be reused by subsequent calls of the function
or will be copied anyway.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/maint-mksnpath:
Fix potentially dangerous uses of mkpath and git_path
Fix mkpath abuse in dwim_ref and dwim_log of sha1_name.c
Add mksnpath which allows you to specify the output buffer
This is just vsnprintf's but additionally calls cleanup_path() on the
result. To be used as alternatives to mkpath() where the buffer for the
created path may not be reused by subsequent calls of the same formatting
function.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This had two problems with symrefs. First, it copied the actual sha1
instead of the "pointer", second it failed to remove the old ref after a
successful rename.
Given that till now delete_ref() always dereferenced symrefs, a new
parameters has been introduced to delete_ref() to allow deleting refs
without a dereference.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a file is different between the working tree copy, the index, and the
HEAD, then we do not allow it to be deleted without --force.
However, this is overly tight in the face of "git add --intent-to-add":
$ git add --intent-to-add file
$ : oops, I don't actually want to stage that yet
$ git rm --cached file
error: 'empty' has staged content different from both the
file and the HEAD (use -f to force removal)
$ git rm -f --cached file
Unfortunately, there is currently no way to distinguish between an empty
file that has been added and an "intent to add" file. The ideal behavior
would be to disallow the former while allowing the latter.
This patch loosens the safety valve to allow the deletion only if we are
deleting the cached entry and the cached content is empty. This covers
the intent-to-add situation, and assumes there is little harm in not
protecting users who have legitimately added an empty file. In many
cases, the file will still be empty, in which case the safety valve does
not trigger anyway (since the content remains untouched in the working
tree). Otherwise, we do remove the fact that no content was staged, but
given that the content is by definition empty, it is not terribly
difficult for a user to recreate it.
However, we still document the desired behavior in the form of two
tests. One checks the correct removal of an intent-to-add file. The other
checks that we still disallow removal of empty files, but is marked as
expect_failure to indicate this compromise. If the intent-to-add feature
is ever extended to differentiate between normal empty files and
intent-to-add files, then the safety valve can be re-tightened.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/maint-co-track:
Enhance hold_lock_file_for_{update,append}() API
demonstrate breakage of detached checkout with symbolic link HEAD
Fix "checkout --track -b newbranch" on detached HEAD
Conflicts:
builtin-commit.c
This changes the "die_on_error" boolean parameter to a mere "flags", and
changes the existing callers of hold_lock_file_for_update/append()
functions to pass LOCK_DIE_ON_ERROR.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the "git status" display code was originally converted
to C, we copied the code from ls-files to discover whether a
pathname returned by read_directory was an "other", or
untracked, file.
Much later, 5698454e updated the code in ls-files to handle
some new cases caused by gitlinks. This left the code in
wt-status.c broken: it would display submodule directories
as untracked directories. Nobody noticed until now, however,
because unless status.showUntrackedFiles was set to "all",
submodule directories were not actually reported by
read_directory. So the bug was only triggered in the
presence of a submodule _and_ this config option.
This patch pulls the ls-files code into a new function,
cache_name_is_other, and uses it in both places. This should
leave the ls-files functionality the same and fix the bug
in status.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The on-disk format of index only saves 16 bit flags, nearly all have
been used. The last bit (CE_EXTENDED) is used to for future extension.
This patch extends index entry format to save more flags in future.
The new entry format will be used when CE_EXTENDED bit is 1.
Because older implementation may not understand CE_EXTENDED bit and
misread the new format, if there is any extended entry in index, index
header version will turn 3, which makes it incompatible for older git.
If there is none, header version will return to 2 again.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On ARM I have the following compilation errors:
CC fast-import.o
In file included from cache.h:8,
from builtin.h:6,
from fast-import.c:142:
arm/sha1.h:14: error: conflicting types for 'SHA_CTX'
/usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here
arm/sha1.h:16: error: conflicting types for 'SHA1_Init'
/usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here
arm/sha1.h:17: error: conflicting types for 'SHA1_Update'
/usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here
arm/sha1.h:18: error: conflicting types for 'SHA1_Final'
/usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here
make: *** [fast-import.o] Error 1
This is because openssl header files are always included in
git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not
set, which somehow brings in <openssl/sha1.h> clashing with the custom
ARM version. Compilation of git is probably broken on PPC too for the
same reason.
Turns out that the only file requiring openssl/ssl.h and openssl/err.h
is imap-send.c. But only moving those problematic includes there
doesn't solve the issue as it also includes cache.h which brings in the
conflicting local SHA1 header file.
As suggested by Jeff King, the best solution is to rename our references
to SHA1 functions and structure to something git specific, and define those
according to the implementation used.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This function is not used in any other file.
Signed-off-by: Nanako Shiraishi <nanako3@lavabit.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This function is used to learn whether git_dir is already set up or not.
It is necessary, because we want to read configuration in compat/cygwin.c
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Earlier, when pushing into a repository that borrows from alternate object
stores, we followed the longstanding design decision not to trust refs in
the alternate repository that houses the object store we are borrowing
from. If your public repository is borrowing from Linus's public
repository, you pushed into it long time ago, and now when you try to push
your updated history that is in sync with more recent history from Linus,
you will end up sending not just your own development, but also the
changes you acquired through Linus's tree, even though the objects needed
for the latter already exists at the receiving end. This is because the
receiving end does not advertise that the objects only reachable from the
borrowed repository (i.e. Linus's) are already available there.
This solves the issue by making the receiving end advertise refs from
borrowed repositories. They are not sent with their true names but with a
phoney name ".have" to make sure that the old senders will safely ignore
them (otherwise, the old senders will misbehave, trying to push matching
refs, and mirror push that deletes refs that only exist at the receiving
end).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git push" enhancement allows the receiving end to report not only its own
refs but refs in repositories it borrows from via the alternate object
store mechanism. By telling the sender that objects reachable from these
extra refs are already complete in the receiving end, the number of
objects that need to be transfered can be cut down.
These entries are sent over the wire with string ".have", instead of the
actual names of the refs. This string was chosen so that they are ignored
by older programs at the sending end. If we sent some random but valid
looking refnames for these entries, "matching refs" rule (triggered when
running "git push" without explicit refspecs, where the sender learns what
refs the receiver has, and updates only the ones with the names of the
refs the sender also has) and "delete missing" rule (triggered when "git
push --mirror" is used, where the sender tells the receiver to delete the
refs it itself does not have) would try to update/delete them, which is
not what we want.
This prepares the send-pack (and "push" that runs native protocol) to
accept extended existing ref information and make use of it. The ".have"
entries are excluded from ref matching rules, and are exempt from deletion
rule while pushing with --mirror option, but are still used for pack
generation purposes by providing more "bottom" range commits.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A simple "grep -e stat --and -e S_ISDIR" revealed there are many
open-coded implementations of this function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This adds "--intent-to-add" option to "git add". This is to let the
system know that you will tell it the final contents to be staged later,
iow, just be aware of the presense of the path with the type of the blob
for now. It is implemented by staging an empty blob as the content.
With this sequence:
$ git reset --hard
$ edit newfile
$ git add -N newfile
$ edit newfile oldfile
$ git diff
the diff will show all changes relative to the current commit. Then you
can do:
$ git commit -a ;# commit everything
or
$ git commit oldfile ;# only oldfile, newfile not yet added
to pretend you are working with an index-free system like CVS.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint:
unpack_trees(): protect the handcrafted in-core index from read_cache()
git-p4: Fix one-liner in p4_write_pipe function.
Completion: add missing '=' for 'diff --diff-filter'
Fix 'git help help'
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.
The resulting in-core index is Ok for most use, but read_cache() does not
recognize it as such. The function is meant to be no-op if you already
have loaded the index, until you call discard_cache().
This change the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.
A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion. But there are higher level API that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.
An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick. In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.
The original version of this patch marked the index by pointing an
otherwise wasted malloc'ed memory with o->result.alloc, but this version
uses Linus's idea to use a new "initialized" bit, which is conceptually
much cleaner.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The code handles additionally "refs/remotes/<something>/name",
"remotes/<something>/name", and "refs/<namespace>/name".
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We do not have any more bits in the on-disk index flags word, but we would
need to have more in the future. Use the last remaining bits as a signal
to tell us that the index entry we are looking at is an extended one.
Since we do not understand the extended format yet, we will just error out
when we see it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
index_fd can now work with file descriptors that are not normal files
but any readable file. If the given file descriptor is a regular file
then mmap() is used; for other files, strbuf_read is used.
The path parameter, which has been used as hint for filters, can be
NULL now to indicate that the file should be hashed literally without
any filter.
The index_pipe function is removed as redundant.
Signed-off-by: Dmitry Potapov <dpotapov@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A new configuration variable 'core.trustctime' is introduced to
allow ignoring st_ctime information when checking if paths
in the working tree has changed, because there are situations where
it produces too much false positives. Like when file system crawlers
keep changing it when scanning and using the ctime for marking scanned
files.
The default is to notice ctime changes.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The rewrite of git-mv from a shell script to a builtin was perhaps
a little too straightforward: the git add and git rm queues were
emulated directly, which resulted in a rather complicated code and
caused an inconsistent behaviour when moving dirty index entries;
git mv would update the entry based on working tree state,
except in case of overwrites, where the new entry would still have
sha1 of the old file.
This patch introduces rename_index_entry_at() into the index toolkit,
which will rename an entry while removing any entries the new entry
might render duplicate. This is then used in git mv instead
of all the file queues, resulting in a major simplification
of the code and an inevitable change in git mv -n output format.
Also the code used to refuse renaming overwriting symlink with a regular
file and vice versa; there is no need for that.
A few new tests have been added to the testsuite to reflect this change.
Signed-off-by: Petr Baudis <pasky@suse.cz>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A private function add_files_to_cache() in builtin-add.c was borrowed by
checkout and commit re-implementors without getting properly refactored to
more library-ish place. This does the refactoring.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git update-index --refresh", "git reset" and "git add --refresh" have
reported paths that have local modifications as "needs update" since the
beginning of git.
Although this is logically correct in that you need to update the index at
that path before you can commit that change, it is now becoming more and
more clear, especially with the continuous push for user friendliness
since 1.5.0 series, that the message is suboptimal. After all, the change
may be something the user might want to get rid of, and "updating" would
be absolutely a wrong thing to do if that is the case.
I prepared two alternatives to solve this. Both aim to reword the message
to more neutral "locally modified".
This patch is a more intrusive variant that changes the message for only
Porcelain commands ("add" and "reset") while keeping the plumbing
"update-index" intact.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
5fdeacb (Teach update-index about --ignore-submodules, 2008-05-14) added a
new refresh option flag but did not assign a unique bit for it correctly,
and broke "update-index --ignore-missing".
This fixes it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* mv/merge-in-c:
reduce_heads(): protect from duplicate input
reduce_heads(): thinkofix
Add a new test for git-merge-resolve
t6021: add a new test for git-merge-resolve
Teach merge.log to "git-merge" again
Build in merge
Fix t7601-merge-pull-config.sh on AIX
git-commit-tree: make it usable from other builtins
Add new test case to ensure git-merge prepends the custom merge message
Add new test case to ensure git-merge reduces octopus parents when possible
Introduce reduce_heads()
Introduce get_merge_bases_many()
Add new test to ensure git-merge handles more than 25 refs.
Introduce get_octopus_merge_bases() in commit.c
git-fmt-merge-msg: make it usable from other builtins
Move read_cache_unmerged() to read-cache.c
Add new test to ensure git-merge handles pull.twohead and pull.octopus
Move parse-options's skip_prefix() to git-compat-util.h
Move commit_list_count() to commit.c
Move split_cmdline() to alias.c
Conflicts:
Makefile
parse-options.c
Since commit 8eca0b47ff, it is possible
for read_sha1_file() to return NULL even with existing objects when they
are corrupted. Previously a corrupted object would have terminated the
program immediately, effectively making read_sha1_file() return NULL
only when specified object is not found.
Let's restore this behavior for all users of read_sha1_file() and
provide a separate function with the ability to not terminate when
bad objects are encountered.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* dr/ceiling:
Eliminate an unnecessary chdir("..")
Add support for GIT_CEILING_DIRECTORIES
Fold test-absolute-path into test-path-utils
Implement normalize_absolute_path
Conflicts:
cache.h
setup.c
* j6t/mingw: (38 commits)
compat/pread.c: Add a forward declaration to fix a warning
Windows: Fix ntohl() related warnings about printf formatting
Windows: TMP and TEMP environment variables specify a temporary directory.
Windows: Make 'git help -a' work.
Windows: Work around an oddity when a pipe with no reader is written to.
Windows: Make the pager work.
When installing, be prepared that template_dir may be relative.
Windows: Use a relative default template_dir and ETC_GITCONFIG
Windows: Compute the fallback for exec_path from the program invocation.
Turn builtin_exec_path into a function.
Windows: Use a customized struct stat that also has the st_blocks member.
Windows: Add a custom implementation for utime().
Windows: Add a new lstat and fstat implementation based on Win32 API.
Windows: Implement a custom spawnve().
Windows: Implement wrappers for gethostbyname(), socket(), and connect().
Windows: Work around incompatible sort and find.
Windows: Implement asynchronous functions as threads.
Windows: Disambiguate DOS style paths from SSH URLs.
Windows: A rudimentary poll() emulation.
Windows: Implement start_command().
...
For everything other than using "git config" to read or write a
git-style config file that isn't the current repo's config file,
GIT_CONFIG was actively detrimental. Rather than argue over which
programs are important enough to have work anyway, just fix all of
them at the root.
Also removes GIT_LOCAL_CONFIG, which would only be useful for programs
that do want to use global git-specific config, but not the repo's own
git-specific config, and want to use some other, presumably
git-specific config. Despite being documented, I can't find any sign that
it was ever used.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin-read-tree has a read_cache_unmerged() which is useful for other
builtins, for example builtin-merge uses it as well. Move it to
read-cache.c to avoid code duplication.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
split_cmdline() is currently used for aliases only, but later it can be
useful for other builtins as well. Move it to alias.c for now,
indicating that originally it's for aliases, but we'll have it in libgit
this way.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a patch adds new blank lines at the end, "git apply --whitespace"
warns. This teaches "diff --check" to do the same.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The function name was too bland and not explicit enough as to what it is
checking. Split it into two, and call the one that checks if there is a
whitespace breakage "ws_check()", and call the other one that checks and
emits the line after color coding "ws_check_emit()".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/config-fsync:
Add config option to enable 'fsync()' of object files
Split up default "i18n" and "branch" config parsing into helper routines
Split up default "user" config parsing into helper routine
Split up default "core" config parsing into helper routine
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.
We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:
1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
creates bar, but this function just creates the leading
directories.
2. mkdir_p took a mode argument, but it was completely
ignored.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using find_pack_entry_one() to get object offsets is rather suboptimal
when nth_packed_object_offset() can be used directly.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The shell version used to use "mkdir -p" to create the repo
path, but the C version just calls "mkdir". Let's replicate
the old behavior. We have to create the git and worktree
leading dirs separately; while most of the time, the
worktree dir contains the git dir (as .git), the user can
override this using GIT_WORK_TREE.
We can reuse safe_create_leading_directories, but we need to
make a copy of our const buffer to do so. Since
merge-recursive uses the same pattern, we can factor this
out into a global function. This has two other cleanup
advantages for merge-recursive:
1. mkdir_p wasn't a very good name. "mkdir -p foo/bar" actually
creates bar, but this function just creates the leading
directories.
2. mkdir_p took a mode argument, but it was completely
ignored.
Acked-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We should be able to fall back to loose objects or alternative packs when
a pack becomes corrupted. This is especially true when an object exists
in one pack only as a delta but its base object is corrupted. Currently
there is no way to retrieve the former object even if the later is
available in another pack or loose.
This patch allows for a delta to be resolved (with a performance cost)
using a base object from a source other than the pack where that delta
is located. Same thing for non-delta objects: rather than failing
outright, a search is made in other packs or used loose when the
currently active pack has it but corrupted.
Of course git will become extremely noisy with error messages when that
happens. However, if the operation succeeds nevertheless, a simple
'git repack -a -f -d' will "fix" the corrupted repository given that all
corrupted objects have a good duplicate somewhere in the object store,
possibly manually copied from another source.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
GIT's guts work with a forward slash as a path separators. We do not change
that. Rather we make sure that only "normalized" paths enter the depths
of the machinery.
We have to translate backslashes to forward slashes in the prefix and in
command line arguments. Fortunately, all of them are passed through
functions in setup.c.
A macro has_dos_drive_path() is defined that checks whether a path begins
with a drive letter+colon combination. This predicate is always false on
Unix. Another macro is_dir_sep() abstracts that a backslash is also a
directory separator on Windows.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Once we find the absolute paths for git_dir and work_tree, we can make
git_dir a relative path since we know pwd will be work_tree. This should
save the kernel some time traversing the path to work_tree all the time
if git_dir is inside work_tree.
Daniel's patch didn't apply for me as-is, so I recreated it with some
differences, and here are the numbers from ten runs each.
There is some IO for me - probably due to more-or-less random flushing of
the journal - so the variation is bigger than I'd like, but whatever:
Before:
real 0m8.135s
real 0m7.933s
real 0m8.080s
real 0m7.954s
real 0m7.949s
real 0m8.112s
real 0m7.934s
real 0m8.059s
real 0m7.979s
real 0m8.038s
After:
real 0m7.685s
real 0m7.968s
real 0m7.703s
real 0m7.850s
real 0m7.995s
real 0m7.817s
real 0m7.963s
real 0m7.955s
real 0m7.848s
real 0m7.969s
Now, going by "best of ten" (on the assumption that the longer numbers
are all due to IO), I'm saying a 7.933s -> 7.685s reduction, and it does
seem to be outside of the noise (ie the "after" case never broke 8s, while
the "before" case did so half the time).
So looks like about 3% to me.
Doing it for a slightly smaller test-case (just the "arch" subdirectory)
gets more stable numbers probably due to not filling the journal with
metadata updates, so we have:
Before:
real 0m1.633s
real 0m1.633s
real 0m1.633s
real 0m1.632s
real 0m1.632s
real 0m1.630s
real 0m1.634s
real 0m1.631s
real 0m1.632s
real 0m1.632s
After:
real 0m1.610s
real 0m1.609s
real 0m1.610s
real 0m1.608s
real 0m1.607s
real 0m1.610s
real 0m1.609s
real 0m1.611s
real 0m1.608s
real 0m1.611s
where I'ld just take the averages and say 1.632 vs 1.610, which is just
over 1% peformance improvement.
So it's not in the noise, but it's not as big as I initially thought and
measured.
(That said, it obviously depends on how deep the working directory path is
too, and whether it is behind NFS or something else that might need to
cause more work to look up).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As explained in the documentation[*] this is totally useless on
filesystems that do ordered/journalled data writes, but it can be a
useful safety feature on filesystems like HFS+ that only journal the
metadata, not the actual file contents.
It defaults to off, although we could presumably in theory some day
auto-enable it on a per-filesystem basis.
[*] Yes, I updated the docs for the thing. Hell really _has_ frozen
over, and the four horsemen are probably just beyond the horizon.
EVERYBODY PANIC!
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It was implemented as a thin wrapper around an otherwise unused
helper function parse_pack_index_file(). The code becomes simpler
and easier to read by consolidating the two.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
write_sha1_from_fd() and write_sha1_to_fd() were dead code nobody called,
neither the latter's helper repack_object() was.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Particularly for the "alternates" file, if one will be created, we
want a path that doesn't depend on the current directory, but we want
to retain any symlinks in the path as given and any in the user's view
of the current directory when the path was given.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This means that we can depend on packs always being stable on disk,
simplifying a lot of the object serialization worries. And unlike loose
objects, serializing pack creation IO isn't going to be a performance
killer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/add-n-u:
Make git add -n and git -u -n output consistent
"git-add -n -u" should not add but just report
Conflicts:
builtin-add.c
builtin-mv.c
cache.h
read-cache.c
* db/clone-in-c:
Add test for cloning with "--reference" repo being a subset of source repo
Add a test for another combination of --reference
Test that --reference actually suppresses fetching referenced objects
clone: fall back to copying if hardlinking fails
builtin-clone.c: Need to closedir() in copy_or_link_directory()
builtin-clone: fix initial checkout
Build in clone
Provide API access to init_db()
Add a function to set a non-default work tree
Allow for having for_each_ref() list extra refs
Have a constant extern refspec for "--tags"
Add a library function to add an alternate to the alternates file
Add a lockfile function to append to a file
Mark the list of refs to fetch as const
Conflicts:
cache.h
t/t5700-clone-reference.sh
* js/ignore-submodule:
Ignore dirty submodule states during rebase and stash
Teach update-index about --ignore-submodules
diff options: Introduce --ignore-submodules
* bc/repack:
Documentation/git-repack.txt: document new -A behaviour
let pack-objects do the writing of unreachable objects as loose objects
add a force_object_loose() function
builtin-gc.c: deprecate --prune, it now really has no effect
git-gc: always use -A when manually repacking
repack: modify behavior of -A option to leave unreferenced objects unpacked
Conflicts:
builtin-pack-objects.c
Make git recognize a new environment variable that prevents it from
chdir'ing up into specified directories when looking for a GIT_DIR.
Useful for avoiding slow network directories.
For example, I use git in an environment where homedirs are automounted
and "ls /home/nonexistent" takes about 9 seconds. Setting
GIT_CEILING_DIRS="/home" allows "git help -a" (for bash completion) and
"git symbolic-ref" (for my shell prompt) to run in a reasonable time.
Signed-off-by: David Reiss <dreiss@facebook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
normalize_absolute_path removes several oddities form absolute paths,
giving nice clean paths like "/dir/sub1/sub2". Also add a test case
for this utility, based on a new test program (in the style of test-sha1).
Signed-off-by: David Reiss <dreiss@facebook.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* ar/add-unreadable:
Add a config option to ignore errors for git-add
Add a test for git-add --ignore-errors
Add --ignore-errors to git-add to allow it to skip files with read errors
Extend interface of add_files_to_cache to allow ignore indexing errors
Make the exit code of add_file_to_index actually useful
Like with the diff machinery, update-index should sometimes just
ignore submodules (e.g. to determine a clean state before a rebase).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* sb/committer:
commit: Show committer if automatic
commit: Show author if different from committer
Preparation to call determine_author_info from prepare_to_commit
git_config() only had a function parameter, but no callback data
parameter. This assumes that all callback functions only modify
global variables.
With this patch, every callback gets a void * parameter, and it is hoped
that this will help the libification effort.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is meant to force the creation of a loose object even if it
already exists packed. Needed for the next commit.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/core-optim:
Optimize symlink/directory detection
Avoid some unnecessary lstat() calls
is_racy_timestamp(): do not check timestamp for gitlinks
diff-lib.c: rename check_work_tree_entity()
diff: a submodule not checked out is not modified
Add t7506 to test submodule related functions for git-status
t4027: test diff for submodule with empty directory
Make git-add behave more sensibly in a case-insensitive environment
When adding files to the index, add support for case-independent matches
Make unpack-tree update removed files before any updated files
Make branch merging aware of underlying case-insensitive filsystems
Add 'core.ignorecase' option
Make hash_name_lookup able to do case-independent lookups
Make "index_name_exists()" return the cache_entry it found
Move name hashing functions into a file of its own
Make unpack_trees_options bit flags actual bitfields
Change cd67e4d4 introduced a new configuration parameter that told
pull to automatically perform a rebase instead of a merge. This
change provides a configuration option to enable this feature
automatically when creating a new branch.
If the variable branch.autosetuprebase applies for a branch that's
being created, that branch will have branch.<name>.rebase set to true.
Signed-off-by: Dustin Sallings <dustin@spy.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is the base for making symlink detection in the middle fo a pathname
saner and (much) more efficient.
Under various loads, we want to verify that the full path leading up to a
filename is a real directory tree, and that when we successfully do an
'lstat()' on a filename, we don't get a false positive due to a symlink in
the middle of the path that git should have seen as a symlink, not as a
normal path component.
The 'has_symlink_leading_path()' function already did this, and cached
a single level of symlink information, but didn't cache the _lack_ of a
symlink, so the normal behaviour was actually the wrong way around, and we
ended up doing an 'lstat()' on each path component to check that it was a
real directory.
This caches the last detected full directory and symlink entries, and
speeds up especially deep directory structures a lot by avoiding to
lstat() all the directories leading up to each entry in the index.
[ This can - and should - probably be extended upon so that we eventually
never do a bare 'lstat()' on any path entries at *all* when checking the
index, but always check the full path carefully. Right now we do not
generally check the whole path for all our normal quick index
revalidation.
We should also make sure that we're careful about all the invalidation,
ie when we remove a link and replace it by a directory we should
invalidate the symlink cache if it matches (and vice versa for the
directory cache).
But regardless, the basic function needs to be sane to do that. The old
'has_symlink_leading_path()' was not capable enough - or indeed the code
readable enough - to really do that sanely. So I'm pushing this as not
just an optimization, but as a base for further work. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The commit sequence used to do
if (file_exists(p->path))
add_file_to_cache(p->path, 0);
where both "file_exists()" and "add_file_to_cache()" needed to do a
lstat() on the path to do their work.
This cuts down 'lstat()' calls for the partial commit case by two
for each path we know about (because we do this twice per path).
Just move the lstat() to the caller instead (that's all that
"file_exists()" really does), and pass the stat information down to the
add_to_cache() function.
This essentially makes 'add_to_index()' the core function that adds a path
to the index, getting the index pointer, the pathname and the stat
information as arguments. There are then shorthand helper functions that
use this core function:
- 'add_to_cache()' is just 'add_to_index()' with the default index
- 'add_file_to_cache/index()' is the same, but does the lstat() call
itself, so you can pass just the pathname if you don't already have the
stat information available.
So old users of the 'add_file_to_xyzzy()' are essentially left unchanged,
and this just exposes the more generic helper function that can take
existing stat information into account.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lt/case-insensitive:
Make git-add behave more sensibly in a case-insensitive environment
When adding files to the index, add support for case-independent matches
Make unpack-tree update removed files before any updated files
Make branch merging aware of underlying case-insensitive filsystems
Add 'core.ignorecase' option
Make hash_name_lookup able to do case-independent lookups
Make "index_name_exists()" return the cache_entry it found
Move name hashing functions into a file of its own
Make unpack_trees_options bit flags actual bitfields
To warn the user in case he/she might be using an unintended
committer identity.
Signed-off-by: Santi Béjar <sbejar@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* lh/git-file:
Teach GIT-VERSION-GEN about the .git file
Teach git-submodule.sh about the .git file
Teach resolve_gitlink_ref() about the .git file
Add platform-independent .git "symlink"
The caller first calls set_git_dir() to specify the GIT_DIR, and then
calls init_db() to initialize it. This also cleans up various parts of
the code to account for the fact that everything is done with GIT_DIR
set, so it's unnecessary to pass the specified directory around.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function may only be used before the work tree is used.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is in the core so that, if the alternates file has already been
read, the addition can be parsed and put into effect for the current
process.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This takes care of copying the original contents into the replacement
file after the lock is held, so that concurrent additions can't miss
each other's changes.
[jc: munged to drop mmap in favor of copy_file.]
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
xread() and xwrite() return ssize_t values as their native POSIX
counterparts read(2) and write(2).
To be consistent, read_in_full() and write_in_full() should also return
ssize_t values.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes a struct ref able to represent a symref, and makes http.c
able to recognize one, and makes transport.c look for "HEAD" as a ref
in the list, and makes it dereference symrefs for the resulting ref,
if any.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>