When we are applying a patch that creates a blob at a path, or
when we are switching from a branch that does not have a blob at
the path to another branch that has one, we need to make sure
that there is nothing at the path in the working tree, as such a
file is a local modification made by the user that would be lost
by the operation.
Normally, lstat() on the path and making sure ENOENT is returned
is good enough for that purpose. However there is a twist. We
may be creating a regular file arch/x86_64/boot/Makefile, while
removing an existing symbolic link at arch/x86_64/boot that
points at an existing ../i386/boot directory that has a Makefile in
it. We always first check without touching the filesystem and then
perform the actual operation, so when we verify that the new file,
arch/x86_64/boot/Makefile, does not exist, we haven't removed
the symbolic link arch/x86_64/boot yet. lstat() on
the file sees through the symbolic link and reports the file is
there, which is not what we want.
The has_symlink_leading_path() function takes a path and sees
if any of its leading directory components is a symbolic
link.
When files in a new directory are created, we tend to process
them together because both index and tree are sorted. The
function takes advantage of this and allows the caller to cache
and reuse which symbolic link on the filesystem caused the
function to return true.
The calling sequence would be:
    char last_symlink[PATH_MAX];

    *last_symlink = '\0';
    for each index entry {
        if (!lose)
            continue;
        if (lstat(it))
            if (errno == ENOENT)
                ; /* happy */
            else
                error;
        else if (has_symlink_leading_path(it, last_symlink))
            ; /* happy */
        else
            error; /* would lose local changes */
        unlink_entry(it, last_symlink);
    }
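To illustrate the idea, here is a minimal, hypothetical sketch of
such a function (not the actual implementation; it assumes the usual
<sys/stat.h>, <string.h> and <limits.h> includes). Only the leading
directory components are checked with lstat(), and the offending
prefix is cached in last_symlink so the next call for a path in the
same directory can return early:

    static int has_symlink_leading_path(const char *name, char *last_symlink)
    {
        char path[PATH_MAX];
        const char *sp = name;
        char *dp = path;

        /* did the symlink we found last time lead this path too? */
        if (*last_symlink) {
            size_t n = strlen(last_symlink);
            if (!strncmp(name, last_symlink, n) && name[n] == '/')
                return 1;
        }

        while (1) {
            struct stat st;
            size_t len = strcspn(sp, "/");

            if (!sp[len])
                return 0; /* no more leading directories */
            memcpy(dp, sp, len);
            dp += len;
            *dp = '\0';
            if (lstat(path, &st))
                return 0;
            if (S_ISLNK(st.st_mode)) {
                strcpy(last_symlink, path); /* cache for reuse */
                return 1;
            }
            if (!S_ISDIR(st.st_mode))
                return 0;
            *dp++ = '/';
            sp += len + 1;
        }
    }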
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add config variables pack.compression and core.loosecompression,
and a --compression=level switch to pack-objects.
Loose objects will be compressed using core.loosecompression if set,
else core.compression if set, else Z_BEST_SPEED.
Packed objects will be compressed using --compression=level if seen,
else pack.compression if set, else core.compression if set,
else Z_DEFAULT_COMPRESSION. This is the "pack compression level".
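For example, with a configuration like the following (values chosen
purely for illustration), loose objects would be written with zlib
level 1 and packed objects with level 9; drop the pack.compression
entry and packs would fall back to core.compression, i.e. level 6:

    [core]
        compression = 6
        loosecompression = 1
    [pack]
        compression = 9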
Loose objects added to a pack undeltified will be recompressed
to the pack compression level if it is unequal to the current
loose compression level by the preceding rules, or if the loose
object was written while core.legacyheaders = true. Newly
deltified loose objects are always compressed to the current
pack compression level.
Previously packed objects added to a pack are recompressed
to the current pack compression level exactly when their
deltification status changes, since the previous pack data
cannot be reused.
In either case, the --no-reuse-object switch from the first
patch below will always force recompression to the current pack
compression level, instead of assuming the pack compression level
hasn't changed and pack data can be reused when possible.
This applies on top of the following patches from Nicolas Pitre:
[PATCH] allow for undeltified objects not to be reused
[PATCH] make "repack -f" imply "pack-objects --no-reuse-object"
Signed-off-by: Dana L. How <danahow@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now that we encourage and actively preserve objects in a packed form
more aggressively than we did at the time the new loose object format and
core.legacyheaders were introduced, that extra loose object format
doesn't appear to be worth it anymore.
Because the packing of loose objects has to go through the delta match
loop anyway, and since most of them should end up being deltified in
most cases, there is really little advantage to have this parallel loose
object format as the CPU savings it might provide is rather lost in the
noise in the end.
This patch gets rid of core.legacyheaders, preserves the legacy format as
the only writable loose object format and deprecates the other one to
keep things simpler.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds a --date={local,relative,default} option to the log family of
commands, to allow displaying timestamps in the user's local timezone,
as relative time, or in the default format.
The existing --relative-date option is a synonym of --date=relative; we
could probably deprecate it in the long run.
Signed-off-by: Junio C Hamano <junkio@cox.net>
get_sha1_with_mode basically behaves as get_sha1. It has an additional
parameter for storing the mode of the object.
If the mode can not be determined, it stores S_IFINVALID.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
S_IFINVALID is used to signal that no mode information is available.
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This makes all low-level functions defined in read-cache.c
take an explicit index_state structure as their first parameter,
to specify which index to work on. These functions
traditionally operated on "the_index" and were named foo_cache();
the counterparts this patch introduces are called foo_index().
The traditional foo_cache() functions are made into macros that
give "the_index" to their corresponding foo_index() functions.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This defines an index_state structure and moves index-related
global variables into it. Currently there is one instance of
it, the_index, and everybody accesses it, so there is no code
change.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* 'jc/attr': (28 commits)
lockfile: record the primary process.
convert.c: restructure the attribute checking part.
Fix bogus linked-list management for user defined merge drivers.
Simplify calling of CR/LF conversion routines
Document gitattributes(5)
Update 'crlf' attribute semantics.
Documentation: support manual section (5) - file formats.
Simplify code to find recursive merge driver.
Counto-fix in merge-recursive
Fix funny types used in attribute value representation
Allow low-level driver to specify different behaviour during internal merge.
Custom low-level merge driver: change the configuration scheme.
Allow the default low-level merge driver to be configured.
Custom low-level merge driver support.
Add a demonstration/test of customized merge.
Allow specifying specialized merge-backend per path.
merge-recursive: separate out xdl_merge() interface.
Allow more than true/false to attributes.
Document git-check-attr
Change attribute negation marker from '!' to '-'.
...
* lt/gitlink:
Tests for core subproject support
Expose subprojects as special files to "git diff" machinery
Fix some "git ls-files -o" fallout from gitlinks
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic to not follow gitlinks
Fix gitlink index entry filesystem matching
Teach "git-read-tree -u" to check out submodules as a directory
Teach git list-objects logic not to follow gitlinks
Don't show gitlink directories when we want "other" files
Teach git-update-index about gitlinks
Teach directory traversal about subprojects
Fix thinko in subproject entry sorting
Teach core object handling functions about gitlinks
Teach "fsck" not to follow subproject links
Add "S_IFDIRLNK" file mode infrastructure for git links
Add 'resolve_gitlink_ref()' helper function
Avoid overflowing name buffer in deep directory structures
diff-lib: use ce_mode_from_stat() rather than messing with modes manually
* np/pack: (27 commits)
document --index-version for index-pack and pack-objects
pack-objects: remove obsolete comments
pack-objects: better check_object() performances
add get_size_from_delta()
pack-objects: make in_pack_header_size a variable of its own
pack-objects: get rid of create_final_object_list()
pack-objects: get rid of reuse_cached_pack
pack-objects: clean up list sorting
pack-objects: rework check_delta_limit usage
pack-objects: equal objects in size should delta against newer objects
pack-objects: optimize preferred base handling a bit
clean up add_object_entry()
tests for various pack index features
use test-genrandom in tests instead of /dev/urandom
simple random data generator for tests
validate reused pack data with CRC when possible
allow forcing index v2 and 64-bit offset treshold
pack-redundant.c: learn about index v2
show-index.c: learn about index v2
sha1_file.c: learn about index version 2
...
The usual process flow is the main process opens and holds the lock to
the index, does its thing, perhaps spawning children during the course,
and then writes the resulting index out by releasing the lock.
However, the lockfile interface uses atexit(3) to clean it up, without
regard to who actually created the lock. This typically leads to a
confusing behaviour of the lock being released too early when the child
exits, and then the parent process finding, when it calls
commit_lockfile(), that it cannot unlock it.
This fixes the problem by recording who created and holds the lock;
in the atexit(3) handler, a child simply ignores the lockfile the
parent created.
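A minimal sketch of the idea (field and function names here are
illustrative, not necessarily what the patch uses): the lock records
the pid of the process that created it, and the atexit(3) handler only
removes lockfiles whose recorded owner matches getpid():

    struct lock_file {
        struct lock_file *next;
        pid_t owner;              /* who created and holds this lock */
        char filename[PATH_MAX];
    };
    static struct lock_file *lock_file_list;

    static void remove_lock_file(void) /* registered with atexit(3) */
    {
        pid_t me = getpid();
        struct lock_file *lk;

        for (lk = lock_file_list; lk; lk = lk->next)
            if (lk->owner == me && lk->filename[0])
                unlink(lk->filename);
    }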
Signed-off-by: Junio C Hamano <junkio@cox.net>
The delete_ref function does not change the 'sha1' parameter. A non-const
pointer causes a compiler warning if you call the function with a const
argument.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to use a different allocator scheme for when we didn't know the
object type. That meant that objects that were created without any
up-front knowledge of the type would not go through the same allocation
paths as normal object allocations, and would miss out on the statistics.
But perhaps more importantly than the statistics (that are useful when
looking at memory usage but not much else), if we want to make the
object hash tables use a denser object pointer representation, we need
to make sure that they all go through the same blocking allocator.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
... which consists of existing code split out of packed_delta_info()
for other callers to use it as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:
* diff crlf
*.ps !diff
proprietary.o !diff !crlf
That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:
* diff crlf merge-3way
*.ps !diff
proprietary.o !diff !crlf !merge-3way
The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.
All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text"). Only a few cases, such as
binary blob, need exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".
So this allows the $GIT_DIR/info/attributes file and the .gitattributes
file at the toplevel of the project to also specify attributes that
assign other attributes. The syntax is '[attr]' followed by an
attribute name, followed by a list of attribute names:
[attr] binary !diff !crlf !merge-3way
When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.
It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:
(built-in attribute configuration)
[attr] binary !diff !crlf !merge-3way
* diff crlf merge-3way
(project specific .gitattributes)
proprietary.o binary
(user preference $GIT_DIR/info/attributes)
*.ps !diff
There are a few caveats.
* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.
* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:
[attr] text diff crlf
[attr] ps text !diff
*.ps ps
while this would most likely not (I haven't tried):
[attr] ps text !diff
[attr] text diff crlf
*.ps ps
* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:
[attr] text diff crlf
[attr] ps text !diff
*.txt !ps
path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.
An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.
Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory take precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.
A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!')". For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.
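For example, a .gitattributes file using this syntax might look like
the following (the patterns and attribute names are made up purely for
illustration):

    # basename match: any Makefile, anywhere in the tree
    Makefile    crlf
    # pathname match (FNM_PATHNAME), because of the slash
    t/*.sh      !crlf
    *.jpg       !diff !crlf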
This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
GIT 1.5.1.1
cvsserver: Fix handling of diappeared files on update
fsck: do not complain on detached HEAD.
(encode_85, decode_85): Mark source buffer pointer as "const".
This just adds the basic helper functions to recognize and work with git
tree entries that are links to other git repositories ("subprojects").
They still aren't actually connected up to any of the code-paths, but
now all the infrastructure is in place.
The next commit will start adding actual subproject support.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The coming index format change doesn't allow for the number of objects
to be determined from the size of the index file directly. Instead, let's
initialize a field in the packed_git structure with the object count when
the index is validated since the count is always known at that point.
While at it let's reorder some struct packed_git fields to avoid padding
due to needed 64-bit alignment for some of them.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This merge strategy largely piggy-backs on git-merge-recursive.
When merging trees A and B, if B corresponds to a subtree of A,
B is first adjusted to match the tree structure of A, instead of
reading the trees at the same level. This adjustment is also
done to the common ancestor tree.
If you are pulling updates from git-gui repository into git.git
repository, the root level of the former corresponds to git-gui/
subdirectory of the latter. The tree object of git-gui's toplevel
is wrapped in a fake tree object, whose sole entry has name 'git-gui'
and records object name of the true tree, before being used by
the 3-way merge code.
If you are merging the other way, only the git-gui/ subtree of
git.git is extracted and merged into git-gui's toplevel.
The detection of corresponding subtree is done by comparing the
pathnames and types in the toplevel of the tree.
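Assuming the strategy ends up being selected with "-s subtree", like
other merge strategies, pulling git-gui into git.git would then look
something like this (the ../git-gui.git path is purely illustrative):

    git pull -s subtree ../git-gui.git master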
Heuristics galore! That's the git way ;-).
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/index-output:
git-read-tree --index-output=<file>
_GIT_INDEX_OUTPUT: allow plumbing to output to an alternative index file.
Conflicts:
builtin-apply.c
This function was not called "add_file_to_cache()" only because
an ancient program, update-cache, used that name as an internal
function name that does something slightly different. Now that
is gone, we can take over the better name.
The plan is to name all functions that operate on the default
index xxx_cache(). Later patches create a variant of them that
take an explicit parameter xxx_index(), and then turn
xxx_cache() functions into macros that use "the_index".
Signed-off-by: Junio C Hamano <junkio@cox.net>
This error message should not usually trigger, but the function
make_cache_entry() called by add_cacheinfo() can return early
without calling into refresh_cache_entry() that sets cache_errno.
Also the error message had a wrong function name reported, and
it did not say anything about which path failed either.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Let's avoid the open coded pack index reference in pack-object and use
nth_packed_object_sha1() instead. This will help encapsulate index
format differences in one place.
And while at it, there is no reason to copy SHA1s over and over when a
direct pointer into the index will do just fine.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Acked-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This corrects the interface mistake of the previous one, and
gives a command line parameter to the only plumbing command that
currently needs it: "git-read-tree".
We can add the calls to set_alternate_index_output() to other
plumbing commands that update the index if/when needed.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When defined, this allows plumbing commands that update the
index (add, apply, checkout-index, merge-recursive, mv,
read-tree, rm, update-index, and write-tree) to write their
resulting index to an alternative index file while holding a
lock to the original index file. With this, git-commit that
jumps the index does not have to make an extra copy of the index
file, and more importantly, it can do the update while holding
the lock on the index.
However, I think the interface to let an environment variable
specify the output is a mistake, as shown in the documentation.
If a curious user has the environment variable set to something
other than the file GIT_INDEX_FILE points at, almost everything
will break. This should instead be a command line parameter to
tell these plumbing commands to write the result in the named
file, to prevent stupid mistakes.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Use hash_sha1_file() instead of duplicating code to compute object SHA1.
While at it make it accept a const pointer.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new configuration variable core.deltaBaseCacheLimit allows the
user to control how much memory they are willing to give to Git for
caching base objects of deltas. This is not normally meant to be
a user tweakable knob; the "out of the box" settings are meant to
be suitable for almost all workloads.
We default to 16 MiB under the assumption that the cache is not
meant to consume all of the user's available memory, and that the
cache's main purpose is to cache trees, for faster path limiters
during revision traversal. Since trees tend to be relatively small
objects, this relatively small limit should still allow a large
number of objects.
On the other hand we don't want the cache to start storing 200
different versions of a 200 MiB blob, as this could easily blow
the entire address space of a 32 bit process.
We evict OBJ_BLOB from the cache first (credit goes to Junio) as
we want to favor OBJ_TREE within the cache. These are the objects
that have the highest inflate() startup penalty, as they tend to
be small and thus don't have that much of a chance to amortize
that penalty over the entire data.
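Should a particular workload benefit from a larger (or smaller) cache,
the knob can still be turned in the configuration, e.g. (the value is
arbitrary, and assumes the usual k/m/g suffixes are accepted here):

    [core]
        deltaBaseCacheLimit = 64m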
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/run-command:
Use run_command within send-pack
Use run_command within receive-pack to invoke index-pack
Use run_command within merge-index
Use run_command for proxy connections
Use RUN_GIT_CMD to run push backends
Correct new compiler warnings in builtin-revert
Replace fork_with_pipe in bundle with run_command
Teach run-command to redirect stdout to /dev/null
Teach run-command about stdout redirection
Especially with the new index format to come, it is more appropriate
to encapsulate more into check_packed_git_idx() and assume less of the
index format in struct packed_git.
To that effect, the index_base is renamed to index_data with void * type
so it is not used directly but other pointers initialized with it. This
allows for a couple of pointer cast removals, as well as providing a better
generic name to grep for when adding support for new index versions or
formats.
And index_data is declared const too while at it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new builtin-revert code introduces a few new compiler errors
when I'm building with my stricter set of checks enabled in CFLAGS.
These all just stem from trying to store a constant string into
a non-const char*. Simple fix, make the variables const char*.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When accessing objects, we first look for them in packs that
are linked together in the reverse order of discovery.
Since younger packs tend to contain more recent objects, which
are more likely to be accessed often, and local packs tend to
contain objects more relevant to our specific projects, sort the
list of packs before starting to access them. In addition,
favoring local packs over the ones borrowed from alternates can
be a win when alternates are mounted on network file systems.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In order to track and build on top of a branch 'topic' you track from
your upstream repository, you often would end up doing this sequence:
git checkout -b mytopic origin/topic
git config --add branch.mytopic.remote origin
git config --add branch.mytopic.merge refs/heads/topic
This would first fork your own 'mytopic' branch from the 'topic'
branch you track from the 'origin' repository; then it would set up two
configuration variables so that 'git pull' without parameters does the
right thing while you are on your own 'mytopic' branch.
This commit adds a --track option to git-branch, so that "git
branch --track mytopic origin/topic" performs the latter two actions
when creating your 'mytopic' branch.
If the configuration variable branch.autosetupmerge is set to true, you
do not have to pass the --track option explicitly; further patches in
this series allow setting the variable with a "git remote add" option.
The configuration variable is off by default, and there is a --no-track
option to countermand it even if the variable is set.
Signed-off-by: Paolo Bonzini <bonzini@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Not all platforms have declared 'unsigned long' to be a 64 bit value,
but we want to support a 64 bit packfile (or close enough anyway)
in the near future as some projects are getting large enough that
their packed size exceeds 4 GiB.
By using off_t, the POSIX type that is declared to mean an offset
within a file, we support whatever maximum file size the underlying
operating system will handle. For most modern systems this is up
around 2^60 or higher.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
As we permit up to 2^32-1 objects in a single packfile we cannot
use a signed int to represent the object offset within a packfile,
after 2^31-1 objects we will start seeing negative indexes and
error out or compute bad addresses within the mmap'd index.
This is a minor cleanup that does not introduce any significant
logic changes. It is roach free.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We did not detect broken loose object files, either when the
underlying inflate() signalled the breakage, or when inflate()
finished and we had trailing garbage at the end. We do better
now.
We also make unpack_sha1_file() a static function to
sha1_file.c, since it is not used by anybody outside.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Some file systems that can host git repositories and their working copies
do not support symbolic links. But then if the repository contains a symbolic
link, it is impossible to check out the working copy.
This patch enables partial support of symbolic links so that it is possible
to check out a working copy on such a file system. A new flag
core.symlinks (which is true by default) can be set to false to indicate
that the filesystem does not support symbolic links. In this case, symbolic
links that exist in the trees are checked out as small plain files, and
checking in modifications of these files preserves the symlink property in
the database (as long as an entry exists in the index).
Of course, this does not magically make symbolic links work on such defective
file systems; hence, this solution does not help if the working copy
relies on an entry being a real symbolic link.
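On such a filesystem the flag would be set once per repository, along
these lines:

    [core]
        symlinks = false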
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows for keeping the common idiom which consists of using
negative values to signal error conditions by ensuring that the enum
will be a signed type.
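A sketch of the pattern (the enumerators mirror git's object types,
but their exact set is beside the point): including an explicitly
negative member forces the compiler to pick a signed underlying type,
so callers can keep returning -1 on errors:

    enum object_type {
        OBJ_BAD = -1,  /* forces a signed underlying type */
        OBJ_NONE = 0,
        OBJ_COMMIT,
        OBJ_TREE,
        OBJ_BLOB,
        OBJ_TAG,
    };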
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Now, show_date() can print three different kinds of dates: normal,
relative and short (%Y-%m-%d) dates.
To achieve this, the "int relative" was changed to "enum date_mode
mode", which has three states: DATE_NORMAL, DATE_RELATIVE and
DATE_SHORT.
Since existing users of show_date() only call it with relative_date
being either 0 or 1, and DATE_NORMAL and DATE_RELATIVE having these
values, no behaviour is changed.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We currently have two parallel notations for dealing with object types
in the code: a string and a numerical value. One of them is obviously
redundant, and the most used one requires more stack space and a bunch
of strcmp() all over the place.
This is an initial step for the removal of the version using a char array
found in object reading code paths. The patch is unfortunately large but
there is no sane way to split it in smaller parts without breaking the
system.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lt/crlf:
Teach core.autocrlf to 'git apply'
t0020: add test for auto-crlf
Make AutoCRLF ternary variable.
Lazy man's auto-CRLF
* jc/apply-config:
t4119: test autocomputing -p<n> for traditional diff input.
git-apply: guess correct -p<n> value for non-git patches.
git-apply: notice "diff --git" patch again
Fix botched "leak fix"
t4119: add test for traditional patch and different p_value
apply: fix memory leak in prefix_one()
git-apply: require -p<n> when working in a subdirectory.
git-apply: do not lose cwd when run from a subdirectory.
Teach 'git apply' to look at $HOME/.gitconfig even outside of a repository
Teach 'git apply' to look at $GIT_DIR/config
When we do not trust the executable bit from lstat(2), we copied the
existing ce_mode bits without checking if the filesystem object
is a regular file (which is the only thing the "trust
executable bit" business applies to) nor if the blob in the index is a
regular file (otherwise, we should do the same as registering a
new regular file, which is to default to non-executable).
Noticed by Johannes Sixt.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It currently does NOT know about file attributes, so it does its
conversion purely based on content. Maybe that is more in the "git
philosophy" anyway, since content is king, but I think we should try to do
the file attributes to turn it off on demand.
Anyway, BY DEFAULT it is off regardless, because it requires a
[core]
AutoCRLF = true
in your config file to be enabled. We could make that the default for
Windows, of course, the same way we do some other things (filemode etc).
But you can actually enable it on UNIX, and it will cause:
- "git update-index" will write blobs without CRLF
- "git diff" will diff working tree files without CRLF
- "git checkout" will write files to the working tree _with_ CRLF
and things work fine.
Funnily, it actually shows an odd file in git itself:
git clone -n git test-crlf
cd test-crlf
git config core.autocrlf true
git checkout
git diff
shows a diff for "Documentation/docbook-xsl.css". Why? Because we have
actually checked in that file *with* CRLF! So when "core.autocrlf" is
true, we'll always generate a *different* hash for it in the index,
because the index hash will be for the content _without_ CRLF.
Is this complete? I dunno. It seems to work for me. It doesn't use the
filename at all right now, and that's probably a deficiency (we could
certainly make the "is_binary()" heuristics also take standard filename
heuristics into account).
I don't pass in the filename at all for the "index_fd()" case
(git-update-index), so that would need to be passed around, but this
actually works fine.
NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours
truly. I will not guarantee that they work at all reasonable. Caveat
emptor. But it _is_ simple, and it _is_ safe, since it's all off by
default.
The patch is pretty simple - the biggest part is the new "convert.c" file,
but even that is really just basic stuff that anybody can write in
"Teaching C 101" as a final project for their first class in programming.
Not to say that it's bug-free, of course - but at least we're not talking
about rocket surgery here.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since "git log origin/master" uses dwim_log() to match
"refs/remotes/origin/master", it makes sense to do that for
"git log --reflog", too.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The new interface allows an application to temporarily hash a
small number of objects and pretend that they are available in
the object store without actually writing them.
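Presumably the entry point looks something like the following
(signature shown only for illustration; see the patch for the exact
prototype and semantics):

    int pretend_sha1_file(void *buf, unsigned long len,
                          enum object_type type, unsigned char *sha1);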
Signed-off-by: Junio C Hamano <junkio@cox.net>
Back when the handful of commands that created commits and tags were
the only users of committer identity information, it made sense
to explicitly call setup_ident() to pre-fill the default value
from the gecos information. But it is much simpler for programs
to make the call automatic when get_ident() is called these days,
since many more programs want to use the information when updating
the reflog.
Signed-off-by: Junio C Hamano <junkio@cox.net>
The code that uses committer_info() in reflog can barf and die
whenever it is asked to update a ref. And I do not think
calling ignore_missing_committer_name() upfront in the application,
like the recent receive-pack did, is a reasonable workaround.
What the patch does:
- git_committer_info() takes one parameter. It used to be "if
this is true, then die() if the name is not available due to
bad GECOS, otherwise issue a warning once but leave the name
empty". The reason was because we wanted to prevent bad
commits from being made by git-commit-tree (and its
callers). The value 0 is only used by "git var -l".
Now it takes -1, 0 or 1. When set to -1, it does not
complain but uses the pw->pw_name when name is not
available. Existing 0 and 1 values mean the same thing as
they used to mean before. 0 means issue warnings and leave
it empty, 1 means barf and die.
- ignore_missing_committer_name() and its existing caller
(receive-pack, to set the reflog) have been removed.
- git-format-patch, to come up with the phoney message ID when
asked to thread, now passes -1 to git_committer_info(). This
codepath uses only the e-mail part, ignoring the name. It
used to barf and die. The other call in the same program
when asked to add signed-off-by line based on committer
identity still passes 1 to make sure it barfs instead of
adding a bogus s-o-b line.
- log_ref_write in refs.c, to come up with the name to record
who initiated the ref update in the reflog, passes -1. It
used to barf and die.
The last change means that git-update-ref, git-branch, and
commit walker backends can now be used in a repository with
reflog by somebody who does not have the user identity required
to make a commit. They all used to barf and die.
I've run tests and all of them seem to pass, and also tried "git
clone" as a user whose GECOS is empty -- git clone works again
now (it was broken when reflog was enabled by default).
But this definitely needs extra sets of eyeballs.
Signed-off-by: Junio C Hamano <junkio@cox.net>
For example, it makes no sense to check the presence of a file
named "HEAD" when calling "git log HEAD" in a bare repository.
Noticed by Han-Wen Nienhuys.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Originally I introduced read_or_die for the purpose of reading
the pack header and trailer, and I was too lazy to print proper
error messages.
Linus Torvalds <torvalds@osdl.org>:
> For a read error, at the very least you have to say WHICH FILE
> couldn't be read, because it's usually a matter of some file just
> being too short, not some system-wide problem.
and of course Linus is right. Make it so.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/bare:
Disallow working directory commands in a bare repository.
git-fetch: allow updating the current branch in a bare repository.
Introduce is_bare_repository() and core.bare configuration variable
Move initialization of log_all_ref_updates
* jc/detached-head:
git-checkout: handle local changes sanely when detaching HEAD
git-checkout: safety check for detached HEAD checks existing refs
git-checkout: fix branch name output from the command
git-checkout: safety when coming back from the detached HEAD state.
git-checkout: rewording comments regarding detached HEAD.
git-checkout: do not warn detaching HEAD when it is already detached.
Detached HEAD (experimental)
git-branch: show detached HEAD
git-status: show detached HEAD
We have a number of badly checked read() calls. Often we are
expecting read() to read exactly the size we requested or fail; this
fails to handle interrupts or short reads. Add a read_in_full()
providing those semantics. Otherwise we at a minimum need to check
for EINTR and EAGAIN; where that is appropriate, use xread().
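A sketch of what such a helper might look like (xread() stands for
git's wrapper that already retries on EINTR/EAGAIN, as mentioned
above):

    ssize_t read_in_full(int fd, void *buf, size_t count)
    {
        char *p = buf;
        ssize_t total = 0;

        while (count > 0) {
            ssize_t loaded = xread(fd, p, count);
            if (loaded <= 0)
                return total ? total : loaded; /* error or EOF */
            count -= loaded;
            p += loaded;
            total += loaded;
        }
        return total;
    }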
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We recently introduced a write_in_full() which would either write
the specified object or emit an error message and fail. In order
to fix the read side we now want to introduce a read_in_full()
but without an error emit. This patch cleans up the naming
of this family of calls:
1) convert the existing write_or_whine() to write_or_whine_pipe()
to better indicate its pipe specific nature,
2) convert the existing write_in_full() calls to write_or_whine()
to better indicate its nature,
3) introduce a write_in_full() providing a write or fail semantic,
and
4) convert write_or_whine() and write_or_whine_pipe() to use
write_in_full().
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows "git checkout v1.4.3" to dissociate the HEAD of
repository from any branch. After this point, "git branch"
starts reporting that you are not on any branch. You can go
back to an existing branch by saying "git checkout master", for
example.
This is still experimental. While I think it makes sense to
allow commits on top of detached HEAD, it is rather dangerous
unless you are careful in the current form. Next "git checkout
master" will obviously lose what you have done, so we might want
to require "git checkout -f" out of a detached HEAD if we find
that the HEAD commit is not an ancestor of any other branches.
There is no such safety valve implemented right now.
On the other hand, the reason the user did not start the ad-hoc
work on a new branch with "git checkout -b" was probably because
the work was of a throw-away nature, so the convenience of not
having that safety valve might be even better. The user, after
accumulating some commits on top of a detached HEAD, can always
create a new branch with "git checkout -b" not to lose useful
work done while the HEAD was detached.
We'll see.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This removes the old is_bare_git_dir(const char *) to ask if a
directory, if it is a GIT_DIR, is a bare repository, and
replaces it with is_bare_repository(void). The function looks
at core.bare configuration variable if exists but uses the old
heuristics: if it is ".git" or ends with "/.git", then it does
not look like a bare repository, otherwise it does.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/mmap: (27 commits)
Spell default packedgitlimit slightly differently
Increase packedGit{Limit,WindowSize} on 64 bit systems.
Update packedGit config option documentation.
mmap: set FD_CLOEXEC for file descriptors we keep open for mmap()
pack-objects: fix use of use_pack().
Fix random segfaults in pack-objects.
Cleanup read_cache_from error handling.
Replace mmap with xmmap, better handling MAP_FAILED.
Release pack windows before reporting out of memory.
Default core.packedGitWindowSize to 1 MiB if NO_MMAP.
Test suite for sliding window mmap implementation.
Create pack_report() as a debugging aid.
Support unmapping windows on 'temporary' packfiles.
Improve error message when packfile mmap fails.
Ensure core.packedGitWindowSize cannot be less than 2 pages.
Load core configuration in git-verify-pack.
Fully activate the sliding window pack access.
Unmap individual windows rather than entire files.
Document why header parsing won't exceed a window.
Loop over pack_windows when inflating/accessing data.
...
Conflicts:
cache.h
pack-check.c
It was stupid to link the same element twice to lock_file_list
and end up in a loop, so we certainly need a fix.
But it is not like we are taking a lock on multiple files in
this case. It is just that we leave the linked element on the
list even after commit_lock_file() successfully removes the
cruft.
We cannot remove the list element in commit_lock_file(); if we
are interrupted in the middle of list manipulation, the call to
remove_lock_file_on_signal() will happen with a broken list
structure pointed by lock_file_list, which would cause the cruft
to remain, so not removing the list element is the right thing
to do. Instead we should be reusing the element already on the
list.
There is already a code for that in lock_file() function in
lockfile.c. The code checks lk->next and the element is linked
only when it is not already on the list -- which is incorrect
for the last element on the list (which has NULL in its next
field), but if you read the check as "is this element already on
the list?" it actually makes sense. We do not want to link it
on the list again, nor would we want to set up signal/atexit
over and over.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When passing the revisions list to pack-objects we do not check for
errors nor short writes. Introduce a new write_in_full which will
handle short writes and report errors to the caller. Use this to
short cut the send on failure, allowing us to wait for and report
the child in case the failure is its fault.
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Much like the alloc_report() function can be useful to report on
object allocation statistics while debugging the new pack_report()
function can be useful to report on the behavior of the mmap window
code used for packfile access.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This finally turns on the sliding window behavior for packfile data
access by mapping limited size windows and chaining them under the
packed_git->windows list.
We consider a given byte offset to be within the window only if there
would be at least 20 bytes (one hash worth of data) accessible after
the requested offset. This range selection relates to the contract
that use_pack() makes with its callers, allowing them to access
one hash or one object header without needing to call use_pack()
for every byte of data obtained.
In the worst case scenario we will map the same page of data twice
into memory: once at the end of one window and once again at the
start of the next window. This duplicate page mapping will happen
only when an object header or a delta base reference is spanned
over the end of a window and is always limited to just one page of
duplication, as no sane operating system will ever have a page size
smaller than a hash.
I am assuming that the possible wasted page of virtual address
space is going to perform faster than the alternatives, which
would be to copy the object header or ref delta into a temporary
buffer prior to parsing, or to check the window range on every byte
during header parsing. We may decide to revisit this decision in
the future since this is just a gut instinct decision and has not
actually been proven out by experimental testing.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Part of the implementation concept of the sliding mmap window for
pack access is to permit multiple windows per pack to be mapped
independently. Since the inuse_cnt is associated with the mmap and
not with the file, this value is in struct pack_window and needs to
be incremented/decremented for each pack_window accessed by any code.
To facilitate that implementation we need to replace all uses of
use_packed_git() and unuse_packed_git() with a different API that
follows struct pack_window objects rather than struct packed_git.
The way this works is when we need to start accessing a pack for
the first time we should setup a new window 'cursor' by declaring
a local and setting it to NULL:
struct pack_window *w_curs = NULL;
To obtain the memory region which contains a specific section of
the pack file we invoke use_pack(), supplying the address of our
current window cursor:
unsigned int len;
unsigned char *addr = use_pack(p, &w_curs, offset, &len);
the returned address `addr` will be the first byte at `offset`
within the pack file. The optional variable len will also be
updated with the number of bytes remaining following the address.
Multiple calls to use_pack() with the same window cursor will
update the window cursor, moving it from one window to another
when necessary. In this way each window cursor variable maintains
only one struct pack_window inuse at a time.
Finally before exiting the scope which originally declared the window
cursor we must invoke unuse_pack() to unuse the current window (which
may be different from the one that was first obtained from use_pack):
unuse_pack(&w_curs);
This implementation is still not complete with regards to multiple
windows, as only one window per pack file is supported right now.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
To efficiently support mmaping of multiple regions of the same pack
file we want to keep the pack's file descriptor open while we are
actively working with that pack. So we are now keeping that file
descriptor in packed_git.pack_fd and closing it only after we unmap
the last window.
This is going to increase the number of file descriptors that are
in use at once, however that will be bounded by the total number of
pack files present and therefore should not be very high. It is
a small tradeoff which we may need to revisit after some testing
can be done on various repositories and systems.
For code clarity we also want to separate out the implementation
of how we open a pack file from the implementation which locates
a suitable window (or makes a new one) from the given pack file.
Since this is a rather large delta I'm taking advantage of doing
it now, in a fairly isolated change.
When we open a pack file we need to examine the header and trailer
without having a mmap in place, as we may only need to mmap
the middle section of this particular pack. Consequently the
verification code has been refactored to make use of the new
read_or_die function.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like write_or_die, read_or_die reads the entire length requested
or it kills the current process with a die call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since the index_size and pack_size members of struct packed_git
are the lengths of those corresponding files, we should use the
operating system's off_t type to store these file lengths,
rather than an unsigned long. This would help in the future should
we ever resurrect Junio's 64 bit index implementation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The idea behind the sliding mmap window pack reader implementation
is to have multiple mmap regions active against the same pack file,
thereby allowing the process to mmap in only the active/hot sections
of the pack and reduce overall virtual address space usage.
To implement this we need to refactor the mmap related data
(pack_base, pack_use_cnt) out of struct packed_git and move them
into a new struct pack_window.
We are refactoring the code to support a single struct pack_window
per packfile, thereby emulating the prior behavior of mmap'ing the
entire pack file.
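A rough sketch of the new structure (field names are illustrative):

    struct pack_window {
        struct pack_window *next;  /* other windows of the same pack */
        unsigned char *base;       /* the mmap'd region */
        off_t offset;              /* where in the pack file it starts */
        size_t len;                /* length of the mapping */
        unsigned int last_used;    /* for choosing a window to unmap */
        unsigned int inuse_cnt;    /* how many cursors point into it */
    };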
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Rather than hardcoding the maximum number of bytes which can be
mmapped from pack files we should make this value configurable,
allowing the end user to increase or decrease this limit on a
per-repository basis depending on the size of the repository
and the capabilities of their operating system.
In general users should not need to manually tune such a low-level
setting within the core code, but being able to artificially limit
the number of bytes which we can mmap at once from pack files will
make it easier to craft test cases for the new mmap sliding window
implementation.
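For instance, a test setup could artificially constrain the mapping
like this (option names follow the documentation in this series; the
values are arbitrary):

    [core]
        packedGitWindowSize = 512k
        packedGitLimit = 4m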
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The unpack_entry_gently function currently has only two callers:
the delta base resolution in sha1_file.c and the main loop of
pack-check.c. Both of these must change to using unpack_entry
directly when we implement sliding window mmap logic, so I'm doing
it earlier to help break down the change set.
This may cause a slight performance decrease for delta base
resolution as well as for pack-check.c's verify_packfile(), as
the pack use counter will be incremented and decremented for every
object that is unpacked.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends format the commit log message.
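For example, a user in the situation described above might have
(encodings purely illustrative):

    [i18n]
        commitencoding = utf-8
        logoutputencoding = ISO-8859-1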
Signed-off-by: Junio C Hamano <junkio@cox.net>
Over time, we broke the discipline Linus set up in the early days of
git to let the compiler help us avoid typos in environment variable
names. This defines a handful of preprocessor constants for
environment variable names used in relatively core parts of the
system.
I've left out variable names specific to subsystems such as HTTP
and SSL as I do not think they are big problems.
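The constants are simple #defines along these lines (the exact macro
names are whatever the patch uses; these are representative):

    #define GIT_DIR_ENVIRONMENT "GIT_DIR"
    #define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
    #define INDEX_ENVIRONMENT "GIT_INDEX_FILE"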
Signed-off-by: Junio C Hamano <junkio@cox.net>
If GIT_COMMITTER_NAME is not available in receive-pack but reflogs
are enabled we would normally die out with an error message asking
the user to correct their environment settings.
Now that reflogs are enabled by default in (what we guessed to be)
non-bare Git repositories this may cause problems for some users
who don't have their full name in the gecos field and who don't
have access to the remote system to correct the problem.
So rather than die()'ing out in receive-pack when we try to log a
ref change and have no committer name we default to the username,
as obtained from the host's password database.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Given a config like this:
# A config
[very.interesting.section]
not
The command
$ git repo-config --rename-section very.interesting.section bla.1
will lead to this config:
# A config
[bla "1"]
not
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
New and experienced Git users alike are finding out too late that
they forgot to enable reflogs in the current repository, and cannot
use the information stored within it to recover from an incorrectly
entered command such as `git reset --hard HEAD^^^` when they really
meant HEAD^^ (aka HEAD~2).
So enable reflogs by default in all future versions of Git, unless
the user specifically disables it with:
[core]
logAllRefUpdates = false
in their .git/config or ~/.gitconfig.
We only enable reflogs in repositories that have a working directory
associated with them, as shared/bare repositories do not have
an easy means to prune away old log entries, or may fail logging
entirely if the user's gecos information is not valid during a push.
This heuristic was suggested on the mailing list by Junio.
Documentation was also updated to indicate the new default behavior.
We probably should start to teach using the reflog to recover
from mistakes in some of the tutorial material, as new users are
likely to make a few along the way and will feel better knowing
they can recover from them quickly and easily, without fsck-objects'
lost+found features.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since functions in fetch-clone.c were only used from fetch-pack.c,
its content has been merged with fetch-pack.c. This allows for better
coupling of features with much simpler implementations.
One new thing is that (the absence of) --thin is now also enforced on
index-pack, such that index-pack will abort if a thin pack was
_not_ asked for.
The -k or --keep, when provided twice, now causes the fetched pack
to be left as a kept pack just like receive-pack currently does.
Eventually this will be used to close a race against concurrent
repacking.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since keeping a pushed pack or exploding it into loose objects
should be a local repository decision this teaches receive-pack
to decide if it should call unpack-objects or index-pack --stdin
--fix-thin based on the setting of receive.unpackLimit and the
number of objects contained in the received pack.
If the number of objects (hdr_entries) in the received pack is
below the value of receive.unpackLimit (which is 5000 by default)
then we unpack-objects as we have in the past.
If the hdr_entries >= receive.unpackLimit then we call index-pack and
ask it to include our pid and hostname in the .keep file to make it
easier to identify why a given pack has been kept in the repository.
Currently this leaves every received pack as a kept pack. We really
don't want that as received packs will tend to be small. Instead we
want to delete the .keep file automatically after all refs have
been updated. That is being left as room for future improvement.
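A site that prefers to keep almost everything packed could lower the
threshold in the receiving repository, for example:

    [receive]
        unpackLimit = 100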
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* lj/refs: (63 commits)
Fix show-ref usagestring
t3200: git-branch testsuite update
sha1_name.c: avoid compilation warnings.
Make git-branch a builtin
ref-log: fix D/F conflict coming from deleted refs.
git-revert with conflicts to behave as git-merge with conflicts
core.logallrefupdates thinko-fix
git-pack-refs --all
core.logallrefupdates create new log file only for branch heads.
Remove bashism from t3210-pack-refs.sh
ref-log: allow ref@{count} syntax.
pack-refs: call fflush before fsync.
pack-refs: use lockfile as everybody else does.
git-fetch: do not look into $GIT_DIR/refs to see if a tag exists.
lock_ref_sha1_basic does not remove empty directories on BSD
Do not create tag leading directories since git update-ref does it.
Check that a tag exists using show-ref instead of looking for the ref file.
Use git-update-ref to delete a tag instead of rm()ing the ref file.
Fix refs.c;:repack_without_ref() clean-up path
Clean up "git-branch.sh" and add remove recursive dir test cases.
...
The 'receive.denynonfastforwards' option has nothing to do with
the repository format version. Since receive-pack already uses
git_config to initialize itself before executing any updates we
can use the normal configuration strategy and isolate the receive
specific variables away from the core variables.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* np/pack:
add the capability for index-pack to read from a stream
index-pack: compare only the first 20-bytes of the key.
git-repack: repo.usedeltabaseoffset
pack-objects: document --delta-base-offset option
allow delta data reuse even if base object is a preferred base
zap a debug remnant
let the GIT native protocol use offsets to delta base when possible
make pack data reuse compatible with both delta types
make git-pack-objects able to create deltas with offset to base
teach git-index-pack about deltas with offset to base
teach git-unpack-objects about deltas with offset to base
introduce delta objects with offset to base
Most callers of write_sha1_file_prepare() are only interested in the
resulting hash but don't care about the returned file name or the header.
This patch adds a simple wrapper named hash_sha1_file() which does just
that, and converts potential callers.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <junkio@cox.net>