A new diffcore transformation, diffcore-break.c, is introduced.
When the -B flag is given, a patch that represents a complete
rewrite is broken into a deletion followed by a creation. This
makes it easier to review such a complete rewrite patch.
The -B flag takes the same syntax as the -M and -C flags to
specify the minimum amount of non-source material the resulting
file needs to have to be considered a complete rewrite, and
defaults to 99% if not specified.
As the new test t4008-diff-break-rewrite.sh demonstrates, if a
file is a complete rewrite, it is broken into a delete/create
pair, which can further be subjected to the usual rename
detection if -M or -C is used. For example, if file0 is
rewritten so completely that it looks as if it were based on
file1, which itself disappeared, the following happens:
The original change looks like this:
    file0     --> file0' (quite different from file0)
    file1     --> /dev/null
After diffcore-break runs, it would become this:
    file0     --> /dev/null
    /dev/null --> file0'
    file1     --> /dev/null
Then diffcore-rename matches them up:
    file1     --> file0'
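The break-then-rename sequence above can be sketched roughly as follows. This is only an illustration, not the diffcore code: the tuple layout, the line-based difflib similarity, the 0.5 thresholds, and the rule that drops the deletion half of a broken pair whose path still exists are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# A filepair is (src_path, src_text, dst_path, dst_text); None stands for /dev/null.

def similarity(a, b):
    """Line-based similarity in 0..1 (a stand-in for the real scoring)."""
    return SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()

def break_rewrites(pairs, threshold=0.5):
    """Split a modification that looks like a complete rewrite into delete + create."""
    out = []
    for src, old, dst, new in pairs:
        is_change = src is not None and dst is not None
        if is_change and similarity(old, new) < threshold:
            out.append((src, old, None, None))   # deletion half
            out.append((None, None, dst, new))   # creation half
        else:
            out.append((src, old, dst, new))
    return out

def detect_renames(pairs, threshold=0.5):
    """Match each creation with the most similar deletion."""
    deletes = [p for p in pairs if p[2] is None]
    creates = [p for p in pairs if p[0] is None]
    result = [p for p in pairs if p[0] is not None and p[2] is not None]
    used = set()
    for c in creates:
        best = max(deletes, key=lambda d: similarity(d[1], c[3]), default=None)
        if best is not None and similarity(best[1], c[3]) >= threshold:
            result.append((best[0], best[1], c[2], c[3]))
            used.add(best[0])
        else:
            result.append(c)
    surviving = {p[2] for p in result}
    # Assumption: a leftover deletion is dropped when its path still exists.
    result += [d for d in deletes if d[0] not in used and d[0] not in surviving]
    return result

OLD0 = "alpha\nbeta\ngamma\ndelta\n"
OLD1 = "one\ntwo\nthree\nfour\n"
NEW0 = "one\ntwo\nthree\nfour\nfive\n"
pairs = [("file0", OLD0, "file0", NEW0), ("file1", OLD1, None, None)]
broken = break_rewrites(pairs)
renamed = detect_renames(broken)
```

With these inputs, `broken` holds three filepairs and `renamed` holds the single pair going from file1 to the new file0.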
The internal score values are finer grained now. The earlier
maximum of 10000 has been raised to 60000; there are no
user-visible changes, but there is no reason to waste available bits.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The second round similarity estimator simply used the size of
the xdelta itself to estimate the extent of damage. This patch
keeps that logic to detect big insertions to terminate the check
early, but otherwise looks at the generated delta in order to
estimate the extent of edit more accurately.
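One way to picture the two-step estimate is the sketch below. The delta representation (a list of "copy" and "insert" operations), the function names, and the 0.5 limit are hypothetical and chosen only for illustration; the patch itself works on the xdelta output.

```python
def looks_like_big_insertion(src_size, dst_size, limit=0.5):
    """Cheap early check on sizes alone, before looking inside the delta."""
    return dst_size > src_size and (dst_size - src_size) / dst_size > limit

def edit_extent(src_size, dst_size, ops):
    """Estimate how much was edited by looking at the delta operations.

    ops is a list of ("copy", offset, length) and ("insert", data) tuples.
    Assumes copies do not overlap in the source.
    """
    copied = sum(op[2] for op in ops if op[0] == "copy")
    inserted = sum(len(op[1]) for op in ops if op[0] == "insert")
    removed = max(src_size - copied, 0)   # source bytes that were not reused
    return (removed + inserted) / max(src_size, dst_size, 1)
```

A delta that copies 90 of 100 source bytes and inserts 10 new ones counts 10 removed plus 10 inserted bytes, which gives an extent of 0.2.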
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This applies git patches (and old-style unified diffs)
in the index, rather than doing it in the working directory.
That allows for a lot more flexibility, and means that if a
patch fails, we aren't going to mess up the working directory.
NOTE! This is just the first cut at it, and right now it only
parses the incoming patch, it doesn't actually apply it yet.
Thomas Glanzmann points out that it doesn't work well with different
clients accessing the repository over NFS - they have different views
on what the "device" for the filesystem is.
Of course, other filesystems may not even have stable inode numbers.
But we don't care. At least for now.
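A minimal sketch of the idea, assuming the cached stat information is compared field by field: the device number is simply left out of the comparison. The field list and the function name are illustrative and are not taken from the patch.

```python
from types import SimpleNamespace

# Fields compared between the cached entry and a fresh stat; st_dev is
# deliberately absent because NFS clients may disagree about it.
COMPARED_FIELDS = ("st_mtime", "st_ctime", "st_uid", "st_gid",
                   "st_ino", "st_mode", "st_size")

def cache_entry_matches(cached, current):
    """True if every compared stat field is identical."""
    return all(getattr(cached, f) == getattr(current, f) for f in COMPARED_FIELDS)

# Two NFS clients seeing the same file under different device numbers.
common = dict(st_mtime=100, st_ctime=100, st_uid=1000, st_gid=1000,
              st_ino=7, st_mode=0o100644, st_size=42)
client_a = SimpleNamespace(st_dev=21, **common)
client_b = SimpleNamespace(st_dev=35, **common)
```

Here `cache_entry_matches(client_a, client_b)` is true even though the two views disagree on `st_dev`. The inode number is still compared, which is the part the message above says may not be stable on other filesystems.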
This moves the path selection logic from individual programs to a new
diffcore transformer (diff-tree still needs to have its own for
performance reasons). Also the header printing code in diff-tree was
tweaked not to produce anything when pickaxe is in effect and there is
nothing interesting to report. An interesting example is the following
in the GIT archive itself:
$ git-whatchanged -p -C -S'or something in a real script'
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This steals the "pickaxe" feature from JIT and makes it available
to the bare Plumbing layer. From the command line, the user
gives a string he is interested in.
Using the diff-core infrastructure previously introduced, it
filters the differences to limit the output only to the diffs
between <src> and <dst> where the string appears only in one but
not in the other. For example:
$ ./git-rev-list HEAD | ./git-diff-tree -Sdiff-tree-helper --stdin -M
would show the diffs that touch the string "diff-tree-helper".
In a real software-archaeology application, you would typically
look for a few to several lines of code and see where that code
came from.
The "pickaxe" module runs after "rename/copy detection" module,
so it even crosses the file rename boundary, as the above
example demonstrates.
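The filter itself is small. A rough sketch, assuming each difference is represented as a hypothetical (path, src_text, dst_text) tuple with an empty string for a missing side:

```python
def pickaxe(pairs, needle):
    """Keep only the diffs where needle appears in exactly one side."""
    return [p for p in pairs if (needle in p[1]) != (needle in p[2])]

diffs = [
    ("Makefile", "PROG = diff-tree-helper\n", "PROG = diff-helper\n"),
    ("README", "about diff-tree-helper\n", "more about diff-tree-helper\n"),
    ("cache.h", "int x;\n", "int y;\n"),
]
touched = pickaxe(diffs, "diff-tree-helper")
```

Only the Makefile entry survives: README has the string on both sides, and cache.h has it on neither.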
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This introduces the diff-core, the layer between the diff-tree
family and the external diff interface engine. The calls to the
interface diff-tree family uses (diff_change and diff_addremove)
have not changed and will not change. The purpose of the
diff-core layer is to provide an infrastructure to transform the
set of differences sent from the applications, before sending
them to the external diff interface.
The recently introduced rename detection code has been rewritten
to use the diff-core facility. When applications send in
separate creates and deletes, matching ones are transformed into
a single rename-and-edit diff, and sent out to the external diff
interface as such.
This patch also enhances the rename detection code further to be
able to detect copies. Currently this happens only as long as
copy sources appear as part of the modified files, but there
already is enough provision for callers to report unmodified
files to diff-core, so that they can be also used as copy source
candidates. Extending the callers this way will be done in a
separate patch.
Please see and marvel at how well this works by trying out the
newly added t/t4003-diff-rename-1.sh test script.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds the ability to actually create delta objects using a new tool:
git-mkdelta. It uses an ordered list of potential objects to deltafy
against earlier objects in the list. A cap on the depth of delta
references can be provided as well, otherwise the default is to not have
any limit. A limit of 0 will also undeltafy any given object.
Also provided is the beginning of a script to deltafy an entire
repository.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds knowledge of delta objects to fsck-cache and various object
parsing code. A new switch to git-fsck-cache is provided to display the
maximum delta depth found in a repository.
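As an illustration of what "maximum delta depth" means, the sketch below walks delta references back to their base objects. The dictionary that maps an object to its base (None for a non-delta object) is an assumption made for this example, and the chains are assumed to be free of cycles.

```python
def delta_depth(obj, base_of):
    """Number of delta references to follow before reaching a full object."""
    depth = 0
    while base_of.get(obj) is not None:
        obj = base_of[obj]
        depth += 1
    return depth

def max_delta_depth(base_of):
    """Deepest delta chain among all known objects (0 if there are none)."""
    return max((delta_depth(o, base_of) for o in base_of), default=0)

# "c" is a delta against "b", which is a delta against the full object "a".
objects = {"a": None, "b": "a", "c": "b", "d": None}
```

For this set of objects the maximum depth is 2.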
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With the -u flag, git-checkout-cache picks up the stat information
from the newly created file and updates the cache. This removes the
need to run git-update-cache --refresh immediately after running
git-checkout-cache.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds the basic library functions to create and replay delta
information. Also included is a test-delta utility to validate the
code.
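The create/replay round trip can be illustrated with a toy delta made of "copy" and "insert" operations. This sketch swaps in Python's difflib for the matching step and uses an in-memory list rather than a binary encoding; it is not the diff-delta algorithm or its format.

```python
from difflib import SequenceMatcher

def make_delta(src: bytes, dst: bytes):
    """Describe dst as copies from src plus literal insertions."""
    ops = []
    matcher = SequenceMatcher(None, src, dst, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse src[i1:i2]
        elif tag in ("replace", "insert"):
            ops.append(("insert", dst[j1:j2]))     # new bytes from dst
        # "delete" needs no operation: those source bytes are just not copied
    return ops

def apply_delta(src: bytes, ops):
    """Replay the operations against src to rebuild the target."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += src[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

src = b"the quick brown fox jumps over the lazy dog\n"
dst = b"the quick red fox jumps over the very lazy dog\n"
delta = make_delta(src, dst)
```

Replaying `delta` against `src` with `apply_delta` gives back `dst`, which is the same kind of check a test utility would make.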
diff-delta was based on LibXDiff, written by Davide Libenzi.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The documentation of the test harness still refers to the old
numbering and also contains an obvious typo.
Also "make test" should be run after making sure we have built
all binaries, since test is designed to test the newly built
ones.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Petr Baudis <pasky@ucw.cz>
It used to be that diff-tree needed helper support to parse its
raw output to generate diffs, but these days git-diff-* family
produces the same output and the helper is not tied to diff-tree
anymore. Drop "tree" from its name.
This commit is done separately to record just the rename and no
file content changes. The changes in the renamed files are recorded
in the next commit.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Bundled with the changes in the unrenamed files.
Signed-off-by: Petr Baudis <pasky@ucw.cz>
Mark Allen had trouble with building GIT on his Darwin and
posted a patch to link with -lcrypto instead of -lssl on Darwin.
Later Daniel Barkalow suggested changing it for everybody who
uses openssl, because the relevant functionality is in -lcrypto
not in -lssl, and the current linking happens to work only
because -lssl pulls in -lcrypto.
Signed-off-by: Junio C Hamano <junkio@cox.net>
H. Peter Anvin mentioned that using SHA1_whatever as an
environment variable name is not nice and we should instead use
names starting with "GIT_" prefix to avoid conflicts. Here is
what this patch does:
* Renames the following environment variables:
    New name                            Old Name
    GIT_AUTHOR_DATE                     AUTHOR_DATE
    GIT_AUTHOR_EMAIL                    AUTHOR_EMAIL
    GIT_AUTHOR_NAME                     AUTHOR_NAME
    GIT_COMMITTER_EMAIL                 COMMIT_AUTHOR_EMAIL
    GIT_COMMITTER_NAME                  COMMIT_AUTHOR_NAME
    GIT_ALTERNATE_OBJECT_DIRECTORIES    SHA1_FILE_DIRECTORIES
    GIT_OBJECT_DIRECTORY                SHA1_FILE_DIRECTORY
* Introduces a compatibility macro, gitenv(), which does a
getenv() and if it fails calls gitenv_bc(), which in turn
picks up the value from the old name while giving a warning
about using an old name.
* Changes all users of the environment variable to fetch
environment variable with the new name using gitenv().
* Updates the documentation and scripts shipped with Linus GIT
distribution.
The transition plan is as follows:
* We will keep the backward compatibility list used by gitenv()
for now, so the current scripts and user environments
continue to work as before. Users get a warning on stderr
when their environment has an old name but not the new
name.
* The Porcelain layers should start using new names. However,
just in case it ends up calling old Plumbing layer
implementation, they should also export old names, taking
values from the corresponding new names, during the
transition period.
* After a transition period, we would drop the compatibility
support and drop gitenv(). Revert the callers to directly
call getenv() but keep using the new names.
The last part is probably optional and the transition
duration needs to be set to a reasonable value.
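The lookup-with-fallback behaviour described for gitenv() can be sketched like this. The real thing is a C macro plus gitenv_bc(); the Python function below, its `environ` and `warn` parameters, and the wording of the warning are assumptions made only to show the control flow.

```python
import sys

# New name -> old name, as listed in this patch.
OLD_NAMES = {
    "GIT_AUTHOR_DATE": "AUTHOR_DATE",
    "GIT_AUTHOR_EMAIL": "AUTHOR_EMAIL",
    "GIT_AUTHOR_NAME": "AUTHOR_NAME",
    "GIT_COMMITTER_EMAIL": "COMMIT_AUTHOR_EMAIL",
    "GIT_COMMITTER_NAME": "COMMIT_AUTHOR_NAME",
    "GIT_ALTERNATE_OBJECT_DIRECTORIES": "SHA1_FILE_DIRECTORIES",
    "GIT_OBJECT_DIRECTORY": "SHA1_FILE_DIRECTORY",
}

def gitenv(name, environ, warn=sys.stderr):
    """Look up the new name first; fall back to the old name with a warning."""
    value = environ.get(name)
    if value is not None:
        return value
    old = OLD_NAMES.get(name)
    if old is not None and old in environ:
        print(f"warning: {old} is deprecated, use {name} instead", file=warn)
        return environ[old]
    return None
```

Passing `os.environ` as `environ` gives the behaviour for a real process; a plain dict is enough for trying it out.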
Signed-off-by: Junio C Hamano <junkio@cox.net>
On Solaris machines, GNU install is called ginstall.
<JC> Editorial notes. I've also changed it to use $(COPTS), $(prefix),
and $(bin) because I always get confused without compiling it with -O1
when I single step in gdb. The default is left as Linus shipped.
Date: Sat, 7 May 2005 10:41:54 +0200
Signed-off-by: Thomas Glanzmann <sithglan@stud.uni-erlangen.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Separate out the merge resolve from the actual getting of the
data. Also, update the resolve phase to take advantage of the
fact that we don't need to do the commit->tree object lookup
by hand, since all the actors involved happily just act on a
commit object these days.
A new command, git-write-blob, is introduced. This registers
the contents of any file on the filesystem as a blob in the
object database and reports its SHA1 to the standard output.
To implement it, the patch promotes index_fd() from a static
function in update-cache.c to extern and moves it to a library
source, sha1_file.c.
This command is used to update git-merge-one-file-script so that
it does not smudge the work tree.
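For readers who want to see what "registering a blob and reporting its SHA1" amounts to, here is a rough sketch of the hashing step only. It assumes the object-naming scheme of present-day Git (SHA-1 over a "blob <size>" header, a NUL byte, then the file contents); the message above does not spell out the format, and the sketch does not write anything to an object database.

```python
import hashlib

def blob_sha1(data: bytes) -> str:
    """Object name of a blob, assuming the 'blob <size>\\0<contents>' layout."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

def blob_sha1_of_file(path: str) -> str:
    """Hash the contents of any file on the filesystem as a blob."""
    with open(path, "rb") as f:
        return blob_sha1(f.read())
```

Under that assumption the empty blob hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.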
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds the git-local-pull command as a smaller brother of
http-pull and rpull.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I said:
- Stop attempting to be compatible with cg-patch, and drop
(mode:XXXXXX) bits from the diff.
- Do keep the /dev/null change for created and deleted case.
- No "Index:" line, no "Mode change:" line, anywhere in the
output. Anything that wants the mode bits and sha1 hash can
do things from GIT_EXTERNAL_DIFF mechanism. Maybe document
suggested usage better.
This adds an example script git-apply-patch-script, that can be
used as the GIT_EXTERNAL_DIFF to apply changes between two trees
directly on the current work tree, like this:
GIT_EXTERNAL_DIFF=git-apply-patch-script git-diff-tree -p <tree> <tree>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The method for deciding what to pull is useful separately from any of the
ways of actually fetching the objects.
So split out the "pull" functionality from http-pull and rpull.
Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
...since everything out there is either strange (libc mktime has issues
with timezones) or introduces unnecessary dependencies for people (libcurl).
This goes back to the old date parsing, but moves it out into a file of
its own, and does the "struct tm" to "seconds since epoch" handling by
hand.
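Doing the "struct tm" to "seconds since epoch" conversion by hand boils down to counting days. A minimal sketch, assuming UTC input and the simple every-fourth-year leap rule (which holds from 1970 through 2099); the function name and argument order are illustrative, not those of the patch.

```python
# Days in the year before the first of each month, in a non-leap year.
DAYS_BEFORE_MONTH = (0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334)

def tm_to_epoch(year, month, day, hour=0, minute=0, second=0):
    """Convert a UTC broken-down time to seconds since 1970-01-01 00:00:00."""
    if not (1970 <= year <= 2099 and 1 <= month <= 12):
        raise ValueError("only 1970..2099 and months 1..12 are handled")
    y = year - 1970
    # Whole years, plus one day per leap year before this one (1972, 1976, ...).
    days = y * 365 + (y + 1) // 4
    days += DAYS_BEFORE_MONTH[month - 1] + (day - 1)
    if month > 2 and year % 4 == 0:
        days += 1                      # this year's Feb 29 has already passed
    return ((days * 24 + hour) * 60 + minute) * 60 + second
```

No timezone database is consulted, so the timezone problems with mktime() do not arise; a timezone offset would have to be applied by the caller.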
I grepped through the tz-database and it seems there's one "country"
left that has non-60-minute DST: Lord Howe Island. All others dropped
that before 1970.
This switches git-commit-tree to using curl_getdate() for the
AUTHOR_DATE, and thus fixes the problem with "mktime()" parsing dates in
the local timezone. It also ends up being more permissive about the
format of the date.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is another. This one belongs to a clean-up category.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This also regularizes the make. The source files themselves don't get
the "git-" prefix, because that's just inconvenient. So instead we just
make the rule that "git-xxxx" depends on "xxxx.c", and do that for
all the core programs (ie the old "git-mktag.c" got renamed to just
"mktag.c" to match everything else).
And "show-diff" got renamed to "git-diff-files" while at it, since
that's what it really should be to match the other git-diff-xxx cases.
This is an improved version of tar-tree, a streaming archive creator for
GIT. The major added feature is blocking; all write(2) calls now have a
size of 10240, just as GNU tar (and tape drives) likes them. The
buffering overhead does not seem to degrade performance because most
files in the repositories I tested this with are smaller than 10KB, so
we need fewer system calls.
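The blocking can be pictured as a small buffer in front of the output: data accumulates until a full 10240-byte block is available, and only whole blocks are written. In the sketch below the class name and the zero padding of the final block are assumptions made for illustration.

```python
import io

BLOCKSIZE = 10240

class BlockedWriter:
    """Collect output and hand it on in fixed-size blocks only."""

    def __init__(self, out):
        self.out = out
        self.buf = bytearray()

    def write(self, data: bytes):
        self.buf += data
        while len(self.buf) >= BLOCKSIZE:
            self.out.write(bytes(self.buf[:BLOCKSIZE]))
            del self.buf[:BLOCKSIZE]

    def close(self):
        # Assumption: the last, partial block is padded with zero bytes.
        if self.buf:
            self.buf += bytes(BLOCKSIZE - len(self.buf))
            self.out.write(bytes(self.buf))
            self.buf = bytearray()

sink = io.BytesIO()
writer = BlockedWriter(sink)
for _ in range(3):
    writer.write(b"x" * 5000)   # three small "files"
writer.close()
```

Three 5000-byte writes from the caller end up as two 10240-byte writes on the output.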
File names are still restricted to 500 bytes and the archive format
currently only allows for files up to 8GB. Both restrictions can be
lifted if need be with more pax extended headers.
The archive format used is the pax interchange format, i.e. POSIX tar
format. It can be read by (and created with) GNU tar. If I read the
specs correctly tar-tree should now be standards compliant (modulo
bugs).
Because it streams the archive (think ls-tree merged with cat-file),
tar-tree doesn't need to create any temporary files. That makes it
quite fast.
It accepts tree IDs and commit IDs as first parameter. In the latter
case tar-tree tries to get the commit date out of the committer line.
Else all files in the archive are time-stamped with the current time.
An optional second parameter is used as a path prefix for all files in
the archive. Example:
$ tar-tree a2755a80f40e5794ddc20e00f781af9d6320fafb \
linux-2.6.12-rc3 | bzip2 -9 > linux-2.6.12-rc3.tar.bz2
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This adds preliminary support for tags in the library. It doesn't even
store the signature, however, let alone provide any way of checking it.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch introduces a new program, diff-tree-helper. It reads
output from diff-cache and diff-tree, and produces a patch file.
The diff format can be customized the same way as for
show-diff. This is not surprising, since the same external
diff interface that the previous patch introduced to drive
diff from show-diff is used.
It is used like the following examples:
$ diff-cache --cached -z <tree> | diff-tree-helper -z -R paths...
$ diff-tree -r -z <tree1> <tree2> | diff-tree-helper -z paths...
- As usual, the use of the -z flag is recommended in the script
to pass NUL-terminated filenames through the pipe between
commands.
- The -R flag is used to generate reverse diff. It does not
matter for diff-tree case, but it is sometimes useful to get
a patch in the desired direction out of diff-cache.
- The paths parameters are used to restrict the paths that
appear in the output. Again this is useful with diff-cache,
which, unlike diff-tree, does not take such path restriction
parameters.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With this patch, the non-core'ish part of the show-diff command that
invokes an external "diff" command to obtain patches is split
into a separate file. The next patch will introduce a new
command, diff-tree-helper, which uses this common diff interface
to format diff-tree and diff-cache output into a patch form.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch adds three similar and related programs. http-pull downloads
objects from an HTTP server; rpull downloads objects by using ssh and
rpush on the other side; and rpush uploads objects by using ssh and rpull
on the other side.
The algorithm should be sufficient to make the network throughput required
depend only on how much content is new, not at all on how much content the
repository contains.
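Why the traffic depends only on new content can be seen from a sketch of the walk. It assumes that having an object locally implies having everything it references, so the walk stops there; the `fetch` and `references` callbacks and the dictionary-based object graph are hypothetical stand-ins for the transport and the object parser.

```python
def pull(start, have, fetch, references):
    """Fetch every object reachable from start that is not already present."""
    todo = [start]
    fetched = []
    while todo:
        obj = todo.pop()
        if obj in have:
            continue            # already present: do not descend further
        fetch(obj)
        have.add(obj)
        fetched.append(obj)
        todo.extend(references(obj))
    return fetched

# commit2 -> tree2 -> {blob_new, blob_old}; commit2 -> commit1 (already local)
graph = {
    "commit2": ["tree2", "commit1"],
    "tree2": ["blob_new", "blob_old"],
    "commit1": ["tree1"],
    "tree1": ["blob_old"],
    "blob_new": [],
    "blob_old": [],
}
local = {"commit1", "tree1", "blob_old"}
transferred = []
new_objects = pull("commit2", local, transferred.append, lambda o: graph[o])
```

Only commit2, tree2 and blob_new are transferred, no matter how much history sits behind commit1.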
The combination should enable people to have remote repositories by way of
ssh login for authenticated users and HTTP for anonymous access.
Signed-Off-By: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Here is a SHA1 implementation with the core written in PPC assembly.
On my 2GHz G5, it does 218MB/s, compared to 135MB/s for the openssl
version or 45MB/s for the mozilla version.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This one includes the Mozilla SHA1 implementation sent in by Edgar Toernig.
It's dual-licenced under MPL-1.1 or GPL, so in the context of git, we
obviously use the GPL version.
Side note: the Mozilla SHA1 implementation is about twice as fast as the
default openssl one on my G5, but the default openssl one has optimized
x86 assembly language on x86. So choose wisely.
Use a generic rule for executables that depend only on the corresponding
.o and on $(LIB_FILE).
Signed-Off-By: Andre Noll <maan@systemlinux.org>
Signed-Off-By: Linus Torvalds <torvalds@osdl.org>
the current cache state and/or working directory.
Very useful to see what has changed since the last commit, either in
the index file or in the whole working directory.
Also very possibly very buggy. Matching the two up is not entirely
trivial.