Rather than sorting the refs list while building it, sort in one
go after it is built, using a merge sort. This gives a large
performance boost with large numbers of refs.
It shouldn't happen that we read duplicate entries into the same
list, but just in case, sort_ref_list drops them if the SHA1s are
the same, and dies otherwise, as we have no way of knowing which
one is the correct one.
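A minimal sketch of the idea in Python, not the C code from this patch: a
merge sort over a singly linked list that drops a duplicate name when the
SHA1s agree and raises an error when they differ. RefEntry, build() and
names() are hypothetical helpers introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefEntry:
    """Hypothetical stand-in for one entry of the ref list."""
    name: str
    sha1: str
    next: Optional["RefEntry"] = None


def sort_ref_list(head):
    """Merge-sort a linked list of refs by name, handling duplicates."""
    if head is None or head.next is None:
        return head
    # Split the list in two halves with slow/fast pointers.
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow = slow.next
        fast = fast.next.next
    right = slow.next
    slow.next = None
    left = sort_ref_list(head)
    right = sort_ref_list(right)
    # Merge the two sorted halves.
    dummy = RefEntry("", "")
    tail = dummy
    while left is not None and right is not None:
        if left.name == right.name:
            if left.sha1 != right.sha1:
                # No way of knowing which entry is correct: give up.
                raise ValueError(f"conflicting duplicate ref {left.name}")
            right = right.next  # same SHA1: silently drop the duplicate
            continue
        if left.name < right.name:
            tail.next, left = left, left.next
        else:
            tail.next, right = right, right.next
        tail = tail.next
    tail.next = left if left is not None else right
    return dummy.next


def build(pairs):
    """Build a linked list from (name, sha1) pairs, keeping their order."""
    head = None
    for name, sha1 in reversed(pairs):
        head = RefEntry(name, sha1, head)
    return head


def names(head):
    """Collect the ref names of a linked list into a Python list."""
    out = []
    while head is not None:
        out.append(head.name)
        head = head.next
    return out
```

Building the list unsorted and sorting once costs O(n log n) overall,
whereas inserting each ref into its sorted position in a linked list costs
O(n) per insertion.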
Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
git-shortlog: Fix two formatting errors in asciidoc documentation
Fix overwriting of files when applying contextually independent diffs
git-svn: don't allow globs to match regular files
It was bothering me a lot that I abused small integer values
cast to (void *) to represent non-string values in
gitattributes. This corrects it by making the type of attribute
values (const char *), and using the addresses of a few statically
allocated character buffers to denote true/false. Unset attributes
are represented as having NULLs as their values.
Added in-header documentation to explain how git_checkattr()
routine should be called.
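The same pattern, sketched in Python rather than C: true and false are
unique sentinel objects compared by identity (the analogue of comparing
against the address of a static buffer), unset is None, and anything else
is an ordinary string value. ATTR_TRUE, ATTR_FALSE and describe_attr() are
hypothetical names for this illustration, not the macros from the header.

```python
# Unique sentinels; only identity matters, like the address of a static buffer.
ATTR_TRUE = object()
ATTR_FALSE = object()
ATTR_UNSET = None  # the analogue of a NULL value


def describe_attr(value):
    """Classify an attribute value the way a caller of the check would."""
    if value is ATTR_TRUE:
        return "true"
    if value is ATTR_FALSE:
        return "false"
    if value is ATTR_UNSET:
        return "unset"
    return "string: " + value
```

Because the sentinels are compared by identity, a string value that happens
to read "true" can never be mistaken for the true sentinel.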
Signed-off-by: Junio C Hamano <junkio@cox.net>
First use [verse] in the SYNOPSIS so that the line break actually
shows.
Secondly drop the quotes around '.mailmap' since this exposes
a bug in our toolchain (didn't bother enough yet to find out whether
it is asciidoc's fault or that of the XSL templates) that leads to
the dot not getting escaped correctly in the roff output and thereby
swallowing the line.
Signed-off-by: Frank Lichtenheld <frank@lichtenheld.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Noticed by applying two diffs of different contexts to the same file.
The check for existence of a file was wrong: the test assumed it was
a directory and reset the errno (twice: directly and by calling
lstat). So if an entry existed and was _not_ a directory no attempt
was made to rename into it, because the errno (expected by renaming
code) was already reset to 0. This resulted in error:
fatal: unable to write file file mode 100644
For Linux, removing "errno = 0" is enough, as lstat won't modify errno
if it was successful. The behavior should not be depended upon,
though, so modify the "if" as well.
The test simulates this situation.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
git only tracks the histories of full directories, not
those of individual files. Sometimes, SVN users will
place[1] a regular file in the directory designated
for subdirectories of branches or tags.
Thanks to jrockway on #git for pointing this out.
[1] mistakenly or otherwise, such as a README
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows [merge "drivername"] to have a variable "recursive"
that names a different low-level merge driver to be used when
merging common ancestors to come up with a virtual ancestor.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes the configuration syntax for defining a low-level
merge driver to be:
[merge "<<drivername>>"]
driver = "<<command line>>"
name = "<<driver description>>"
which is much nicer to read and is extensible. Credit goes to
Martin Waitz and Linus.
In addition, when we use an external low-level merge driver, it
is reported as an extra output from merge-recursive, using the
value of merge.<<drivername>>.name variable.
The demonstration in t6026 has also been updated.
Signed-off-by: Junio C Hamano <junkio@cox.net>
When no 'merge' attribute is given to a path, merge-recursive
uses the built-in xdl-merge as the low-level merge driver.
A new configuration item 'merge.default' can name a low-level
merge driver of user's choice to be used instead.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows users to specify custom low-level merge driver per
path, using the attributes mechanism. Just like you can specify
one of built-in "text", "binary", "union" low-level merge
drivers by saying:
* merge=text
.gitignore merge=union
*.jpg merge=binary
pick a name of your favorite merge driver, and assign it as the
value of the 'merge' attribute.
A custom low-level merge driver is defined via the config
mechanism. This patch introduces 'merge.driver', a multi-valued
configuration. Its value is the name (i.e. the one you use as
the value of 'merge' attribute) followed by a command line
specification. The command line can contain %O, %A, and %B to
be interpolated with the names of temporary files that hold the
common ancestor version, the version from your branch, and the
version from the other branch, and the resulting command is
spawned.
The low-level merge driver is expected to update the temporary
file for your branch (i.e. %A) with the result and exit with
status 0 for a clean merge, and non-zero status for a conflicted
merge.
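A rough sketch of the interpolation step in Python, assuming the value of
'merge.driver' is split at the first run of whitespace into a name and a
command line. parse_merge_driver() and interpolate_driver_command() are
hypothetical helpers, and no shell quoting of the file names is attempted.

```python
def parse_merge_driver(value):
    """Split a 'merge.driver' value into (name, command line template)."""
    name, command = value.split(None, 1)
    return name, command


def interpolate_driver_command(template, ancestor, ours, theirs):
    """Replace %O, %A and %B with the given temporary file names."""
    mapping = {"%O": ancestor, "%A": ours, "%B": theirs}
    out = []
    i = 0
    while i < len(template):
        token = template[i:i + 2]
        if token in mapping:
            out.append(mapping[token])
            i += 2
        else:
            out.append(template[i])
            i += 1
    return "".join(out)
```

The caller would then spawn the resulting command and treat exit status 0
as a clean merge, with the result read back from the %A file.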
A new test in t6026 demonstrates a sample usage.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* fl/cvsserver:
config.txt: Add gitcvs.db* variables
cvsserver: Document the GIT branches -> CVS modules mapping more prominently
cvsserver: Reword documentation on necessity of write access
cvsserver: Allow to "add" a removed file
cvsserver: Add asciidoc documentation for new database backend configuration
cvsserver: Corrections to the database backend configuration
cvsserver: Use DBI->table_info instead of DBI->tables
cvsserver: Abort if connect to database fails
cvsserver: Make the database backend configurable
cvsserver: Allow to override the configuration per access method
cvsserver: Handle three part keys in git config correctly
cvsserver: Introduce new state variable 'method'
Conflicts:
Documentation/config.txt
The delete_ref function does not change the 'sha1' parameter. A non-const
pointer causes a compiler warning if you call the function with a const
argument.
Signed-off-by: Carlos Rica <jasampler@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
Start preparing for 1.5.1.2
git-svn: quiet some warnings when run only with --version/--help
git-svn: respect lower bound of -r/--revision when following parent
Conflicts:
RelNotes
* 'master' of git://repo.or.cz/git-gui:
git-gui: Honor TCLTK_PATH if supplied
Revert "Allow wish interpreter to be defined with TCLTK_PATH"
git-gui: Display the directory basename in the title
git-gui: Brown paper bag fix division by 0 in blame
Always bind the return key to the default button
Do not break git-gui messages into multiple lines.
Improve look-and-feel of the git-gui tool.
Teach git-gui to use the user-defined UI font everywhere.
Allow wish interpreter to be defined with TCLTK_PATH
* jc/read-tree-df:
t3030: merge-recursive backend test.
merge-recursive: handle D/F conflict case more carefully.
merge-recursive: do not barf on "to be removed" entries.
Treat D/F conflict entry more carefully in unpack-trees.c::threeway_merge()
t1000: fix case table.
This demonstrates how the new low-level per-path merge backends,
union and ours, work, and shows how they are controlled by the
gitattribute mechanism.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows 'merge' attribute to control how the file-level
three-way merge is done per path.
- If you set 'merge' to true, leave it unspecified, or set it
to "text", we use the built-in 3-way xdl-merge.
- If you set 'merge' to false, or set it to "binary", the
"binary" merge is done. The merge result is the blob from
'our' tree, but this still leaves the path conflicted, so
that the mess can be sorted out by the user. This is
obviously meant to be useful for binary files.
- 'merge=union' (this is the first example of a string valued
attribute, introduced in the previous one) uses the "union"
merge. The "union" merge takes lines in conflicted hunks
from both sides, which is useful for line-oriented files such
as .gitignore.
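The three cases above amount to a small dispatch on the attribute value.
A sketch in Python, where MERGE_TRUE and MERGE_FALSE are hypothetical
sentinels for a 'merge' attribute set to true or false and None stands for
an unspecified attribute. Rejecting other values is an assumption of this
sketch, not something the patch specifies.

```python
MERGE_TRUE = object()   # 'merge' set to true
MERGE_FALSE = object()  # 'merge' set to false ('-merge')


def pick_builtin_driver(merge_attr):
    """Map the value of the 'merge' attribute to a built-in driver name."""
    if merge_attr is MERGE_TRUE or merge_attr is None or merge_attr == "text":
        return "text"    # built-in 3-way xdl-merge
    if merge_attr is MERGE_FALSE or merge_attr == "binary":
        return "binary"  # take 'our' blob, leave the path conflicted
    if merge_attr == "union":
        return "union"   # take lines from both sides of conflicted hunks
    raise ValueError("unknown merge attribute value: %r" % (merge_attr,))
```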
Instead of setting merge to 'true' or 'false' by using 'merge'
or '-merge', setting it explicitly to "text" or "binary" will
become useful once we start allowing custom per-path backends to
be added, and allow them to be activated for the default
(i.e. 'merge' attribute specified to 'true' or 'false') case,
using some other mechanisms. Setting merge attribute to "text"
or "binary" will be a way to explicitly request to override such
a custom default for selected paths.
Currently there is no way to specify random programs but it
should be trivial for motivated contributors to add later.
There is one caveat, though. ll_merge() is called for both
internal ancestor merge and the outer "final" merge. I think an
interactive custom per-path merge backend should refrain from
going interactive when performing an internal merge (you can
tell it by checking call_depth) and instead just call either
ll_xdl_merge() if the content is text, or call ll_binary_merge()
otherwise.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Mimic what we do for gitk. Since you do have a source file,
git-gui.sh, which is separate from the target, it should be much
easier in git-gui's Makefile.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This reverts commit e2a1bc67d3.
Junio rightly pointed out this patch doesn't handle the
`make install` target very well:
Junio C Hamano <junkio@cox.net> writes:
> You should never generate new files in the source tree from
> 'install' target. Otherwise, the usual pattern of "make" as
> yourself and then "make install" as root would not work from a
> "root-to-nobody-squashing" NFS mounted source tree to local
> filesystem. You should know better than accepting such a patch.
These are harmless but annoying. They were introduced in
512b620bd9
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When an explicit --revision argument is specified, do not fetch
past the specified range into the beginning of history.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows you to define three values (and possibly more) for
each attribute: true, false, and unset.
Typically the handlers that notice and act on attribute values
treat "unset" attribute to mean "do your default thing"
(e.g. crlf that is unset would trigger "guess from contents"),
so being able to override a setting to an unset state is
actually useful.
- If you want to set the attribute value to true, have an entry
in .gitattributes file that mentions the attribute name; e.g.
*.o binary
- If you want to set the attribute value explicitly to false,
use '-'; e.g.
*.a -diff
- If you want to make the attribute value _unset_, perhaps to
override an earlier entry, use '!'; e.g.
*.a -diff
c.i.a !diff
This also allows string values to attributes, with the natural
syntax:
attrname=attrvalue
but you cannot use it, as nobody takes notice and acts on
it yet.
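A small Python sketch of the syntax described above. It assumes that each
line is a pattern followed by whitespace-separated attribute tokens, that
patterns are plain shell globs, and that a later matching entry overrides
an earlier one. parse_attr_token() and lookup_attr() are hypothetical
helpers, not the parser from this patch.

```python
import fnmatch

ATTR_TRUE, ATTR_FALSE, ATTR_UNSET = True, False, None


def parse_attr_token(token):
    """Turn 'name', '-name', '!name' or 'name=value' into (name, value)."""
    if token.startswith("-"):
        return token[1:], ATTR_FALSE
    if token.startswith("!"):
        return token[1:], ATTR_UNSET
    if "=" in token:
        name, value = token.split("=", 1)
        return name, value
    return token, ATTR_TRUE


def lookup_attr(lines, path, attr):
    """Return the value of attr for path; later matching lines win."""
    value = ATTR_UNSET
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        pattern, tokens = fields[0], fields[1:]
        if not fnmatch.fnmatchcase(path, pattern):
            continue
        for token in tokens:
            name, parsed = parse_attr_token(token)
            if name == attr:
                value = parsed
    return value
```

With the entries from the examples above, 'diff' is false for any other
*.a file but unset again for c.i.a.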
Signed-off-by: Junio C Hamano <junkio@cox.net>
This updates git-rev-list --objects to be a bit more careful
when listing a blob object to make sure the blob actually
exists, and uses it to make sure the quick-fetch optimization we
introduced earlier is not fooled by a previous incomplete fetch.
The quick-fetch optimization works by running this command:
git rev-list --objects <<commit-list>> --not --all
where <<commit-list>> is a list of commits that we are going to
fetch from the other side. If there is any object missing to
complete the <<commit-list>>, the rev-list would fail and die
(say, the commit was in our repository, but its tree wasn't --
then it will barf while trying to list the blobs the tree
contains because it cannot read that tree).
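As a sketch, the check could be driven from Python like this. The command
line is the one quoted above; quick_fetch_command() and
is_fetch_unnecessary() are hypothetical wrappers, and treating any
non-zero exit status as "objects are missing" is an assumption of this
illustration.

```python
import subprocess


def quick_fetch_command(commits):
    """Build: git rev-list --objects <<commit-list>> --not --all"""
    return ["git", "rev-list", "--objects", *commits, "--not", "--all"]


def is_fetch_unnecessary(commits, repo_dir="."):
    """True if rev-list can list every object reachable from commits."""
    result = subprocess.run(
        quick_fetch_command(commits),
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```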
Usually we do not have the objects (otherwise why would we be
fetching?), but in one important special case we do: when the
remote repository is used as an alternate object store
(i.e. pointed by .git/objects/info/alternates). We could check
.git/objects/info/alternates to see if the remote we are
interacting with is one of them (or is used as an alternate,
recursively, by one of them), but that check is more cumbersome
than it is worth.
The above check, however, did not catch a missing blob, because the
object listing code neither read nor checked blob objects, knowing
that blobs do not contain any further references to other
objects. This commit fixes it with practically unmeasurable
overhead.
I've benched this with
git rev-list --objects --all >/dev/null
in the kernel repository, with three different implementations
of the "check-blob".
- Checking with has_sha1_file() has negligible (unmeasurable)
performance penalty.
- Checking with sha1_object_info() makes it somewhat slower,
perhaps by 5%.
- Checking with read_sha1_file() to cause a full re-validation
is prohibitively expensive (about 4 times as much runtime).
In my original patch, I had this as a command line option, but
the overhead is small enough that it is not really worth it.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This replaces the fairly odd "created_object()" function that did _most_
of the object setup with a more complete "create_object()" function that
also has a more natural calling convention.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We used to use a different allocator scheme for when we didn't know the
object type. That meant that objects that were created without any
up-front knowledge of the type would not go through the same allocation
paths as normal object allocations, and would miss out on the statistics.
But perhaps more importantly than the statistics (that are useful when
looking at memory usage but not much else), if we want to make the
object hash tables use a denser object pointer representation, we need
to make sure that they all go through the same blocking allocator.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: also fix 0a5280a9 that incorrectly changed the title of one test.]
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
With a large number of objects, check_object() is really thrashing the pack
sliding map and the filesystem cache. It has a completely random access
pattern especially with old objects where delta replay jumps back and
forth all over the pack.
This patch improves things by:
1) sorting objects by their offset in pack before calling check_object()
so the pack access pattern is linear;
2) recording the object type at add_object_entry() time since it is
already known in most cases;
3) recording the pack offset even for preferred_base objects;
4) avoiding calling sha1_object_info() if at all possible.
This limits pack accesses to the bare minimum and makes them perfectly
linear.
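Step 1 can be pictured with a short Python sketch: packed objects are
ordered by pack and then by offset before being inspected, so that each
pack is read front to back. ObjectEntry and sorted_by_offset() are
hypothetical names, and ordering packs by name is an assumption of this
illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectEntry:
    """Hypothetical stand-in for an entry in the object list."""
    sha1: str
    in_pack: Optional[str]  # None for objects that are not in any pack
    in_pack_offset: int = 0


def sorted_by_offset(entries):
    """Return the packed entries ordered by (pack, offset within pack)."""
    packed = [e for e in entries if e.in_pack is not None]
    return sorted(packed, key=lambda e: (e.in_pack, e.in_pack_offset))
```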
In the process check_object() was made more clear (to me at least).
Note: I thought about walking the sorted_by_offset list backward in
get_object_details() so if a pack happens to be larger than the available
file cache, then the cache would have been populated with useful data from
the beginning of the pack already when find_deltas() is called. Strangely,
testing (on Linux) showed absolutely no performance difference.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
... which consists of existing code split out of packed_delta_info()
for other callers to use it as well.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It currently aliases delta_size on the principle that reused deltas won't
go through the whole delta matching loop hence delta_size was unused.
This is not true if a given delta doesn't find its base in the pack though.
But we need that information even for whole object data reuse.
Well in short the current state looks awful and is prone to bugs. It just
works fine now because try_delta() tests trg_entry->delta before using
trg_entry->delta_size, but that is a bit subtle and I was wondering for a
while why things just worked fine... even if I'm guilty of having
introduced this abomination myself in the first place.
Let's do the sensible thing instead with no ambiguity, which is to have
a separate variable for in_pack_header_size. This might even help future
optimizations.
While at it, let's reorder some struct object_entry members so they all
align well with their own width, regardless of the architecture or the
size of off_t. Some memory saving is to be expected with this alone.
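The effect of member ordering on padding can be observed from Python with
ctypes, which lays out structures using the platform's C alignment rules.
The two structures below are hypothetical and do not mirror the real
struct object_entry; the exact sizes depend on the platform.

```python
import ctypes


class Unordered(ctypes.Structure):
    """Narrow and wide members interleaved: padding before each wide one."""
    _fields_ = [
        ("flag_a", ctypes.c_uint8),
        ("offset", ctypes.c_uint64),
        ("flag_b", ctypes.c_uint8),
        ("size", ctypes.c_uint32),
    ]


class Ordered(ctypes.Structure):
    """Same members, widest first: each one is naturally aligned."""
    _fields_ = [
        ("offset", ctypes.c_uint64),
        ("size", ctypes.c_uint32),
        ("flag_a", ctypes.c_uint8),
        ("flag_b", ctypes.c_uint8),
    ]


def padding_saved():
    """Bytes saved per structure by the reordering on this platform."""
    return ctypes.sizeof(Unordered) - ctypes.sizeof(Ordered)
```

The members themselves occupy 14 bytes in both layouts; any size beyond
that is padding, and the widest-first order never needs more of it than
the interleaved one.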
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Because we don't have to know the SHA1 (hence the name) of the pack
up front anymore, let's get rid of yet another global sorted object list
and sort them only in write_index_file(), then compute the object list
SHA1 on the fly.
This has the advantage of saving another chunk of memory, and the sorted
list SHA1 won't be computed needlessly on servers during a fetch.
Of course the cunning plan is also to make write_index_file() much like
the function with the same name in index-pack.c for an eventual easy
sharing.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This capability is practically never useful, and therefore never tested,
because it is fairly unlikely that the requested pack will be already
available. Furthermore it is of little gain over the ability to reuse
existing pack data.
In fact the ability to change delta type on the fly when reusing delta
data is a nice thing that has almost no cost and allows greater backward
compatibility with a client's capabilities than if the client is blindly
sent a whole pack without any discrimination.
And this "feature" is simply in the way of other cleanups.
Let's get rid of it.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Get rid of sort_comparator() as it imposes a run time double indirect
function call for little compile time type checking gain.
Also get rid of create_sorted_list() as it only has one user which would
as well be just fine doing its sorting locally. Eventually the list of
deltifiable objects might be shorter than the whole object list.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Objects that have delta "children" from pack data reuse must consider the
depth of their deepest child when they try to deltify themselves for those
children not to become too deep.
However, in the context of a "thin" pack, the delta children depth was
skipped entirely on the presumption that the pack was always going to be
exploded on the receiving end, hence the delta length wasn't an issue.
Now that we keep received packs as is and reuse pack data when repacking,
those packs do contain delta chains that are longer than expected. Worse,
those delta chains may even grow longer when the pack is further repacked
into another thin pack for a subsequent transmission.
So this patch restores strict delta length even for thin packs, and it
moves check_delta_limit() usage directly in the delta loop where it is
needed. This way the delta_limit can be removed from struct object_entry
as well. Oh and the initial value was wrong too.
The progress_interval() function was moved to a more logical location in
the process.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Before finding best delta combinations, we sort objects by name hash,
then by size, then by their position in memory. Then we walk the list
backwards to test delta candidates.
We hope that a bigger size usually means a newer object. But a bigger
address in memory does not mean a newer object. So the last comparison
must be reversed.
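One way to read this, sketched in Python: the sort key is (name hash,
size, reversed position), so that the backward walk sees bigger objects
first and, among equals, the object with the smaller position first.
Candidate, delta_sort_key() and delta_walk_order() are hypothetical, and
"position" is an integer standing in for the address in memory.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical delta candidate."""
    name_hash: int
    size: int
    position: int  # stand-in for the object's address in memory


def delta_sort_key(entry):
    """Name hash, then size, then position with the comparison reversed."""
    return (entry.name_hash, entry.size, -entry.position)


def delta_walk_order(entries):
    """Sort ascending by the key, then walk the list backwards."""
    return list(reversed(sorted(entries, key=delta_sort_key)))
```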
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Let's avoid some cycles when there is no base to test against, and avoid
unnecessary object lookups.
Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* js/wrap-log:
Fix permissions on test scripts
Fix t4201: accidental arithmetic expansion
shortlog -w: make wrap-line behaviour optional.
Use print_wrapped_text() in shortlog
Make every test executable. Remove exec-attribute from included shell files,
they can't be used standalone anyway.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
instead of embedded subshell. It actually breaks here (dash as /bin/sh):
t4201-shortlog.sh: 27: Syntax error: Missing '))'
FATAL: Unexpected exit with code 2
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This adds "--decorate" as a log option, which prints out the ref names
of any commits that are shown.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This allows you to add an arbitrary "decoration" of your choice to any
object. It's a space- and time-efficient way to add information to
arbitrary objects, especially if most objects probably do not have the
decoration.
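The idea is a side table keyed by object, so that objects without a
decoration cost nothing. A minimal Python sketch, using a dict keyed by
object identity in place of the hash table from this patch; the Decoration
class and its methods are hypothetical.

```python
class Decoration:
    """Side table that attaches an arbitrary value to arbitrary objects."""

    def __init__(self, name):
        self.name = name
        self._table = {}

    def add(self, obj, value):
        """Decorate obj; return the previous decoration, if any."""
        previous = self._table.get(id(obj))
        # Keep a reference to obj so its id() cannot be reused while stored.
        self._table[id(obj)] = (obj, value)
        return previous[1] if previous else None

    def lookup(self, obj):
        """Return the decoration of obj, or None if it has none."""
        entry = self._table.get(id(obj))
        return entry[1] if entry else None
```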
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
Have sample update hook not refuse deleting a branch through push.
variable $projectdesc needs to be set before checking against unchanged default.
Update git-annotate/git-blame documentation
Update git-apply documentation
Update git-applymbox documentation
Update git-am documentation
user-manual: use detached head when rewriting history
user-manual: start revising "internals" chapter
user-manual: detached HEAD
user-manual: fix discussion of default clone
Documentation: clarify track/no-track option.
Documentation: clarify git-checkout -f, minor editing
Documentation: minor edits of git-lost-found manpage
The source ref might be 0000...0000 to delete a branch through git-push,
'git push <remote> :<branch>'. The update hook should not decline this.
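The check itself is small. A sketch in Python, assuming the hook is given
the ref name, the old object name and the new object name, and that a
deletion shows up as a new object name of forty zeros. update_hook() and
is_branch_deletion() are hypothetical and leave out every other check the
sample hook performs.

```python
ZERO_SHA1 = "0" * 40


def is_branch_deletion(newrev):
    """A push that deletes a ref sends an all-zero new object name."""
    return newrev == ZERO_SHA1


def update_hook(refname, oldrev, newrev):
    """Return 0 to accept the update; deletions are accepted up front."""
    if is_branch_deletion(newrev):
        return 0
    # Other policy checks would go here; this sketch accepts everything.
    return 0
```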
Signed-off-by: Gerrit Pape <pape@smarden.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>