This creates a hash index of every single file added to the index.
Right now that hash index isn't actually used for much: I implemented a
"cache_name_exists()" function that uses it to efficiently look up a
filename in the index without having to do the O(logn) binary search,
but quite frankly, that's not why this patch is interesting.
No, the whole and only reason to create the hash of the filenames in the
index is that by modifying the hash function, you can fairly easily do
things like making it always hash equivalent names into the same bucket.
That, in turn, means that suddenly questions like "does this name exist
in the index under an _equivalent_ name?" become much much cheaper.
Guiding principles behind this patch:
- it shouldn't be too costly. In fact, my primary goal here was to
actually speed up "git commit" with a fully populated kernel tree, by
being faster at checking whether a file already existed in the index. I
did succeed, but only barely:
Best before:
[torvalds@woody linux]$ time git commit > /dev/null
real 0m0.255s
user 0m0.168s
sys 0m0.088s
Best after:
[torvalds@woody linux]$ time ~/git/git commit > /dev/null
real 0m0.233s
user 0m0.144s
sys 0m0.088s
so some things are actually faster (~8%).
Caveat: that's really the best case. Other things are invariably going
to be slightly slower, since we populate that index cache, and quite
frankly, few things really use it to look things up.
That said, the cost is really quite small. The worst case is probably
doing a "git ls-files", which will do very little except populate the
index, and never actually looks anything up in it, just lists it.
Before:
[torvalds@woody linux]$ time git ls-files > /dev/null
real 0m0.016s
user 0m0.016s
sys 0m0.000s
After:
[torvalds@woody linux]$ time ~/git/git ls-files > /dev/null
real 0m0.021s
user 0m0.012s
sys 0m0.008s
and while the thing has really gotten relatively much slower, we're
still talking about something almost unmeasurable (eg 5ms). And that
really should be pretty much the worst case.
So we lose 5ms on one "benchmark", but win 22ms on another. Pick your
poison - this patch has the advantage that it will _likely_ speed up
the cases that are complex and expensive more than it slows down the
cases that are already so fast that nobody cares. But if you look at
relative speedups/slowdowns, it doesn't look so good.
- It should be simple and clean
The code may be a bit subtle (the reasons I do hash removal the way I
do etc), but it re-uses the existing hash.c files, so it really is
fairly small and straightforward apart from a few odd details.
Now, this patch on its own doesn't really do much, but I think it's worth
looking at, if only because if done correctly, the name hashing really can
make an improvement to the whole issue of "do we have a filename that
looks like this in the index already". And at least it gets real testing
by being used even by default (ie there is a real use-case for it even
without any insane filesystems).
NOTE NOTE NOTE! The current hash is a joke. I'm ashamed of it, I'm just
not ashamed of it enough to really care. I took all the numbers out of my
nether regions - I'm sure it's good enough that it works in practice, but
the whole point was that you can make a really much fancier hash that
hashes characters not directly, but by their upper-case value or something
like that, and thus you get a case-insensitive hash, while still keeping
the name and the index itself totally case sensitive.
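To make the note above concrete, here is a minimal sketch of a hash that
folds case before mixing. It is not the hash in this patch: the function
name icase_name_hash(), the seed and the multiplier are all made up for
illustration, and only ASCII case is folded.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not the hash function from this patch.
 * Every byte is hashed by its upper-case value, so names that differ
 * only in ASCII case end up in the same bucket. The name stored in
 * the index is left untouched and stays case sensitive.
 */
unsigned int icase_name_hash(const char *name, size_t namelen)
{
	unsigned int hash = 0x123;	/* arbitrary seed */

	while (namelen--) {
		unsigned char c = (unsigned char)*name++;
		hash = hash * 101 + (unsigned int)toupper(c);
	}
	return hash;
}
```

A lookup would still have to compare the candidates in the bucket with
whatever notion of "equivalent" is wanted; the hash only narrows down
where to look.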
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This moves a common boolean expression into a helper function,
and makes the comparison between filesystem timestamp and index
timestamp done in the function in line with the other places.
st.st_mtime should be cast to (unsigned int) when compared to
an index timestamp ce_mtime.
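As a rough illustration of the cast (the helper name mtime_matches() is
hypothetical and is not the function this patch adds), the comparison
would look something like this:

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: nonzero when the filesystem mtime matches the
 * index mtime. The index value is an unsigned int, so st_mtime is
 * cast to that width before comparing.
 */
int mtime_matches(const struct stat *st, unsigned int ce_mtime)
{
	return ce_mtime == (unsigned int)st->st_mtime;
}
```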
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It is a D/F conflict if you want to add "foo/bar" to the index
when "foo" already exists. Also it is a conflict if you want to
add a file "foo" when "foo/bar" exists.
An exception is when the existing entry is there only to mark "I
used to be here but I am being removed". This is needed for
operations such as "git read-tree -m -u" that update the index
and then reflect the result to the work tree --- we need to
remember what to remove somewhere, and we use the index for
that. In such a case, an existing file "foo" is being removed
and we can create "foo/" directory and hang "bar" underneath it
without any conflict.
We used to use (ce->ce_mode == 0) to mark an entry that is being
removed, but (CE_REMOVE & ce->ce_flags) is used for that purpose
these days. An earlier commit forgot to convert the logic in
the code that checks D/F conflict condition.
The old code knew that "to be removed" entries cannot be at
higher stage and actively checked that condition, but it was an
unnecessary check. This patch removes the extra check as well.
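The rule described above can be sketched as follows. This is only an
illustration under assumptions: the function name is_df_conflict() and
the SKETCH_CE_REMOVE bit value are hypothetical, and the real code works
on sorted index entries rather than on two bare strings.

```c
#include <string.h>

#define SKETCH_CE_REMOVE 0x1u	/* made-up bit value, not git's */

/*
 * Nonzero when adding "incoming" would conflict with "existing":
 * one of the two is a leading directory of the other. An existing
 * entry that is only marked "being removed" never conflicts.
 */
int is_df_conflict(const char *existing, unsigned int existing_flags,
		   const char *incoming)
{
	size_t elen = strlen(existing);
	size_t ilen = strlen(incoming);

	if (existing_flags & SKETCH_CE_REMOVE)
		return 0;
	if (elen < ilen)
		return !strncmp(existing, incoming, elen) &&
			incoming[elen] == '/';
	if (ilen < elen)
		return !strncmp(existing, incoming, ilen) &&
			existing[ilen] == '/';
	return 0;
}
```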
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In the multiple message case we remove the word "messages" from the
statistics output of msgfmt as it looks cleaner on the tty when you
are watching the build process. However we failed to strip the word
"message" when only 1 message was found to be untranslated or fuzzy,
as msgfmt does not produce the 's' suffix.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The strings we were showing from po2msg didn't exactly match those
of msgfmt's --statistics output so we didn't show quite the same
results when building git-gui's message files. Now we're closer
to what msgfmt shows (at least for an en_US locale) so the make
output matches.
I noticed that the fuzzy translation count is off by one for the
current po/zh_cn.po file. Not sure why and I'm not going to try
and debug it at this time as the po2msg is strictly a fallback;
users building from source really should prefer msgfmt.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If msgfmt fails with exit code 127 that typically means the program
is not found in the user's PATH and thus cannot be executed by make.
In such a case we can try to fall back to the Tcl based po2msg program
that we distributed with git-gui, as it does a "good enough" job.
We still don't default to po2msg.sh however as it does not perform
a lot of the sanity checks that msgfmt does, and quite a few of
those are too useful to give up.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If the horizontal scrollbar isn't currently visible (because it has
not been needed) but we get an update to the scroll port we may find
the scrollbar window exists but the Tcl command doesn't. Apparently
it is possible for Tk to have partially destroyed the scrollbar by
removing the Tcl procedure name but still leaving the widget name in
the window registry.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This should reduce disk space usage when doing large imports.
We'll be switching to "gc --auto" post-1.5.4 to handle
repacking for us.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When partitioning the work amongst threads, dividing the number of
objects by the number of threads may return 0 when there are fewer
objects than threads; this will cause the subsequent code to segfault
when accessing list[sub_size-1]. Allow some threads to have
zero objects to work on instead of barfing, while letting others
have more.
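One way to split the work so that a zero share is legal is sketched
below. It is an assumption about the general approach, not the code in
this patch; partition_work() and its parameters are hypothetical names.

```c
#include <stddef.h>

/*
 * Hypothetical helper: split nr objects over nr_threads threads
 * (nr_threads must be at least 1). Each thread gets an even share
 * of whatever is left, so with fewer objects than threads the
 * early threads simply get zero and the later ones get one each.
 */
void partition_work(size_t nr, unsigned int nr_threads, size_t *sizes)
{
	unsigned int i;

	for (i = 0; i < nr_threads; i++) {
		size_t share = nr / (nr_threads - i);

		sizes[i] = share;
		nr -= share;
	}
}
```

A caller then has to skip any thread whose share is zero before it
touches list[sub_size-1].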
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The test 'creating too deep nesting' can fail as early as cloning the repos,
but that is not its main purpose (it has to prepare nested repos and ensure
the last one is invalid). So split the test into the creation and
invalidity checking parts.
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This fixes the subtle bug in git send-email that was introduced with
aa54892f5a (send-email: detect invocation errors earlier), which caused
no patches to be sent out if the --compose flag was used.
Signed-off-by: Gustaf Hendeby <hendeby@isy.liu.se>
Tested-by: Seth Falcon <seth@userprimary.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As in run_diff_index(), we call unpack_trees() with the oneway_diff()
function in do_diff_cache() now. This makes the function diff_cache()
obsolete.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A plain "git commit" would still run lstat() a lot more than necessary,
because wt_status_print() would cause the index to be repeatedly flushed
and re-read by wt_read_cache(), and that would cause the CE_UPTODATE bit
to be lost, resulting in the files in the index being lstat'ed three
times each.
The reason why wt-status.c ended up invalidating and re-reading the
cache multiple times was that it uses "run_diff_index()", which in turn
uses "read_tree()" to populate the index with *both* the old index and
the tree we want to compare against.
So this patch re-writes run_diff_index() to not use read_tree(), but
instead use "unpack_trees()" to diff the index to a tree. That, in
turn, means that we don't need to modify the index itself, which then
means that we don't need to invalidate it and re-read it!
This, together with the lstat() optimizations, means that "git commit"
on the kernel tree really only needs to lstat() the index entries once.
That noticeably cuts down on the cached timings.
Best time before:
[torvalds@woody linux]$ time git commit > /dev/null
real 0m0.399s
user 0m0.232s
sys 0m0.164s
Best time after:
[torvalds@woody linux]$ time git commit > /dev/null
real 0m0.254s
user 0m0.140s
sys 0m0.112s
so it's a noticeable improvement in addition to being a nice conceptual
cleanup (it's really not that pretty that "run_diff_index()" dirties the
index!)
Doing an "strace -c" on it also shows that as it cuts the number of
lstat() calls by two thirds, it goes from being lstat()-limited to being
limited by getdents() (which is the readdir system call):
Before:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
60.69 0.000704 0 69230 31 lstat
23.62 0.000274 0 5522 getdents
8.36 0.000097 0 5508 2638 open
2.59 0.000030 0 2869 close
2.50 0.000029 0 274 write
1.47 0.000017 0 2844 fstat
After:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
45.17 0.000276 0 5522 getdents
26.51 0.000162 0 23112 31 lstat
19.80 0.000121 0 5503 2638 open
4.91 0.000030 0 2864 close
1.48 0.000020 0 274 write
1.34 0.000018 0 2844 fstat
...
It passes the test-suite for me, but this is another one of those
really core functions, and certainly pretty subtle, so..
NOTE! The Linux lstat() system call is really quite cheap when everything
is cached, so the fact that this is quite noticeable on Linux is likely to
mean that it is *much* more noticeable on other operating systems. I bet
you'll see a much bigger performance improvement from this on Windows in
particular.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Aside from the lstat(2) done for work tree files, there are
quite a few lstat(2) calls in the refname dwimming codepath. This
patch is not about reducing them.
* It adds a new ce_flag, CE_UPTODATE, that is meant to mark the
cache entries that record a regular file blob that is up to
date in the work tree. If somebody later walks the index and
wants to see if the work tree has changes, they do not have
to be checked with lstat(2) again.
* fill_stat_cache_info() marks the cache entry it just added
with CE_UPTODATE. This has the effect of marking the paths
we write out of the index and lstat(2) immediately as "no
need to lstat -- we know it is up-to-date", from quite a lot
of callers:
- git-apply --index
- git-update-index
- git-checkout-index
- git-add (uses add_file_to_index())
- git-commit (ditto)
- git-mv (ditto)
* refresh_cache_ent() also marks the cache entries that are clean
with CE_UPTODATE.
* write_index is changed not to write CE_UPTODATE out to the
index file, because CE_UPTODATE is meant to be transient only
in core. For the same reason, CE_UPDATE is not written to
prevent an accident from happening.
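The "transient, in core only" part of the list above can be sketched
like this. The bit values and the sketch_* names are made up for
illustration and are not the ones git uses.

```c
/* Bit values are invented for this sketch; they are not git's. */
#define SKETCH_CE_UPDATE	(1u << 16)
#define SKETCH_CE_UPTODATE	(1u << 17)
#define SKETCH_IN_CORE_ONLY	(SKETCH_CE_UPDATE | SKETCH_CE_UPTODATE)

/* Flags as they would be written out: transient bits are dropped. */
unsigned int sketch_flags_for_disk(unsigned int in_core_flags)
{
	return in_core_flags & ~SKETCH_IN_CORE_ONLY;
}

/* A walker can skip lstat(2) for entries already known to be clean. */
int sketch_needs_lstat(unsigned int in_core_flags)
{
	return !(in_core_flags & SKETCH_CE_UPTODATE);
}
```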
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We currently use the lower 12 bits (masked with CE_NAMEMASK) in the
ce_flags field to store the length of the name in cache_entry,
without checking the length parameter given to
create_ce_flags(). This can make us store incorrect length.
Currently we are mostly protected by the fact that many
codepaths first copy the path in a variable of size PATH_MAX,
which typically is 4096 that happens to match the limit, but
that feels like a bug waiting to happen. Besides, that would
not allow us to shorten the width of CE_NAMEMASK to use the bits
for new flags.
This redefines the meaning of the name length stored in the
cache_entry. A name that does not fit is represented by storing
CE_NAMEMASK in the field, and the actual length needs to be
computed by actually counting the bytes in the name[] field.
This way, only the unusually long paths need to suffer.
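A minimal sketch of that encoding follows. The mask value comes from
the "lower 12 bits" wording above; the helper names encode_namelen()
and decode_namelen() are hypothetical and are not the names used in
the patch.

```c
#include <stddef.h>
#include <string.h>

#define CE_NAMEMASK 0x0fff	/* lower 12 bits */

/* Length bits to store: the real length, or the mask if it won't fit. */
unsigned int encode_namelen(size_t len)
{
	return len < CE_NAMEMASK ? (unsigned int)len : CE_NAMEMASK;
}

/*
 * Recover the length. Only names whose stored length equals the
 * mask pay for counting the bytes in the name itself.
 */
size_t decode_namelen(unsigned int ce_flags, const char *name)
{
	size_t len = ce_flags & CE_NAMEMASK;

	if (len < CE_NAMEMASK)
		return len;
	return strlen(name);
}
```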
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This converts the index explicitly on read and write to its on-disk
format, allowing the in-core format to contain more flags, and be
simpler.
In particular, the in-core format is now host-endian (as opposed to the
on-disk one that is network endian in order to be able to be shared
across machines) and as a result we can dispense with all the
htonl/ntohl on accesses to the cache_entry fields.
This will make it easier to make use of various temporary flags that do
not exist in the on-disk format.
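As a sketch of the idea, with cut-down hypothetical structs instead of
the real cache_entry layout, the byte swapping happens once at the
read/write boundary and nowhere else:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical, heavily cut-down layouts for illustration only. */
struct ondisk_sketch {
	uint32_t mtime;		/* network byte order */
	uint32_t size;		/* network byte order */
};

struct incore_sketch {
	unsigned int mtime;	/* host byte order */
	unsigned int size;	/* host byte order */
	unsigned int flags;	/* room for in-core-only flags */
};

/* Convert once on read... */
void sketch_from_disk(const struct ondisk_sketch *d, struct incore_sketch *c)
{
	c->mtime = ntohl(d->mtime);
	c->size = ntohl(d->size);
	c->flags = 0;
}

/* ...and once on write; in-core flags are simply not copied. */
void sketch_to_disk(const struct incore_sketch *c, struct ondisk_sketch *d)
{
	d->mtime = htonl(c->mtime);
	d->size = htonl(c->size);
}
```

Everything in between can then read c->mtime and friends directly.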
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junio pointed out this part of fast-import wasn't very clear on
initial read, and it took some time for someone who was new to
fast-import's "dirty little tricks" to understand how this was
even working. So a little bit of commentary in the proper place
may help future readers.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We now use the configured pack.compression and pack.depth values
within fast-import as, like builtin-pack-objects, fast-import is
generating a packfile for consumption by the Git tools.
We use the same behavior as builtin-pack-objects does for these
options, allowing core.compression to supply the default value
for pack.compression.
The default setting for pack.depth within fast-import is still 10
as users will generally repack fast-import generated packfiles by
`repack -f`. A large delta depth within the fast-import packfile
can significantly slow down such a later repack.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Only "status" accepts "--cached" and the preferred way of
passing sub-command specific options is after the sub-command.
The documentation is adapted to reflect this.
Signed-off-by: Steffen Prohaska <prohaska@zib.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This modifies the existing t7400 test to use 'init' as the
pathname that a submodule is bound to. Without the earlier
subcommand parser fix, this fails.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The subcommand parser of "git submodule" made its subcommand
names reserved words. As a consequence, a command like this:
$ git submodule add init update
which is meant to add a submodule called 'init' at path 'update'
was misinterpreted as a request to invoke more than one of the mutually
incompatible subcommands and incorrectly rejected.
This patch fixes the issue by stopping the subcommand parsing at
the first subcommand word, to allow the sample command line
above to work as expected.
It also introduces the usual -- option disambiguator, so that a
submodule at path '-foo' can be updated with
$ git submodule update -- -foo
without triggering an "unrecognized option -foo" error.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This renames the shell functions used in git-submodule that
implement top-level subcommands. The rule is that the
subcommand $foo is implemented by cmd_$foo function.
A noteworthy change is that modules_list() is now known as
cmd_status(). There is no "submodule list" command.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* git://repo.or.cz/git-gui:
git-gui: Correct encoding of glossary/fr.po to UTF-8
git-gui: Consolidate hook execution code into a single function
git-gui: Correct window title for hook failure dialogs
git-gui: Honor the standard commit-msg hook
Junio noticed this was incorrectly added in ISO-8859-1 but it should
be in UTF-8 (as the headers claim UTF-8, and our convention is to use
only UTF-8).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The code we use to test if a hook is executable or not differs on
Cygwin from the normal POSIX case. Rather than repeating that for
all three hooks we call in our commit code path we can place the
common logic into a global procedure and invoke it when necessary.
This also lets us get rid of the ugly "|& cat" we were using before
as we can now rely on the Tcl 8.4 feature of "2>@1" or fall back to
the "|& cat" when necessary.
The post-commit hook is now run through the same API, but its outcome
does not influence the commit status. As a result we now show any of
the errors from the post-commit hook in a dialog window, instead of on
the user's tty that was used to launch git-gui. This resolves a long
standing bug related to not getting errors out of the post-commit hook
when launched under git-gui.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
During i18n translation work this message was partially broken
by using "append" instead of "strcat" to join the two different
parts of the message together.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Under core Git the git-commit tool will invoke the commit-msg hook
if it exists and is executable to the user running git-commit. As
a hook it has some limited value as it cannot alter the commit, but
it can modify the message the user is attempting to commit. It is
also able to examine the message to ensure it conforms to some local
standards/conventions.
Since the hook takes the name of a temporary file holding the message
as its only parameter we need to move the code that creates the temp
file up earlier in our commit code path, and then pass through that
file name to the latest stage (where we call git-commit-tree). We let
the hook alter the file as it sees fit and we don't bother to look at
its content again until the commit succeeded and we need the subject
for the reflog update.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
* git://repo.or.cz/git-gui:
git-gui: Makefile - Handle $DESTDIR on Cygwin
git-gui: add french glossary: glossary/fr.po
git-gui: Refresh file status description after hunk application
git-gui: Allow 'Create New Repository' on existing directories
git-gui: Initial french translation
git-gui: Improve German translation.
git-gui: Updated Swedish translation after mailing list review.
git-gui: Fix broken revert confirmation.
git-gui: Update German translation
git-gui: Update glossary: add term "hunk"
The URL to a repository http-push and http-fetch takes should
have a trailing slash. Instead of failing the request, add it
ourselves before attempting such a request.
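A minimal sketch of the normalization, assuming a plain malloc'd copy
is acceptable; url_with_trailing_slash() is a hypothetical name and
not the code added to http-push or http-fetch:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated copy of url that
 * ends in '/'. The caller frees the result. Returns NULL if the
 * allocation fails.
 */
char *url_with_trailing_slash(const char *url)
{
	size_t len = strlen(url);
	int need = (len == 0 || url[len - 1] != '/');
	char *out = malloc(len + need + 1);

	if (!out)
		return NULL;
	memcpy(out, url, len);
	if (need)
		out[len++] = '/';
	out[len] = '\0';
	return out;
}
```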
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The first thing http-push does is a PROPFIND to see if the other
end supports locking. The failure message we give is always
reported as "no DAV locking support at the remote repository",
regardless of the reason why we ended up not finding the locking
support on the other end.
This moves the code to report "no DAV locking support" down the
codepath so that the message is issued only when we successfully
get a response to PROPFIND and the other end says it does not
support locking. Other failures, such as connectivity glitches
and credential mismatches, have their own error message issued
and we will not issue "no DAV locking" error (we do not even
know if the remote end supports it).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Failing instead of silently not updating remote refs makes things
clearer for the user when trying to push to a repository while another
person does (or while a dangling lock is waiting for its 10-minute
timeout).
When silently not updating remote refs, the user does not even know
that git has pushed the objects but left the refs as they were
before (e.g. a new bunch of commits on branch "master" is uploaded,
however the branch itself still points to the previous head commit).
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Release the webdav lock even if the push fails because of a bad (or no)
reference on the command line.
To reproduce the issue that this patch fixes, prepare a test repository
available over http+webdav, say at http://myhost/myrepo.git/
Then:
$ git clone http://myhost/myrepo.git/
$ cd myrepo
$ git push http
Fetching remote heads...
refs/
refs/heads/
refs/tags/
No refs in common and none specified; doing nothing.
$ git push http
Fetching remote heads...
refs/
refs/heads/
refs/tags/
No refs in common and none specified; doing nothing.
$
Finally, look at the web server logs: you will find one LOCK query
and no UNLOCK query, and of course the second LOCK gets a 423 return
code instead of 200:
1.2.3.4 - gb [19/Jan/2008:14:24:56 +0100] "LOCK /myrepo.git/info/refs HTTP/1.1" 200 465
(...)
1.2.3.4 - gb [19/Jan/2008:14:25:10 +0100] "LOCK /myrepo.git/info/refs HTTP/1.1" 423 363
With this patch, there would have been two UNLOCKs in addition to the LOCKs.
From the user's point of view:
- If you realize that you should have typed e.g. "git push http
master" instead of "git push http", you will have to wait for 10
minutes for the lock to expire on its own.
- Furthermore, if somebody else is dumb enough to type "git push http"
while you need to push the "master" branch, then you too will need to
wait for 10 minutes.
Signed-off-by: Grégoire Barbier <gb@gbarbier.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This tightens the parsing of a commit object in a couple of ways.
- The "tree " header must end with a LF (earlier we did not
check this condition).
- Make sure parsing of timestamp on the "committer " header
does not go beyond the buffer, even when (1) the "author "
header does not end with a LF (this means that the commit
object is malformed and lacks the committer information) or
(2) the "committer " header does not have ">" that is the end
of the e-mail address, or (3) the "committer " header does
not end with a LF.
We however still keep the existing behaviour to return a parsed
commit object even when non-structural headers such as committer
and author are malformed, so that tools that need to look at
commits to clean up a history with such broken commits can still
get at the structural data (i.e. the parents chain and the tree
object).
Signed-off-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the "This manual page describes only the most frequently used options."
text with the list of rev-list options in git-log manpage. (The git-diff-tree
options are already included.)
Move these options to a separate file and include it from both
git-rev-list.txt and git-log.txt.
Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make http-push always fail when not compiled with USE_CURL_MULTI, since
otherwise it corrupts the remote repository (and then fails anyway).
Signed-off-by: Grégoire Barbier <gb@gbarbier.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since we are now sanity-checking the contents of patches and
refusing to send ones with long lines, this knob provides a
way for the user to override the new behavior (if, e.g., he
knows his SMTP path will handle it).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We try to catch errors early so that we don't end up sending
half of a broken patch series. Right now the only validation
is checking that line-lengths are under the SMTP-mandated
limit of 998.
The validation parsing is very crude (it just checks each
line length without understanding the mailbox format) but
should work fine for this simple check.
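The crude check described above amounts to something like the sketch
below. Note the assumptions: this is C for illustration while
send-email itself is a Perl script, first_long_line() is a hypothetical
name, and whether the limit counts the line terminator is a guess (here
it does not).

```c
#include <string.h>

#define SMTP_MAX_LINE 998

/*
 * Hypothetical helper: return the 1-based number of the first line
 * longer than SMTP_MAX_LINE bytes (newline not counted), or 0 when
 * every line fits. No attempt is made to understand mailbox format.
 */
long first_long_line(const char *text)
{
	long lineno = 1;

	while (*text) {
		size_t len = strcspn(text, "\n");

		if (len > SMTP_MAX_LINE)
			return lineno;
		text += len;
		if (*text == '\n')
			text++;
		lineno++;
	}
	return 0;
}
```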
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We never even look at the command line arguments until after
we have prompted the user for some information. So running
"git send-email" without arguments would prompt for "from"
and "to" headers, only to then die with "No patch files
specified." Instead, let's try to do as much error checking
as possible before getting user input.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Without this change, each diagnostic could use an errno value
clobbered by the close or unlink in rollback_lock_file.
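The general pattern is to save errno before the cleanup and put it back
afterwards. The sketch below only illustrates that pattern; it is not
the change made in this patch, and cleanup_preserving_errno() is a
hypothetical name.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical cleanup helper: close and unlink may both set errno,
 * so remember the value from the original failure and restore it
 * for the diagnostic that follows. Always returns -1 so callers
 * can write "return cleanup_preserving_errno(fd, path);".
 */
int cleanup_preserving_errno(int fd, const char *path)
{
	int saved_errno = errno;

	if (fd >= 0)
		close(fd);
	unlink(path);
	errno = saved_errno;
	return -1;
}
```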
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the specfile (export-subst) attribute was introduced, it added a
dependency from archive-{tar|zip}.c to builtin-archive.c. This broke the
support for archive-operations in libgit.a since builtin-archive.o doesn't
belong in libgit.a.
This patch moves the functions required by libgit.a from builtin-archive.c
to the new file archive.c (which becomes part of libgit.a).
Signed-off-by: Lars Hjemli <hjemli@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update configure.ac (and config.mak.in) by adding test for unsetenv
(NO_UNSETENV). Add comment about NO_UNSETENV to Makefile header, as
original commit 731043fd adding compat/unsetenv.c didn't do that.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>