This finally turns on the sliding window behavior for packfile data
access by mapping limited size windows and chaining them under the
packed_git->windows list.
We consider a given byte offset to be within the window only if there
would be at least 20 bytes (one hash worth of data) accessible after
the requested offset. This range selection relates to the contract
that use_pack() makes with its callers, allowing them to access
one hash or one object header without needing to call use_pack()
for every byte of data obtained.
In the worst case scenario we will map the same page of data twice
into memory: once at the end of one window and once again at the
start of the next window. This duplicate page mapping will happen
only when an object header or a delta base reference is spanned
over the end of a window and is always limited to just one page of
duplication, as no sane operating system will ever have a page size
smaller than a hash.
I am assuming that the possible wasted page of virtual address
space is going to perform faster than the alternatives, which
would be to copy the object header or ref delta into a temporary
buffer prior to parsing, or to check the window range on every byte
during header parsing. We may decide to revisit this decision in
the future since this is just a gut instinct decision and has not
actually been proven out by experimental testing.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
To support multiple windows per packfile we need to unmap only one
window at a time from that packfile, leaving any other windows in
place and available for reference.
We treat all windows from all packfiles equally; the least recently
used, not-in-use window across all packfiles will always be closed
first.
If we have unmapped all windows in a packfile then we can also close
the packfile's file descriptor as its possible we won't need to map
any window from that file in the near future. This decision about
when to close the pack file descriptor may need to be revisited in
the future after additional testing on several different platforms
can be performed.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When we parse the object header or the delta base reference we
don't bother to loop over use_pack() calls. The reason we don't
need to bother with calling use_pack for each byte accessed is that
use_pack will always promise us at least 20 bytes (really the hash
size) after the offset. This promise from use_pack simplifies a
lot of code in the header parsing logic, as well as helps out the
zlib library by ensuring there's always some data for it to consume
during an inflate call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When multiple mmaps start getting used for all pack file access it
is not possible to get all data associated with a specific object
in one contiguous memory region. This limitation prevents simply
passing a single address and length to SHA1_Update or to inflate.
Instead we need to loop until we have processed all data of interest.
As we loop over the data we are always interested in reusing the same
window 'cursor', as the prior window will no longer be of any use
to us. This allows the use_pack() call to automatically decrement
the use count of the prior window before setting up access for us
to the next window.
Within each loop we need to make use of the available length output
parameter of use_pack() to tell us how many bytes are available in
the current memory region, as we cannot tell otherwise.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Part of the implementation concept of the sliding mmap window for
pack access is to permit multiple windows per pack to be mapped
independently. Since the inuse_cnt is associated with the mmap and
not with the file, this value is in struct pack_window and needs to
be incremented/decremented for each pack_window accessed by any code.
To faciliate that implementation we need to replace all uses of
use_packed_git() and unuse_packed_git() with a different API that
follows struct pack_window objects rather than struct packed_git.
The way this works is when we need to start accessing a pack for
the first time we should setup a new window 'cursor' by declaring
a local and setting it to NULL:
struct pack_windows *w_curs = NULL;
To obtain the memory region which contains a specific section of
the pack file we invoke use_pack(), supplying the address of our
current window cursor:
unsigned int len;
unsigned char *addr = use_pack(p, &w_curs, offset, &len);
the returned address `addr` will be the first byte at `offset`
within the pack file. The optional variable len will also be
updated with the number of bytes remaining following the address.
Multiple calls to use_pack() with the same window cursor will
update the window cursor, moving it from one window to another
when necessary. In this way each window cursor variable maintains
only one struct pack_window inuse at a time.
Finally before exiting the scope which originally declared the window
cursor we must invoke unuse_pack() to unuse the current window (which
may be different from the one that was first obtained from use_pack):
unuse_pack(&w_curs);
This implementation is still not complete with regards to multiple
windows, as only one window per pack file is supported right now.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
To efficiently support mmaping of multiple regions of the same pack
file we want to keep the pack's file descriptor open while we are
actively working with that pack. So we are now keeping that file
descriptor in packed_git.pack_fd and closing it only after we unmap
the last window.
This is going to increase the number of file descriptors that are
in use at once, however that will be bounded by the total number of
pack files present and therefore should not be very high. It is
a small tradeoff which we may need to revisit after some testing
can be done on various repositories and systems.
For code clarity we also want to seperate out the implementation
of how we open a pack file from the implementation which locates
a suitable window (or makes a new one) from the given pack file.
Since this is a rather large delta I'm taking advantage of doing
it now, in a fairly isolated change.
When we open a pack file we need to examine the header and trailer
without having a mmap in place, as we may only need to mmap
the middle section of this particular pack. Consequently the
verification code has been refactored to make use of the new
read_or_die function.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Like write_or_die read_or_die reads the entire length requested
or it kills the current process with a die call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Since the index_size and pack_size members of struct packed_git
are the lengths of those corresponding files we should use the
off_t size of the operating system to store these file lengths,
rather than an unsigned long. This would help in the future should
we ever resurrect Junio's 64 bit index implementation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The idea behind the sliding mmap window pack reader implementation
is to have multiple mmap regions active against the same pack file,
thereby allowing the process to mmap in only the active/hot sections
of the pack and reduce overall virtual address space usage.
To implement this we need to refactor the mmap related data
(pack_base, pack_use_cnt) out of struct packed_git and move them
into a new struct pack_window.
We are refactoring the code to support a single struct pack_window
per packfile, thereby emulating the prior behavior of mmap'ing the
entire pack file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Rather than hardcoding the maximum number of bytes which can be
mmapped from pack files we should make this value configurable,
allowing the end user to increase or decrease this limit on a
per-repository basis depending on the size of the repository
and the capabilities of their operating system.
In general users should not need to manually tune such a low-level
setting within the core code, but being able to artifically limit
the number of bytes which we can mmap at once from pack files will
make it easier to craft test cases for the new mmap sliding window
implementation.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The unpack_entry_gently function currently has only two callers:
the delta base resolution in sha1_file.c and the main loop of
pack-check.c. Both of these must change to using unpack_entry
directly when we implement sliding window mmap logic, so I'm doing
it earlier to help break down the change set.
This may cause a slight performance decrease for delta base
resolution as well as for pack-check.c's verify_packfile(), as
the pack use counter will be incremented and decremented for every
object that is unpacked.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When '*.ig' is ignored, and you have two files f.ig and d.ig/foo
in the working tree,
$ git add .
correctly ignored f.ig but failed to ignore d.ig/foo. This was
caused by a thinko in an earlier commit 4888c534, when we tried
to allow adding otherwise ignored files.
After reverting that commit, this takes a much simpler approach.
When we have an unmatched pathspec that talks about an existing
pathname, we know it is an ignored path the user tried to add,
so we include it in the set of paths directory walker returned.
This does not let you say "git add -f D" on an ignored directory
D and add everything under D. People can submit a patch to
further allow it if they want to, but I think it is a saner
behaviour to require explicit paths to be spelled out in such a
case.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Not that this reveals anything new, but I did test_tick shell
function in test-lib and found it rather cute and nice.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* jc/utf8:
t3900: test conversion to non UTF-8 as well
Rename t3900 test vector file
UTF-8: introduce i18n.logoutputencoding.
Teach log family --encoding
i18n.logToUTF8: convert commit log message to UTF-8
Move encoding conversion routine out of mailinfo to utf8.c
Conflicts:
commit.c
This changes the default remote.origin.fetch configuration
created by git-clone so that it allows non-fast-forward updates.
When using the separate-remote layout with reflog enabled, it
does not make much sense to refuse to update the remote tracking
branch just because some of them do not fast-forward. git-fetch
issues warnings on non-fast-forwardness, and the user can peek
at what the previous state was using the reflog.
Signed-off-by: Junio C Hamano <junkio@cox.net>
In some environments, certain tests have no way of succeeding
due to platform limitation, such as lack of 'unzip' program, or
filesystem that do not allow arbitrary sequence of non-NUL bytes
as pathnames.
You should be able to say something like
$ cd t
$ GIT_SKIP_TESTS=t9200.8 t9200-git-cvsexport-commit.sh
and even:
$ GIT_SKIP_TESTS='t[0-4]??? t91?? t9200.8' make test
to omit such tests. The value of the environment variable is a
SP separated list of patterns that tells which tests to skip,
and either can match the "t[0-9]{4}" part to skip the whole
test, or t[0-9]{4} followed by ".$number" to say which
particular test to skip.
Note that some tests in the existing test suite rely on previous
test item, so you cannot arbitrarily disable one and expect the
remainder of test to check what the test originally was intended
to check.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Otherwise, sending the diagnostic to stdout would provoke a
protocol failure.
Signed-off-by: Jim Meyering <jim@meyering.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
The function xdl_refine_conflicts() tries to break down huge
conflicts by doing a diff on the conflicting regions. However,
this does not make sense when one side is empty.
Worse, when one side is not only empty, but after EOF, the code
accessed unmapped memory.
Noticed by Luben Tuikov, Shawn Pearce and Alexandre Julliard, the
latter providing a test case.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This could be useful in finding new problems and helping users
debug.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We have less code to worry about now. As a bonus, --revision
can be used to reliably skip parts of history whenever fetch is
run, not just the first time. I'm not sure why anybody would
want to skip history in the middle, however...
For people (nearly everyone at the moment) without the
do_switch() function in their Perl SVN library, the entire tree
must be refetched if --follow-parent is used and a parent is
found. Future versions of SVN will have a working do_switch()
function accessible via Perl.
Accessing repositories on the local machine (especially file://
ones) is also slightly slower as a result; but I suspect most
git-svn users will be using it to access remote repositories.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
I don't think anybody running tests needs to know they're
running init-db and creating a repository for testing.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We get an extra measure of error checking here as well.
While we're at it, also removed a less portable use of export.
Signed-off-by: Eric Wong <normalperson@yhbt.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* js/shallow:
fetch-pack: Do not fetch tags for shallow clones.
get_shallow_commits: Avoid memory leak if a commit has been reached already.
git-fetch: Reset shallow_depth before auto-following tags.
upload-pack: Check for NOT_SHALLOW flag before sending a shallow to the client.
fetch-pack: Properly remove the shallow file when it becomes empty.
shallow clone: unparse and reparse an unshallowed commit
Why didn't we mark want_obj as ~UNINTERESTING in the old code?
Why does it mean we do not have to register shallow if we have one?
We should make sure that the protocol is still extensible.
add tests for shallow stuff
Shallow clone: do not ignore shallowness when following tags
allow deepening of a shallow repository
allow cloning a repository "shallowly"
support fetching into a shallow repository
upload-pack: no longer call rev-list
Now that git-merge knows how to use the pull.{twohead,octopus}
configuration options to select the default merge strategy there
is no reason for git-pull to do the same immediately prior to
invoking git-merge.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If git-merge is invoked without a strategy argument it is probably
being run as a porcelain-ish command directly and is not being run
from within git-pull. However we still should honor whatever merge
strategy the user may have selected in their configuration, just as
`git-pull .` would have.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If git-merge exits with a non-zero exit status so should git-pull.
This way the caller of git-pull knows the task did not complete
successfully simply by checking the process exit status.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If a three-way merge in git-rebase generates a conflict then we
should take advantage of git-merge-recursive's ability to include
the branch name of each side of the conflict hunk by setting the
GITHEAD_* environment variables.
In the case of rebase there aren't really two clear branches; we
have the branch we are rebasing onto, and we have the branch we are
currently rebasing. Since most conflicts will be arising between
the user's current branch and the branch they are rebasing onto
we assume the stuff that isn't in the current commit is the "onto"
branch and the stuff in the current commit is the "current" branch.
This assumption may however come up wrong if the user resolves one
conflict in such a way that it conflicts again on a future commit
also being rebased. In this case the user's prior resolution will
appear to be in the "onto" part of the hunk.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
To help correctly log actions caused by porcelain which invoke
git-reset directly we should honor the setting of GIT_REFLOG_ACTION
which we inherited from our caller.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Junio rightly pointed out that the --reflog-action parameter
was starting to get out of control, as most porcelain code
needed to hand it to other porcelain and plumbing alike to
ensure the reflog contained the top-level user action and
not the lower-level actions it invoked.
At Junio's suggestion we are introducing the new set_reflog_action
function to all shell scripts, allowing them to declare early on
what their default reflog name should be, but this setting only
takes effect if the caller has not already set the GIT_REFLOG_ACTION
environment variable.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Following advice from CGI(3pm) man page, precompile all CGI routines
for mod_perl, in the BEGIN block.
If you want to compile without importing use the compile() method
instead:
use CGI();
CGI->compile();
This is particularly useful in a mod_perl environment, in which you
might want to precompile all CGI routines in a startup script, and then
import the functions individually in each mod_perl script.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Add mod_perl version string (the value of $ENV{'MOD_PERL'} if it is
set) to "generator" meta header.
The purpose of this is to identify version of gitweb, now that
codepath may differ for gitweb run as CGI script, run under
mod_perl 1.0 and run under mod_perl 2.0.
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
It appears ISO-2022-JP is more widely accepted than ISO2022JP, so
rename it that way. We probably would need to have a way to skip
this test altogether in locale-challenged environments.
Signed-off-by: Junio C Hamano <junkio@cox.net>
It appears that curl_easy_duphandle() from libcurl 7.16.0
returns a curl session handle which fails GOOD_MULTI_HANDLE()
check in curl_multi_add_handle(). This causes fetch_ref() to
fail because start_active_slot() cannot start the request.
For now, check for 7.16.0 to work this issue around.
Signed-off-by: Junio C Hamano <junkio@cox.net>
* sp/gc:
Use 'repack -a -d -l' instead of 'repack -a -d' in git-gc
everyday: replace a few 'prune' and 'repack' with 'gc'
Create 'git gc' to perform common maintenance operations.
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends formats the commit log message.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This should not be necessary for people who only use NTFS, but for
people with FAT32 it seems to be an issue. Let's ship with a safer
default.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Re-enable rev-list --parents for parse_commit which was removed in
(208b2dff95). rev-list --parents is not
just used to return the parent headers in the commit object, it
includes any grafts which are vaild for the commit.
Signed-off-by: Robert Fitzsimons <robfitz@273k.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
If user hits enter at the prompt for
"Who should the emails appear to be from?",
the value for "From:" field was emptied instead of GIT_COMMITER_IDENT.
Signed-off-by: Quy Tonthat <qtonthat@gmail.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is to adjust to:
count-objects -v: show number of packs as well.
which will break a test in this series.
Signed-off-by: Junio C Hamano <junkio@cox.net>