A pack_revindex struct has two elements: the revindex
entries themselves, and a pointer to the packed_git. We need
both to do lookups, because only the latter knows things
like the number of objects in the pack.
Now that packed_git contains the pack_revindex struct it's
just as easy to pass around the packed_git itself, and we do
not need the extra back-pointer.
We can instead just store the entries directly in the pack.
All functions which took a pack_revindex now just take a
packed_git. We still lazy-load in find_pack_revindex, so
most callers are unaffected.
The exception is the bitmap code, which computes the
revindex and caches the pointer when we load the bitmaps. We
can continue to load, drop the extra cache pointer, and just
access bitmap_git.pack.revindex directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The main entry point to the pack-revindex code is
find_pack_revindex(). This calls revindex_for_pack(), which
lazily computes and caches the revindex for the pack.
We store the cache in a very simple hash table. It's created
by init_pack_revindex(), which inserts an entry for every
packfile we know about, and we never grow or shrink the
hash. If we ever need the revindex for a pack that isn't in
the hash, we die() with an internal error.
This can lead to a race, because we may load more packs
after having called init_pack_revindex(). For example,
imagine we have one process which needs to look at the
revindex for a variety of objects (e.g., cat-file's
"%(objectsize:disk)" format). Simultaneously, git-gc is
running, which is doing a `git repack -ad`. We might hit a
sequence like:
1. We need the revidx for some packed object. We call
find_pack_revindex() and end up in init_pack_revindex()
to create the hash table for all packs we know about.
2. We look up another object and can't find it, because
the repack has removed the pack it's in. We re-scan the
pack directory and find a new pack containing the
object. It gets added to our packed_git list.
3. We call find_pack_revindex() for the new object, which
hits revindex_for_pack() for our new pack. It can't
find the packed_git in the revindex hash, and dies.
You could also replace the `repack` above with a push or
fetch to create a new pack, though these are less likely
(you would have to somehow learn about the new objects to
look them up).
Prior to 1a6d8b9 (do not discard revindex when re-preparing
packfiles, 2014-01-15), this was safe, as we threw away the
revindex whenever we re-scanned the pack directory (and thus
re-created the revindex hash on the fly). However, we don't
want to simply revert that commit, as it was solving a
different race.
So we have a few options:
- We can fix the race in 1a6d8b9 differently, by having
the bitmap code look in the revindex hash instead of
caching the pointer. But this would introduce a lot of
extra hash lookups for common bitmap operations.
- We could teach the revindex to dynamically add new packs
to the hash table. This would perform the same, but
would mean adding extra code to the revindex hash (which
currently cannot be resized at all).
- We can get rid of the hash table entirely. There is
exactly one revindex per pack, so we can just store it
in the packed_git struct. Since it's initialized lazily,
it does not add to the startup cost.
This is the best of both worlds: less code and fewer
hash table lookups. The original code likely avoided
this in the name of encapsulation. But the packed_git
and reverse_index code are fairly intimate already, so
it's not much of a loss.
This patch implements the final option. It's a minimal
conversion that retains the pack_revindex struct. No callers
need to change, and we can do further cleanup in a follow-on
patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The current code writes a reflog entry whenever we update a
symbolic ref, but we never test that this is so. Let's add a
test to make sure upcoming refactoring doesn't cause a
regression.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If create_symref() fails, git-symbolic-ref will still exit
with code 0, and our caller has no idea that the command did
nothing.
This appears to have been broken since the beginning of time
(e.g., it is not a regression where create_symref() stopped
calling die() or something similar).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When fetching changes from a depot using a full client spec, there
is no need to perform as many queries as there are top-level paths
in the client spec. Instead we query all changes in chronological
order, also getting rid of the need to sort the results and remove
duplicates.
Signed-off-by: Sam Hocevar <sam@hocevar.net>
Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When submitting from a repository that was cloned using a client spec,
use the full list of paths when ruling out files that are outside the
view. This fixes a bug where only files pertaining to the first path
would be included in the p4 submit.
Signed-off-by: Sam Hocevar <sam@hocevar.net>
Signed-off-by: Luke Diamand <luke@diamand.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"format-patch" has learned a new option to zero-out the commit
object name on the mbox "From " line.
* bc/format-patch-null-from-line:
format-patch: check that header line has expected format
format-patch: add an option to suppress commit hash
sha1_file.c: introduce a null_oid constant
When getpwuid() on the system returned NULL (e.g. the user is not
in the /etc/passwd file or other uid-to-name mappings), the
codepath to find who the user is to record it in the reflog barfed
and died. Loosen the check in this codepath, which already accepts
questionable ident string (e.g. host part of the e-mail address is
obviously bogus), and in general when we operate fmt_ident() function
in non-strict mode.
* jk/ident-loosen-getpwuid:
ident: loosen getpwuid error in non-strict mode
ident: keep a flag for bogus default_email
ident: make xgetpwuid_self() a static local helper
The completion script (in contrib/) used to list "git column"
(which is not an end-user facing command) as one of the choices
* sg/completion-no-column:
completion: remove 'git column' from porcelain commands
Add new config to avoid typing "--recurse-submodules" on each push.
* mc/push-recurse-submodules-config:
push: follow the "last one wins" convention for --recurse-submodules
push: test that --recurse-submodules on command line overrides config
push: add recurseSubmodules config option
On Windows, when writing to a pipe fails, errno is always
EINVAL. However, Git expects it to be EPIPE.
According to the documentation, there are two cases in which write()
triggers EINVAL: the buffer is NULL, or the length is odd but the mode
is 16-bit Unicode (the broken pipe is not mentioned as possible cause).
Git never sets the file mode to anything but binary, therefore we know
that errno should actually be EPIPE if it is EINVAL and the buffer is
not NULL.
See https://msdn.microsoft.com/en-us/library/1570wh78.aspx for more
details.
This works around t5571.11 failing with v2.6.4 on Windows.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* git://ozlabs.org/~paulus/gitk:
gitk: sv.po: Update Swedish translation (311t)
gitk: Let .bleft.mid widgets 'breathe'
gitk: Match ttk fonts to gitk fonts
gitk: Update revision date in Japanese PO file
gitk: Update "Language:" header
gitk: Improve translation message
gitk: Remove unused line
gitk: Update year
gitk: Change last translator line
gitk: Update fuzzy messages
gitk: Update Japanese translation
gitk: Fix translation around copyright sign
gitk: Update Japanese translation
gitk: Fix wrong translation
gitk: Translate Japanese catalog
gitk: Translate more to Japanese catalog
gitk: Update Japanese message catalog
gitk: Re-sync line number in Japanese message catalogue
gitk: Color name update
When we unwrap a tag to find its commit for a traversal, we
do not propagate the "name" field of the tag in the pending
array (i.e., the ref name the user gave us in the first
place) to the commit (instead, we use an empty string). This
means that "git log --source" will never show the tag-name
for commits we reach through it.
This was broken in 2073949 (traverse_commit_list: support
pending blobs/trees with paths, 2014-10-15). That commit
tried to be careful and avoid propagating the path
information for a tag (which would be nonsensical) to trees
and blobs. But it should not have cut off the "name" field,
which should carry forward to children.
Note that this does mean that the "name" field will carry
forward to blobs and trees, too. Whereas prior to 2073949,
we always gave them an empty string. This is the right thing
to do, but in practice no callers probably use it (since now
we have an explicit separate "path" field, which was the
point of 2073949).
We add tests here not only for the broken case, but also a
basic sanity test of "log --source" in general, which did
not have any coverage in the test suite.
Reported-by: Raymundo <gypark@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git rebase -i" started with merge strategy options did not
propagate them upon "git rebase --continue".
* fr/rebase-i-continue-preserve-options:
rebase -i: remember merge options beyond continue actions
"git push" takes "--delete" but does not take a short form "-d",
unlike "git branch" which does take both. Bring consistency
between them.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In a later patch we enable parallel processing of submodules, this
only adds the possibility for it. So this change should not change
any user facing behavior.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows to run external commands in parallel with ordered output
on stderr.
If we run external commands in parallel we cannot pipe the output directly
to the our stdout/err as it would mix up. So each process's output will
flow through a pipe, which we buffer. One subprocess can be directly
piped to out stdout/err for a low latency feedback to the user.
Example:
Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a
different amount of time as the different submodules vary in size, then
the output of fetches in sequential order might look like this:
time -->
output: |---A---| |-B-| |-------C-------| |-D-| |-E-|
When we schedule these submodules into maximal two parallel processes,
a schedule and sample output over time may look like this:
process 1: |---A---| |-D-| |-E-|
process 2: |-B-| |-------C-------|
output: |---A---|B|---C-------|DE
So A will be perceived as it would run normally in the single child
version. As B has finished by the time A is done, we can dump its whole
progress buffer on stderr, such that it looks like it finished in no
time. Once that is done, C is determined to be the visible child and
its progress will be reported in real time.
So this way of output is really good for human consumption, as it only
changes the timing, not the actual output.
For machine consumption the output needs to be prepared in the tasks,
by either having a prefix per line or per block to indicate whose tasks
output is displayed, because the output order may not follow the
original sequential ordering:
|----A----| |--B--| |-C-|
will be scheduled to be all parallel:
process 1: |----A----|
process 2: |--B--|
process 3: |-C-|
output: |----A----|CB
This happens because C finished before B did, so it will be queued for
output before B.
To detect when a child has finished executing, we check interleaved
with other actions (such as checking the liveliness of children or
starting new processes) whether the stderr pipe still exists. Once a
child closed its stderr stream, we assume it is terminating very soon,
and use `finish_command()` from the single external process execution
interface to collect the exit status.
By maintaining the strong assumption of stderr being open until the
very end of a child process, we can avoid other hassle such as an
implementation using `waitpid(-1)`, which is not implemented in Windows.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new method removes all common signal handlers that were installed
by sigchain_push.
CC: Jeff King <peff@peff.net>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new call will read from a file descriptor into a strbuf once. The
underlying call xread is just run once. xread only reattempts
reading in case of EINTR, which makes it suitable to use for a
nonblocking read.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The man page of read(2) says:
EAGAIN The file descriptor fd refers to a file other than a socket
and has been marked nonblocking (O_NONBLOCK), and the read
would block.
EAGAIN or EWOULDBLOCK
The file descriptor fd refers to a socket and has been marked
nonblocking (O_NONBLOCK), and the read would block. POSIX.1-2001
allows either error to be returned for this case, and does not
require these constants to have the same value, so a portable
application should check for both possibilities.
If we get an EAGAIN or EWOULDBLOCK the fd must have set O_NONBLOCK.
As the intent of xread is to read as much as possible either until the
fd is EOF or an actual error occurs, we can ease the feeder of the fd
by not spinning the whole time, but rather wait for it politely by not
busy waiting.
We should not care if the call to poll failed, as we're in an infinite
loop and can only get out with the correct read().
Signed-off-by: Stefan Beller <sbeller@google.com>
Acked-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "Pushing submodule <foo>" progress output correctly goes to
stderr, but "Fetching submodule <foo>" is going to stdout by
mistake. Fix it to write to stderr.
Noticed while trying to implement a parallel submodule fetch. When
this particular output line went to a different file descriptor, it
was buffered separately, resulting in wrongly interleaved output if
we copied it to the terminal naively.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git grep" can now be configured (or told from the command line) how
many threads to use when searching in the working tree files.
Signed-off-by: Victor Leschuk <vleschuk@accesssoftek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach the command to show progress output when it takes long time to
produce the first line of output; this option cannot be used with
"--incremental" or "--porcelain" options.
git-annotate inherits the option as well.
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Edmundo Carmona Antoranz <eantoranz@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When show-in-pager option is used, threading is unconditionally
disabled, but this happened much earlier than the code that
determines the use of threading based on the operand (i.e. we do not
thread search in the object database). Consolidate the code to
disable threading to just one place.
Signed-off-by: Victor Leschuk <vleschuk@accesssoftek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Earlier we disabled threading when online_cpus() said "1", but on a
filesystem with long latency (or in a cold cache situation), using
multiple threads to drive I/O in parallel would improve performance
even on a single-core machines.
Signed-off-by: Victor Leschuk <vleschuk@accesssoftek.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format of the "From " header line is very specific to allow
utilities to detect Git-style patches. Add a test that the patches
created are in the expected format.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Oftentimes, patches created by git format-patch will be stored in
version control or compared with diff. In these cases, two otherwise
identical patches can have different commit hashes, leading to diff
noise. Teach git format-patch a --zero-commit option that instead
produces an all-zero hash to avoid this diff noise.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>