If a pack.packSizeLimit is set, we may split the pack data
across multiple packfiles. This means we cannot generate
.bitmap files, as they require that all of the reachable
objects are in the same pack. We check that condition when
we are generating the list of objects to pack (and disable
bitmaps if we are not packing everything), but we forgot to
update it when we notice that we needed to split (which
doesn't happen until the actual write phase).
The resulting bitmaps are quite bogus (they mention entries
that do not exist in the pack!) and can cause a fetch or
push to send insufficient objects.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are given an expiration time like
--unpack-unreachable=2.weeks.ago, we avoid writing out old,
unreachable loose objects entirely, under the assumption
that running "prune" would simply delete them immediately
anyway. However, this is only valid if we computed the same
set of reachable objects as prune would.
In practice, this is the case, because only git-repack uses
the --unpack-unreachable option with an expiration, and it
always feeds as many objects into the pack as possible. But
we can double-check at runtime just to be sure.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.
By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.
This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:
1. You make a commit on a branch that references blob X.
2. You repack, moving X into the pack.
3. You delete the branch (and its reflog), so that X is
unreferenced.
4. You "git add" blob X so that it is now referenced only
by the index.
5. You repack again with git-gc. The pack-objects we
invoke will see that X is neither referenced nor
recent and not bother loosening it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This saves us from having to bump the rp_av count when we
add new traversal options.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git help everyday" to show the Everyday Git document.
* po/everyday-doc:
doc: add 'everyday' to 'git help'
doc: Makefile regularise OBSOLETE_HTML list building
doc: modernise everyday.txt wording and format in man page style
When you resolve a sha1, you can optionally keep any context
found during the resolution, including the path and mode of
a tree entry (e.g., when looking up "HEAD:subdir/file.c").
The add_object_array_with_context function lets you then
attach that context to an entry in a list. Unfortunately,
the interface for doing so is horrible. The object_context
structure is large and most object_array users do not use
it. Therefore we keep a pointer to the structure to avoid
burdening other users too much. But that means when we do
use it that we must allocate the struct ourselves. And the
struct contains a fixed PATH_MAX-sized buffer, which makes
this wholly unsuitable for any large arrays.
We can observe that there is only a single user of the
"with_context" variant: builtin/grep.c. And in that use
case, the only element we care about is the path. We can
therefore store only the path as a pointer (the context's
mode field was redundant with the object_array_entry itself,
and nobody actually cared about the surrounding tree). This
still requires a strdup of the pathname, but at least we are
only consuming the minimum amount of memory for each string.
We can also handle the copying ourselves in
add_object_array_*, and free it as appropriate in
object_array_release_entry.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A recent commit taught git-prune to keep non-recent objects
that are reachable from recent ones. However, pack-objects,
when loosening unreachable objects, tries to optimize out
the write in the case that the object will be immediately
pruned. It now gets this wrong, since its rule does not
reflect the new prune code (and this can be seen by running
t6501 with a strategically placed repack).
Let's teach pack-objects similar logic.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.
Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime. But the new rules will depend on the exact
object, so we need to perform the check even for old packs.
Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Our current strategy with prune is that an object falls into
one of three categories:
1. Reachable (from ref tips, reflogs, index, etc).
2. Not reachable, but recent (based on the --expire time).
3. Not reachable and not recent.
We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.
However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.
One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have objects it refers to, and so forth
recursively. The problem is that this is very expensive.
Creating a parent link would require traversing the entire
object graph!
Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.
This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent. We can make a simple inductive argument that
with this patch, this property holds (that there are no
objects with missing referents in the repository):
0. When we have no objects, we have nothing to refer or be
referred to, so the property holds.
1. If we add objects to the repository, their direct
referents must generally exist (e.g., if you create a
tree, the blobs it references must exist; if you create
a commit to point at the tree, the tree must exist).
This is already the case before this patch. And it is
not 100% foolproof (you can make bogus objects using
`git hash-object`, for example), but it should be the
case for normal usage.
Therefore for any sequence of object additions, the
property will continue to hold.
2. If we remove objects from the repository, then we will
not remove a child object (like a blob) if an object
that refers to it is being kept. That is the part
implemented by this patch.
Note, however, that our reachability check and the
actual pruning are not atomic. So it _is_ still
possible to violate the property (e.g., an object
becomes referenced just as we are deleting it). This
patch is shooting for eliminating problems where the
mtimes of dependent objects differ by hours or days,
and one is dropped without the other. It does nothing
to help with short races.
Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to look at each object twice. This patch
instead performs the reachability traversal, then follows up
with a second traversal for recent objects, skipping any
that have already been marked.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This drops our line count considerably, and should make
things more readable by keeping the counting logic separate
from the traversal.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The point of xsize_t is to safely cast an off_t into a size_t
(because we are about to mmap). But in count-objects, we are
summing the sizes in an off_t. Using xsize_t means that
count-objects could fail on a 32-bit system with a 4G
object (not likely, as other parts of git would fail, but
we should at least be correct here).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This saves us from manually traversing the directory
structure ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.
Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).
We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While use of the --reference option to borrow objects from an
existing local repository of the same project is an effective way to
reduce traffic when cloning a project over the network, it makes the
resulting "borrowing" repository dependent on the "borrowed"
repository. After running
git clone --reference=P $URL Q
the resulting repository Q will be broken if the borrowed repository
P disappears.
The way to allow the borrowed repository to be removed is to repack
the borrowing repository (i.e. run "git repack -a -d" in Q); while
power users may know it very well, it is not easily discoverable.
Teach a new "--dissociate" option to "git clone" to run this
repacking for the user.
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Until v2.1.0-rc0~22^2~11 (refs.c: add an err argument to
repack_without_refs, 2014-06-20), repack_without_refs forgot to
provide an error message when commit_packed_refs fails. Even today,
it only provides a message for callers that pass a non-NULL err
parameter. Internal callers in refs.c pass non-NULL err but
"git remote" does not.
That means that "git remote rm" and "git remote prune" can fail
without printing a message about why. Fix them by passing in a
non-NULL err parameter and printing the returned message.
This is the last caller to a ref handling function passing err ==
NULL. A later patch can drop support for err == NULL, avoiding such
problems in the future.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Print a warning message for any bad ref names we find in the repo and
skip them so callers don't have to deal with parsing them.
It might be useful in the future to have a flag where we would not
skip these refs for those callers that want to and are prepared (for
example by using a --format argument with %0 as a delimiter after the
ref name).
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We currently do not handle badly named refs well:
$ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
$ git branch
fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
$ git branch -D master.....@\*@\\.
error: branch 'master.....@*@\.' not found.
Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.
So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them. Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.
Details:
In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed. Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").
In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.
Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.
Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially. For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.
In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git branch -d" reads the branch it is about to delete, it used
to avoid passing the RESOLVE_REF_READING ('treat missing ref as
error') flag because a symref pointing to a nonexistent ref would show
up as missing instead of as something that could be deleted. To check
if a ref is actually missing, we then check
- is it a symref?
- if not, did it resolve to null_sha1?
Now we pass RESOLVE_REF_NO_RECURSE and the correct information is
returned for a symref even when it points to a missing ref. Simplify
by relying on RESOLVE_REF_READING.
No functional change intended.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":
$ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
$ git branch -d nonsense
error: branch 'nonsense' not found.
Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:
$ git update-ref -d refs/heads/nonsense
error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links
Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.
Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading). Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.
While at it, swap two of the arguments in the function to put output
arguments at the end. As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.
Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change s_update_ref to use a ref transaction for the ref update.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change the ref transaction API so that we pass the reflog message to the
create/delete/update functions instead of to ref_transaction_commit.
This allows different reflog messages for each ref update in a multi-ref
transaction.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Originally the color-parsing function was used only for
config variables. It made sense to pass the variable name so
that the die() message could be something like:
$ git -c color.branch.plain=bogus branch
fatal: bad color value 'bogus' for variable 'color.branch.plain'
These days we call it in other contexts, and the resulting
error messages are a little confusing:
$ git log --pretty='%C(bogus)'
fatal: bad color value 'bogus' for variable '--pretty format'
$ git config --get-color foo.bar bogus
fatal: bad color value 'bogus' for variable 'command line'
This patch teaches color_parse to complain only about the
value, and then return an error code. Config callers can
then propagate that up to the config parser, which mentions
the variable name. Other callers can provide a custom
message. After this patch these three cases now look like:
$ git -c color.branch.plain=bogus branch
error: invalid color value: bogus
fatal: unable to parse 'color.branch.plain' from command-line config
$ git log --pretty='%C(bogus)'
error: invalid color value: bogus
fatal: unable to parse --pretty format
$ git config --get-color foo.bar bogus
error: invalid color value: bogus
fatal: unable to parse default color value
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many config-parsing helpers, like parse_branch_color_slot,
take the name of a config variable and an offset to the
"slot" name (e.g., "color.branch.plain" is passed along with
"13" to effectively pass "plain"). This is leftover from the
time that these functions would die() on error, and would
want the full variable name for error reporting.
These days they do not use the full variable name at all.
Passing a single pointer to the slot name is more natural,
and lets us more easily adjust the callers to use skip_prefix
to avoid manually writing offset numbers.
This is effectively a continuation of 9e1a5eb, which did the
same for parse_diff_color_slot. This patch covers all of the
remaining similar constructs.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The lockfile API and its users have been cleaned up.
* mh/lockfile: (38 commits)
lockfile.h: extract new header file for the functions in lockfile.c
hold_locked_index(): move from lockfile.c to read-cache.c
hold_lock_file_for_append(): restore errno before returning
get_locked_file_path(): new function
lockfile.c: rename static functions
lockfile: rename LOCK_NODEREF to LOCK_NO_DEREF
commit_lock_file_to(): refactor a helper out of commit_lock_file()
trim_last_path_component(): replace last_path_elm()
resolve_symlink(): take a strbuf parameter
resolve_symlink(): use a strbuf for internal scratch space
lockfile: change lock_file::filename into a strbuf
commit_lock_file(): use a strbuf to manage temporary space
try_merge_strategy(): use a statically-allocated lock_file object
try_merge_strategy(): remove redundant lock_file allocation
struct lock_file: declare some fields volatile
lockfile: avoid transitory invalid states
git_config_set_multivar_in_file(): avoid call to rollback_lock_file()
dump_marks(): remove a redundant call to rollback_lock_file()
api-lockfile: document edge cases
commit_lock_file(): rollback lock file on failure to rename
...
This patch adds the "git interpret-trailers" command.
This command uses the previously added process_trailers()
function in trailer.c.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Under NO_PTHREADS build, we warn when delta_search_threads is not
set to 1, because that is the only sensible value on a single
threaded build.
However, the auto detection that kicks in when that variable is set
to 0 (e.g. there is no configuration variable or command line option,
or an explicit --threads=0 is given from the command line to override
the pack.threads configuration to force auto-detection) was not done
before the condition to issue this warning was tested.
Move the auto-detection code and place it at an appropriate spot.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
type_cas_lock/unlock() should be defined as no-op for NO_PTHREADS
build, just like all the other locking primitives.
Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The argv_array used in unpack() is never freed. Instead of adding
explicit calls to argv_array_clear() use the args member of struct
child_process and let run_command() and friends clean up for us.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "Everyday GIT With 20 Commands Or So" is not accessible via the
Git help system. Move everyday.txt to giteveryday.txt so that "git
help everyday" works, and create a new placeholder file everyday.html
to refer people who follow existing URLs to the updated location.
giteveryday.txt now formats well with AsciiDoc as a man page and
refreshed content to a more command modern style.
Add 'everyday' to the help --guides list and update git(1) and 5
other links to giteveryday.
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow "git push" request to be signed, so that it can be verified and
audited, using the GPG signature of the person who pushed, that the
tips of branches at a public repository really point the commits
the pusher wanted to, without having to "trust" the server.
* jc/push-cert: (24 commits)
receive-pack::hmac_sha1(): copy the entire SHA-1 hash out
signed push: allow stale nonce in stateless mode
signed push: teach smart-HTTP to pass "git push --signed" around
signed push: fortify against replay attacks
signed push: add "pushee" header to push certificate
signed push: remove duplicated protocol info
send-pack: send feature request on push-cert packet
receive-pack: GPG-validate push certificates
push: the beginning of "git push --signed"
pack-protocol doc: typofix for PKT-LINE
gpg-interface: move parse_signature() to where it should be
gpg-interface: move parse_gpg_output() to where it should be
send-pack: clarify that cmds_sent is a boolean
send-pack: refactor inspecting and resetting status and sending commands
send-pack: rename "new_refs" to "need_pack_data"
receive-pack: factor out capability string generation
send-pack: factor out capability string generation
send-pack: always send capabilities
send-pack: refactor decision to send update per ref
send-pack: move REF_STATUS_REJECT_NODELETE logic a bit higher
...
Some MUAs mangled a line in a message that begins with "From " to
">From " when writing to a mailbox file and feeding such an input to
"git am" used to lose such a line.
* jk/mbox-from-line:
mailinfo: work around -Wstring-plus-int warning
mailinfo: make ">From" in-body header check more robust
Continue where ae021d87 (use skip_prefix to avoid magic numbers) left off
and use skip_prefix() in more places for determining the lengths of prefix
strings to avoid using dependent constants and other indirect methods.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The output file hasn't been created at this point, yet, so there is no
need to delete it when exiting early.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).
Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
For now, we still make sure to allocate at least PATH_MAX characters
for the strbuf because resolve_symlink() doesn't know how to expand
the space for its return value. (That will be fixed in a moment.)
Another alternative would be to just use a strbuf as scratch space in
lock_file() but then store a pointer to the naked string in struct
lock_file. But lock_file objects are often reused. By reusing the
same strbuf, we can avoid having to reallocate the string most times
when a lock_file object is reused.
Helped-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even the one lockfile object needn't be allocated each time the
function is called. Instead, define one statically-allocated
lock_file object and reuse it for every call.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
By the time the "if" block is entered, the lock_file instance from the
main function block is no longer in use, so re-use that one instead of
allocating a second one.
Note that the "lock" variable in the "if" block shadowed the "lock"
variable at function scope, so the only change needed is to remove the
inner definition.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Declare the return value to be const to make it clear that we aren't
giving callers permission to write over the string that it points at.
(The return value is the filename field of a struct lock_file, which
can be used by a signal handler at any time and therefore shouldn't be
tampered with.)
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function is used for other things besides the index, so rename it
accordingly.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git fsck" failed to report that it found corrupt objects via its
exit status in some cases.
* jk/fsck-exit-code-fix:
fsck: return non-zero status on missing ref tips
fsck: exit with non-zero status upon error from fsck_obj()
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
When receiving an invalid pack stream that records the same object
twice, multiple threads got confused due to a race.
* jk/index-pack-threading-races:
index-pack: fix race condition with duplicate bases
"git push" over HTTP transport had an artificial limit on number of
refs that can be pushed imposed by the command line length.
* jk/send-pack-many-refspecs:
send-pack: take refspecs over stdin
Some MUAs mangled a line in a message that begins with "From " to
">From " when writing to a mailbox file and feeding such an input
to "git am" used to lose such a line.
* jk/mbox-from-line:
mailinfo: work around -Wstring-plus-int warning
mailinfo: make ">From" in-body header check more robust
"rev-parse --verify --quiet $name" is meant to quietly exit with a
non-zero status when $name is not a valid object name, but still
gave error messages in some cases.
* da/rev-parse-verify-quiet:
stash: prefer --quiet over shell redirection of the standard error stream
refs: make rev-parse --quiet actually quiet
t1503: use test_must_be_empty
Documentation: a note about stdout for git rev-parse --verify --quiet
pre- and post-receive hooks are no longer required to read all
their inputs.
* jc/ignore-sigpipe-while-running-hooks:
receive-pack: allow hooks to ignore its standard input stream
Code cleanup.
* jk/prune-packed-server-info:
repack: call prune_packed_objects() and update_server_info() directly
server-info: clean up after writing info/packs
make update-server-info more robust
prune-packed: fix minor memory leak
"hash-object" learned a new "--literally" option to hash any random
garbage into a loose object, to allow us to create a test data for
mechanisms to catch corrupt objects.
* jc/hash-object:
hash-object: add --literally option
hash-object: pass 'write_object' as a flag
hash-object: reduce file-scope statics
Teach "git fsck" to inspect the contents of annotated tag objects.
* js/fsck-tag-validation:
Make sure that index-pack --strict checks tag objects
Add regression tests for stricter tag fsck'ing
fsck: check tag objects' headers
Make sure fsck_commit_buffer() does not run out of the buffer
fsck_object(): allow passing object data separately from the object itself
Refactor type_from_string() to allow continuing after detecting an error
clang gives the following warning:
builtin/receive-pack.c:327:35: error: sizeof on array function
parameter will return size of 'unsigned char *' instead of 'unsigned
char [20]' [-Werror,-Wsizeof-array-argument]
git_SHA1_Update(&ctx, out, sizeof(out));
^
builtin/receive-pack.c:292:37: note: declared here
static void hmac_sha1(unsigned char out[20],
^
Signed-off-by: Brian Gernhardt <brian@gernhardtsoftware.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The just-released Apple Xcode 6.0.1 has -Wstring-plus-int enabled by
default which complains about pointer arithmetic applied to a string
literal:
builtin/mailinfo.c:303:24: warning:
adding 'long' to a string does not append to the string
return !memcmp(SAMPLE + (cp - line), cp, strlen(SAMPLE) ...
~~~~~~~^~~~~~~~~~~~~
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/apply-ws-prefix:
apply: omit ws check for excluded paths
apply: hoist use_patch() helper for path exclusion up
apply: use the right attribute for paths in non-Git patches
Conflicts:
builtin/apply.c
"git fsck" failed to report that it found corrupt objects via its
exit status in some cases.
* jk/fsck-exit-code-fix:
fsck: return non-zero status on missing ref tips
fsck: exit with non-zero status upon error from fsck_obj()
"git config --add section.var val" used to lose existing
section.var whose value was an empty string.
* ta/config-add-to-empty-or-true-fix:
config: avoid a funny sentinel value "a^"
make config --add behave correctly for empty and NULL values
When receiving an invalid pack stream that records the same object
twice, multiple threads got confused due to a race. We should
reject or correct such a stream upon receiving, but that will be a
larger change.
* jk/index-pack-threading-races:
index-pack: fix race condition with duplicate bases
Code clean-up.
* jk/commit-author-parsing:
determine_author_info(): copy getenv output
determine_author_info(): reuse parsing functions
date: use strbufs in date-formatting functions
record_author_date(): use find_commit_header()
record_author_date(): fix memory leak on malformed commit
commit: provide a function to find a header in a buffer
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable. A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.
* bb/date-iso-strict:
pretty: provide a strict ISO 8601 date format
Sometimes users want to report a bug they experience on their
repository, but they are not at liberty to share the contents of
the repository. "fast-export" was taught an "--anonymize" option
to replace blob contents, names of people and paths and log
messages with bland and simple strings to help them.
* jk/fast-export-anonymize:
docs/fast-export: explain --anonymize more completely
teach fast-export an --anonymize option
The number of refs that can be pushed at once over smart HTTP was
limited by the command line length. The limitation has been lifted
by passing these refs from the standard input of send-pack.
* jk/send-pack-many-refspecs:
send-pack: take refspecs over stdin
When a reflog is deleted, e.g. when "git stash" clears its stashes,
"git rev-parse --verify --quiet" dies:
fatal: Log for refs/stash is empty.
The reason is that the get_sha1() code path does not allow us
to suppress this message.
Pass the flags bitfield through get_sha1_with_context() so that
read_ref_at() can suppress the message.
Use get_sha1_with_context1() instead of get_sha1() in rev-parse
so that the --quiet flag is honored.
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we run `branch --merged`, we use prepare_revision_walk
with the merge-filter marked as UNINTERESTING. Any branch
tips that are marked UNINTERESTING after it returns must be
ancestors of that commit. As we iterate through the list of
refs to show, we check item->commit->object.flags to see
whether it was marked.
This interacts badly with --verbose, which will do a
separate walk to find the ahead/behind information for each
branch. There are two bad things that can happen:
1. The ahead/behind walk may get the wrong results,
because it can see a bogus UNINTERESTING flag leftover
from the merge-filter walk.
2. We may omit some branches if their tips are involved in
the ahead/behind traversal of a branch shown earlier.
The ahead/behind walk carefully cleans up its commit
flags, meaning it may also erase the UNINTERESTING
flag that we expect to check later.
We can solve this by moving the merge-filter state for each
ref into its "struct ref_item" as soon as we finish the
merge-filter walk. That fixes (2). Then we are free to clear
the commit flags we used in the walk, fixing (1).
Note that we actually do away with the matches_merge_filter
helper entirely here, and inline it between the revision
walk and the flag-clearing. This ensures that nobody
accidentally calls it at the wrong time (it is only safe to
check in that instant between the setting and clearing of
the global flag).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When operating with the stateless RPC mode, we will receive a nonce
issued by another instance of us that advertised our capability and
refs some time ago. Update the logic to check received nonce to
detect this case, compute how much time has passed since the nonce
was issued and report the status with a new environment variable
GIT_PUSH_CERT_NONCE_SLOP to the hooks.
GIT_PUSH_CERT_NONCE_STATUS will report "SLOP" in such a case. The
hooks are free to decide how large a slop it is willing to accept.
Strictly speaking, the "nonce" is not really a "nonce" anymore in
the stateless RPC mode, as it will happily take any "nonce" issued
by it (which is protected by HMAC and its secret key) as long as it
is fresh enough. The degree of this security degradation, relative
to the native protocol, is about the same as the "we make sure that
the 'git push' decided to update our refs with new objects based on
the freshest observation of our refs by making sure the values they
claim the original value of the refs they ask us to update exactly
match the current state" security is loosened to accomodate the
stateless RPC mode in the existing code without this series, so
there is no need for those who are already using smart HTTP to push
to their repositories to be alarmed any more than they already are.
In addition, the server operator can set receive.certnonceslop
configuration variable to specify how stale a nonce can be (in
seconds). When this variable is set, and if the nonce received in
the certificate that passes the HMAC check was less than that many
seconds old, hooks are given "OK" in GIT_PUSH_CERT_NONCE_STATUS
(instead of "SLOP") and the received nonce value is given in
GIT_PUSH_CERT_NONCE, which makes it easier for a simple-minded
hook to check if the certificate we received is recent enough.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "--signed" option received by "git push" is first passed to the
transport layer, which the native transport directly uses to notice
that a push certificate needs to be sent. When the transport-helper
is involved, however, the option needs to be told to the helper with
set_helper_option(), and the helper needs to take necessary action.
For the smart-HTTP helper, the "necessary action" involves spawning
the "git send-pack" subprocess with the "--signed" option.
Once the above all gets wired in, the smart-HTTP transport now can
use the push certificate mechanism to authenticate its pushes.
Add a test that is modeled after tests for the native transport in
t5534-push-signed.sh to t5541-http-push-smart.sh. Update the test
Apache configuration to pass GNUPGHOME environment variable through.
As PassEnv would trigger warnings for an environment variable that
is not set, export it from test-lib.sh set to a harmless value when
GnuPG is not being used in the tests.
Note that the added test is deliberately loose and does not check
the nonce in this step. This is because the stateless RPC mode is
inevitably flaky and a nonce that comes back in the actual push
processing is one issued by a different process; if the two
interactions with the server crossed a second boundary, the nonces
will not match and such a check will fail. A later patch in the
series will work around this shortcoming.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to prevent a valid push certificate for pushing into an
repository from getting replayed in a different push operation, send
a nonce string from the receive-pack process and have the signer
include it in the push certificate. The receiving end uses an HMAC
hash of the path to the repository it serves and the current time
stamp, hashed with a secret seed (the secret seed does not have to
be per-repository but can be defined in /etc/gitconfig) to generate
the nonce, in order to ensure that a random third party cannot forge
a nonce that looks like it originated from it.
The original nonce is exported as GIT_PUSH_CERT_NONCE for the hooks
to examine and match against the value on the "nonce" header in the
certificate to notice a replay, but returned "nonce" header in the
push certificate is examined by receive-pack and the result is
exported as GIT_PUSH_CERT_NONCE_STATUS, whose value would be "OK"
if the nonce recorded in the certificate matches what we expect, so
that the hooks can more easily check.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The pre-receive and post-receive hooks were designed to be an
improvement over old style update and post-update hooks, which take
the update information on their command line and are limited by the
command line length limit. The same information is fed from the
standard input to pre/post-receive hooks instead to lift this
limitation. It has been mandatory for these new style hooks to
consume the update information fully from the standard input stream.
Otherwise, they would risk killing the receive-pack process via
SIGPIPE.
If a hook does not want to look at all the information, it is easy
to send its standard input to /dev/null (perhaps a niche use of hook
might need to know only the fact that a push was made, without
having to know what objects have been pushed to update which refs),
and this has already been done by existing hooks that are written
carefully.
However, because there is no good way to consistently fail hooks
that do not consume the input fully (a small push may result in a
short update record that may fit within the pipe buffer, to which
the receive-pack process may manage to write before the hook has a
chance to exit without reading anything, which will not result in a
death-by-SIGPIPE of receive-pack), it can lead to a hard to diagnose
"once in a blue moon" phantom failure.
Lift this "hooks must consume their input fully" mandate. A mandate
that is not enforced strictly is not helping us to catch mistakes in
hooks. If a hook has a good reason to decide the outcome of its
operation without reading the information we feed it, let it do so
as it pleases.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since commit 81c5cf7 (mailinfo: skip bogus UNIX From line inside
body, 2006-05-21), we have treated lines like ">From" in the body as
headers. This makes "git am" work for people who erroneously paste
the whole output from format-patch:
From 12345abcd...fedcba543210 Mon Sep 17 00:00:00 2001
From: them
Subject: [PATCH] whatever
into their email body (assuming that an mbox writer then quotes
"From" as ">From", as otherwise we would actually mailsplit on the
in-body line).
However, this has false positives if somebody actually has a commit
body that starts with "From "; in this case we erroneously remove
the line entirely from the commit message. We can make this check
more robust by making sure the line actually looks like a real mbox
"From" line.
Inspect the line that begins with ">From " a more carefully to only
skip lines that match the expected pattern (note that the datestamp
part of the format-patch output is designed to be kept constant to
help those who write magic(5) entries).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
With the interim protocol, we used to send the update commands even
though we already send a signed copy of the same information when
push certificate is in use. Update the send-pack/receive-pack pair
not to do so.
The notable thing on the receive-pack side is that it makes sure
that there is no command sent over the traditional protocol packet
outside the push certificate. Otherwise a pusher can claim to be
pushing one set of ref updates in the signed certificate while
issuing commands to update unrelated refs, and such an update will
evade later audits.
Finally, start documenting the protocol.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reusing the GPG signature check helpers we already have, verify
the signature in receive-pack and give the results to the hooks
via GIT_PUSH_CERT_{SIGNER,KEY,STATUS} environment variables.
Policy decisions, such as accepting or rejecting a good signature by
a key that is not fully trusted, is left to the hook and kept
outside of the core.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While signed tags and commits assert that the objects thusly signed
came from you, who signed these objects, there is not a good way to
assert that you wanted to have a particular object at the tip of a
particular branch. My signing v2.0.1 tag only means I want to call
the version v2.0.1, and it does not mean I want to push it out to my
'master' branch---it is likely that I only want it in 'maint', so
the signature on the object alone is insufficient.
The only assurance to you that 'maint' points at what I wanted to
place there comes from your trust on the hosting site and my
authentication with it, which cannot easily audited later.
Introduce a mechanism that allows you to sign a "push certificate"
(for the lack of better name) every time you push, asserting that
what object you are pushing to update which ref that used to point
at what other object. Think of it as a cryptographic protection for
ref updates, similar to signed tags/commits but working on an
orthogonal axis.
The basic flow based on this mechanism goes like this:
1. You push out your work with "git push --signed".
2. The sending side learns where the remote refs are as usual,
together with what protocol extension the receiving end
supports. If the receiving end does not advertise the protocol
extension "push-cert", an attempt to "git push --signed" fails.
Otherwise, a text file, that looks like the following, is
prepared in core:
certificate version 0.1
pusher Junio C Hamano <gitster@pobox.com> 1315427886 -0700
7339ca65... 21580ecb... refs/heads/master
3793ac56... 12850bec... refs/heads/next
The file begins with a few header lines, which may grow as we
gain more experience. The 'pusher' header records the name of
the signer (the value of user.signingkey configuration variable,
falling back to GIT_COMMITTER_{NAME|EMAIL}) and the time of the
certificate generation. After the header, a blank line follows,
followed by a copy of the protocol message lines.
Each line shows the old and the new object name at the tip of
the ref this push tries to update, in the way identical to how
the underlying "git push" protocol exchange tells the ref
updates to the receiving end (by recording the "old" object
name, the push certificate also protects against replaying). It
is expected that new command packet types other than the
old-new-refname kind will be included in push certificate in the
same way as would appear in the plain vanilla command packets in
unsigned pushes.
The user then is asked to sign this push certificate using GPG,
formatted in a way similar to how signed tag objects are signed,
and the result is sent to the other side (i.e. receive-pack).
In the protocol exchange, this step comes immediately before the
sender tells what the result of the push should be, which in
turn comes before it sends the pack data.
3. When the receiving end sees a push certificate, the certificate
is written out as a blob. The pre-receive hook can learn about
the certificate by checking GIT_PUSH_CERT environment variable,
which, if present, tells the object name of this blob, and make
the decision to allow or reject this push. Additionally, the
post-receive hook can also look at the certificate, which may be
a good place to log all the received certificates for later
audits.
Because a push certificate carry the same information as the usual
command packets in the protocol exchange, we can omit the latter
when a push certificate is in use and reduce the protocol overhead.
This however is not included in this patch to make it easier to
review (in other words, the series at this step should never be
released without the remainder of the series, as it implements an
interim protocol that will be incompatible with the final one).
As such, the documentation update for the protocol is left out of
this step.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Similar to the previous one for send-pack, make it easier and
cleaner to add to capability advertisement.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Make a helper function to accept a line of a protocol message and
queue an update command out of the code from read_head_info().
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This piece of code reads object names of shallow boundaries, not
old_sha1[], i.e. the current value the ref points at, which is to be
replaced by what is in new_sha1[].
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Ideally, we should have also allowed the first "shallow" to carry
the feature request trailer, but that is water under the bridge
now. This makes the next step to factor out the queuing of commands
easier to review.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An "update" command in the protocol exchange consists of 40-hex old
object name, SP, 40-hex new object name, SP, and a refname, but the
first instance is further followed by a NUL with feature requests.
The command structure, which has a flex-array member that stores the
refname at the end, was allocated based on the whole length of the
update command, without excluding the trailing feature requests.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call the functions behind git prune-packed and git update-server-info
directly instead of using run_command(). This is shorter, easier and
quicker.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We form all of our directories in a strbuf, but never release it.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".
The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.
In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Introduce CONFIG_REGEX_NONE as a more explicit sentinel value to say
"we do not want to replace any existing entry" and use it in the
implementation of "git config --add".
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This allows "hash-object --stdin" to just hash any garbage into a
"loose object" that may not pass the standard object parsing check
or fsck, so that different kind of corrupt objects we may encounter
in the field can be imitated in our test suite. That would in turn
allow us to test features that catch these corrupt objects.
Note that "cat-file" may need to learn "--literally" option to allow
us peek into a truly broken object.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Instead of forcing callers of lower level functions write
(write_object ? HASH_WRITE_OBJECT : 0), prepare the flag to be
passed down in the callchain from the command line parser.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of the knobs that affect helper functions called from
cmd_hash_object() were passed to them as parameters already, and the
only effect of having them as file-scope statics was to make the
reader wonder if the parameters are hiding the file-scope global
values by accident. Adjust their initialisation and make them
function-local variables.
The only exception was no_filters hash_stdin_paths() peeked from the
file-scope global, which was converted to a parameter to the helper
function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Progress output from "git gc --auto" was visible in "git fetch -q".
* nd/fetch-pass-quiet-to-gc-child-process:
fetch: silence git-gc if --quiet is given
fetch: convert argv_gc_auto to struct argv_array
Add a few more places in "commit" and "checkout" that make sure
that the cache-tree is fully populated in the index.
* dt/cache-tree-repair:
cache-tree: do not try to use an invalidated subtree info to build a tree
cache-tree: Write updated cache-tree after commit
cache-tree: subdirectory tests
test-dump-cache-tree: invalid trees are not errors
cache-tree: create/update cache-tree on checkout
The second batch of the transactional ref update series.
* rs/ref-transaction-1: (22 commits)
update-ref --stdin: pass transaction around explicitly
update-ref --stdin: narrow scope of err strbuf
refs.c: make delete_ref use a transaction
refs.c: make prune_ref use a transaction to delete the ref
refs.c: remove lock_ref_sha1
refs.c: remove the update_ref_write function
refs.c: remove the update_ref_lock function
refs.c: make lock_ref_sha1 static
walker.c: use ref transaction for ref updates
fast-import.c: use a ref transaction when dumping tags
receive-pack.c: use a reference transaction for updating the refs
refs.c: change update_ref to use a transaction
branch.c: use ref transaction for all ref updates
fast-import.c: change update_branch to use ref transactions
sequencer.c: use ref transactions for all ref updates
commit.c: use ref transactions for updates
replace.c: use the ref transaction functions for updates
tag.c: use ref transactions when doing updates
refs.c: add transaction.status and track OPEN/CLOSED
refs.c: make ref_transaction_begin take an err argument
...
Code clean-up.
* nd/mv-code-cleaning:
mv: no SP between function name and the first opening parenthese
mv: combine two if(s)
mv: unindent one level for directory move code
mv: move index search code out
mv: remove an "if" that's always true
mv: split submodule move preparation code out
mv: flatten error handling code block
mv: mark strings for translations
Update git_config() users with callback functions for a very narrow
scope with calls to config-set API that lets us query a single
variable.
* ta/config-set-2:
builtin/apply.c: replace `git_config()` with `git_config_get_string_const()`
merge-recursive.c: replace `git_config()` with `git_config_get_int()`
ll-merge.c: refactor `read_merge_config()` to use `git_config_string()`
fast-import.c: replace `git_config()` with `git_config_get_*()` family
branch.c: replace `git_config()` with `git_config_get_string()
alias.c: replace `git_config()` with `git_config_get_string()`
imap-send.c: replace `git_config()` with `git_config_get_*()` family
pager.c: replace `git_config()` with `git_config_get_value()`
builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
rerere.c: replace `git_config()` with `git_config_get_*()` family
fetchpack.c: replace `git_config()` with `git_config_get_*()` family
archive.c: replace `git_config()` with `git_config_get_bool()` family
read-cache.c: replace `git_config()` with `git_config_get_*()` family
http-backend.c: replace `git_config()` with `git_config_get_bool()` family
daemon.c: replace `git_config()` with `git_config_get_bool()` family
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').
For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.
By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.
The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.
[jc: adjusted t1450 and added another test]
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Applying a patch not generated by Git in a subdirectory used to
check the whitespace breakage using the attributes for incorrect
paths. Also whitespace checks were performed even for paths
excluded via "git apply --exclude=<path>" mechanism.
* jc/apply-ws-prefix:
apply: omit ws check for excluded paths
apply: hoist use_patch() helper for path exclusion up
apply: use the right attribute for paths in non-Git patches
This is inside an "else" block of "if (last - first < 1)", so we know
that "last - first >= 1" when we come here. No need to check
"last - first > 0".
While at there, save "argc + last - first" to a variable to shorten
the statements a bit.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are a handful more instances of this in compat/regex/ but they
are borrowed code taht we do not want to touch with a change that
really affects correctness, which this change is not.
Signed-off-by: Arjun Sreedharan <arjun024@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This makes it more obvious at a glance where the output of functions
parsing the --stdin stream goes.
No functional change intended.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Making the strbuf local in each function that needs to print errors
saves the reader from having to think about action at a distance,
such as
* errors piling up and being concatenated with no newline between
them
* errors unhandled in one function, to be later handled in another
* concurrency issues, if this code starts using threads some day
No functional change intended.
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Wrap all the ref updates inside a transaction.
In the new API there is no distinction between failure to lock and
failure to write a ref. Both can be permanent (e.g., a ref
"refs/heads/topic" is blocking creation of the lock file
"refs/heads/topic/1.lock") or transient (e.g., file system full) and
there's no clear difference in how the client should respond, so
replace the two statuses "failed to lock" and "failed to write" with
a single status "failed to update ref". In both cases a more
detailed message is sent by sideband to diagnose the problem.
Example, before:
error: there are still refs under 'refs/heads/topic'
remote: error: failed to lock refs/heads/topic
To foo
! [remote rejected] HEAD -> topic (failed to lock)
After:
error: there are still refs under 'refs/heads/topic'
remote: error: Cannot lock the ref 'refs/heads/topic'.
To foo
! [remote rejected] HEAD -> topic (failed to update ref)
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change commit.c to use ref transactions for all ref updates.
Make sure we pass a NULL pointer to ref_transaction_update if have_old
is false.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update replace.c to use ref transactions for updates.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change tag.c to use ref transactions for all ref updates.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add an err argument to _begin so that on non-fatal failures in future ref
backends we can report a nice error back to the caller.
While _begin can currently never fail for other reasons than OOM, in which
case we die() anyway, we may add other types of backends in the future.
For example, a hypothetical MySQL backend could fail in _begin with
"Can not connect to MySQL server. No route to host".
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Change ref_transaction_delete() to do basic error checking and return
non-zero on error. Update all callers to check the return for
ref_transaction_delete(). There are currently no conditions in _delete that
will return error but there will be in the future. Add an err argument that
will be updated on failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Do basic error checking in ref_transaction_create() and make it return
non-zero on error. Update all callers to check the result of
ref_transaction_create(). There are currently no conditions in _create that
will return error but there will be in the future. Add an err argument that
will be updated on failure.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce the use of fixed sized buffer passed to getcwd() calls
by introducing xgetcwd() helper.
* rs/strbuf-getcwd:
use strbuf_add_absolute_path() to add absolute paths
abspath: convert absolute_path() to strbuf
use xgetcwd() to set $GIT_DIR
use xgetcwd() to get the current directory or die
wrapper: add xgetcwd()
abspath: convert real_path_internal() to strbuf
abspath: use strbuf_getcwd() to remember original working directory
setup: convert setup_git_directory_gently_1 et al. to strbuf
unix-sockets: use strbuf_getcwd()
strbuf: add strbuf_getcwd()
Start "git config --edit --global" from a skeletal per-user
configuration file contents, instead of a total blank, when the
user does not already have any. This immediately reduces the need
for a later "Have you forgotten setting core.user?" and we can add
more to the template as we gain more experience.
* mm/config-edit-global:
commit: advertise config --global --edit on guessed identity
home_config_paths(): let the caller ignore xdg path
config --global --edit: create a template file if needed
merge_trees_recursive() stores a pointer to its parameter df_conflict in
its struct traverse_info, but it is never actually used. Stop doing
that, remove the parameter and inline the function into merge_trees(),
as the latter is now only passing on its parameters.
Remove the parameter df_conflict from unresolved_directory() as well,
now that there is no way to pass it to merge_trees_recursive() through
that function anymore.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we are resolving deltas in an indexed pack, we do it by
first selecting a potential base (either one stored in full
in the pack, or one created by resolving another delta), and
then resolving any deltas that use that base. When we
resolve a particular delta, we flip its "real_type" field
from OBJ_{REF,OFS}_DELTA to whatever the real type is.
We assume that traversing the objects this way will visit
each delta only once. This is correct for most packs; we
visit the delta only when we process its base, and each
object (and thus each base) appears only once. However, if a
base object appears multiple times in the pack, we will try
to resolve any deltas based on it once for each instance.
We can detect this case by noting that a delta we are about
to resolve has already had its real_type field flipped, and
we already do so with an assert(). However, if multiple
threads are in use, we may race with another thread on
comparing and flipping the field. We need to synchronize the
access.
The right mechanism for doing this is a compare-and-swap (we
atomically "claim" the delta for our own and find out
whether our claim was successful). We can implement this
in C by using a pthread mutex to protect the operation. This
is not the fastest way of doing a compare-and-swap; many
processors provide instructions for this, and gcc and other
compilers provide builtins to access them. However, some
experiments showed that lock contention does not cause a
significant slowdown here. Adding c-a-s support for many
compilers would increase the maintenance burden (and we
would still end up including the pthread version as a
fallback).
Note that we only need to touch the OBJ_REF_DELTA codepath
here. An OBJ_OFS_DELTA object points to its base using an
offset, and therefore has only one base, even if another
copy of that base object appears in the pack (we do still
touch it briefly because the setting of real_type is
factored out of resolve_data).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.
The output from "--date=iso" deviates from ISO 8601 in these ways:
- a space instead of the `T` date/time delimiter
- a space between time and time zone
- no colon between hours and minutes of the time zone
Add a strict ISO 8601 date format for displaying committer and
author dates. Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.
See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.
Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When figuring out the author name for a commit, we may end
up either pointing to const storage from getenv("GIT_AUTHOR_*"),
or to newly allocated storage based on an existing commit or
the --author option.
Using const pointers to getenv's return has two problems:
1. It is not guaranteed that the return value from getenv
remains valid across multiple calls.
2. We do not know whether to free the values at the end,
so we just leak them.
We can solve both by duplicating the string returned by
getenv().
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Rather than parsing the header manually to find the "author"
field, and then parsing its sub-parts, let's use
find_commit_header and split_ident_line. This is shorter and
easier to read, and should do a more careful parsing job.
For example, the current parser could find the end-of-email
right-bracket across a newline (for a malformed commit), and
calculate a bogus gigantic length for the date (by using
"eol - rb").
As a bonus, this also plugs a memory leak when we pull the
date field from an existing commit (we still leak the name
and email buffers, which will be fixed in a later commit).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sometimes users want to report a bug they experience on
their repository, but they are not at liberty to share the
contents of the repository. It would be useful if they could
produce a repository that has a similar shape to its history
and tree, but without leaking any information. This
"anonymized" repository could then be shared with developers
(assuming it still replicates the original problem).
This patch implements an "--anonymize" option to
fast-export, which generates a stream that can recreate such
a repository. Producing a single stream makes it easy for
the caller to verify that they are not leaking any useful
information. You can get an overview of what will be shared
by running a command like:
git fast-export --anonymize --all |
perl -pe 's/\d+/X/g' |
sort -u |
less
which will show every unique line we generate, modulo any
numbers (each anonymized token is assigned a number, like
"User 0", and we replace it consistently in the output).
In addition to anonymizing, this produces test cases that
are relatively small (compared to the original repository)
and fast to generate (compared to using filter-branch, or
modifying the output of fast-export yourself). Here are
numbers for git.git:
$ time git fast-export --anonymize --all \
--tag-of-filtered-object=drop >output
real 0m2.883s
user 0m2.828s
sys 0m0.052s
$ gzip output
$ ls -lh output.gz | awk '{print $5}'
2.9M
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Many of the date functions write into fixed-size buffers.
This is a minor pain, as we have to take special
precautions, and frequently end up copying the result into a
strbuf or heap-allocated buffer anyway (for which we
sometimes use strcpy!).
Let's instead teach parse_date, datestamp, etc to write to a
strbuf. The obvious downside is that we might need to
perform a heap allocation where we otherwise would not need
to. However, it turns out that the only two new allocations
required are:
1. In test-date.c, where we don't care about efficiency.
2. In determine_author_info, which is not performance
critical (and where the use of a strbuf will help later
refactoring).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Pushing a large number of refs works over most transports,
because we implement send-pack as an internal function.
However, it can sometimes fail when pushing over http,
because we have to spawn "git send-pack --stateless-rpc" to
do the heavy lifting, and we pass each refspec on the
command line. This can cause us to overflow the OS limits on
the size of the command line for a large push.
We can solve this by giving send-pack a --stdin option and
using it from remote-curl. We already dealt with this on
the fetch-pack side in 078b895 (fetch-pack: new --stdin
option to read refs from stdin, 2012-04-02). The stdin
option (and in particular, its use of packet-lines for
stateless-rpc input) is modeled after that solution.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reachability bitmaps do not work with shallow operations.
Fixes regression in 2.0.
* jk/pack-shallow-always-without-bitmap:
pack-objects: turn off bitmaps when we see --shallow lines
Instead of dying of a segmentation fault if getcwd() returns NULL, use
xgetcwd() to make sure to write a useful error message and then exit
in an orderly fashion.
Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Convert several calls of getcwd() and die() to use xgetcwd() instead.
This way we get rid of fixed-size buffers (which can be too small
depending on the used file system) and gain consistent error messages.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most struct child_process variables are cleared using memset first after
declaration. Provide a macro, CHILD_PROCESS_INIT, that can be used to
initialize them statically instead. That's shorter, doesn't require a
function call and is slightly more readable (especially given that we
already have STRBUF_INIT, ARGV_ARRAY_INIT etc.).
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently if we have a config file like,
[foo]
baz
bar =
and we try something like, "git config --add foo.baz roll", Git will
segfault. Moreover, for "git config --add foo.bar roll", it will
overwrite the original value instead of appending after the existing
empty value.
The problem lies with the regexp used for simulating --add in
`git_config_set_multivar_in_file()`, "^$", which in ideal case should
not match with any string but is true for empty strings. Instead use a
regexp like "a^" which can not be true for any string, empty or not.
For removing the segfault add a check for NULL values in `matches()` in
config.c.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Noticed-by: Matthew Flaschen <mflaschen@wikimedia.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Explicitly state that menu_item functions like clean_cmd don't take
any arguments by using void instead of an empty parameter list.
Found using gcc -Wstrict-prototypes.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_string_const()` instead of `git_config()` to take
advantage of the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This was found by coverity. (Id: 290001)
The variable 'output' is assigned to a value
after all gotos to the corrupt label.
Remove the goto by moving the errorhandling code to the
condition, which detects the error.
Signed-off-by: Stefan Beller <stefanbeller@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reachability bitmaps do not work with shallow operations,
because they cache a view of the object reachability that
represents the true objects. Whereas a shallow repository
(or a shallow operation in a repository) is inherently
cutting off the object graph with a graft.
We explicitly disallow the use of bitmaps in shallow
repositories by checking is_repository_shallow(), and we
should continue to do that. However, we also want to
disallow bitmaps when we are serving a fetch to a shallow
client, since we momentarily take on their grafted view of
the world.
It used to be enough to call is_repository_shallow at the
start of pack-objects. Upload-pack wrote the other side's
shallow state to a temporary file and pointed the whole
pack-objects process at this state with "git --shallow-file",
and from the perspective of pack-objects, we really were
in a shallow repo. But since b790e0f (upload-pack: send
shallow info over stdin to pack-objects, 2014-03-11), we do
it differently: we send --shallow lines to pack-objects over
stdin, and it registers them itself.
This means that our is_repository_shallow check is way too
early (we have not been told about the shallowness yet), and
that it is insufficient (calling is_repository_shallow is
not enough, as the shallow grafts we register do not change
its return value). Instead, we can just turn off bitmaps
explicitly when we see these lines.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Even the documentation tells us:
You should check if it returns any error (non-zero return
code) and if it does not, you can start using get_revision()
to do the iteration.
In preparation for this commit, I grepped all occurrences of
prepare_revision_walk and added error messages, when there were none.
Signed-off-by: Stefan Beller <stefanbeller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Free the refspec.
Found by scan.coverity.com (Id: 1127806)
Signed-off-by: Stefan Beller <stefanbeller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Whitespace breakages are checked while the patch is being parsed.
Disable them at the beginning of parse_chunk(), where each
individual patch is parsed, immediately after we learn the name of
the file the patch applies to and before we start parsing the diff
contained in the patch.
One may naively think that we should be able to not just skip the
whitespace checks but simply fast-forward to the next patch without
doing anything once use_patch() tells us that this patch is not
going to be used. But in reality we cannot really skip much of the
parsing in order to do such a "fast-forward", primarily because
parsing "@@ -k,l +m,n @@" lines and counting the input lines is how
we determine the boundaries of individual patches.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We parse each patchfile and find the name of the path the patch
applies to, and then use that name to consult the attribute system
to find the whitespace rules to be used, and also the target file
(either in the working tree or in the index) to replay the changes
against.
Unlike a Git-generated patch, a non-Git patch is taken to have the
pathnames relative to the current working directory. The names
found in such a patch are modified by prepending the prefix by the
prefix_patches() helper function introduced in 56185f49 (git-apply:
require -p<n> when working in a subdirectory., 2007-02-19).
However, this prefixing is done after the patch is fully parsed and
affects only what target files are patched. Because the attributes
are checked against the names found in the patch during the parsing,
not against the final pathname, the whitespace check that is done
during parsing ends up using attributes for a wrong path for non-Git
patches.
Fix this by doing the prefix much earlier, immediately after the
header part of each patch is parsed and we learn the name of the
path the patch affects.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Feeding the result of a real_path() call to real_path() again doesn't
change that result -- the returned path won't get any more real. Avoid
such a double call in set_git_dir_init() and for the same reason stop
calling real_path() before feeding paths to set_git_work_tree(), as the
latter already takes care of that.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally want to avoid lookup_unknown_object, because it
results in allocating more memory for the object than may be
strictly necessary.
In this case, it is used to check whether we have an
already-parsed object before calling parse_object, to save
us from reading the object from disk. Using lookup_object
would be fine for that purpose, but we can take it a step
further. Since this code was written, parse_object already
learned the "check lookup_object" optimization, so we can
simply call parse_object directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git replace" learned a "--graft" option to rewrite parents of a
commit.
* cc/replace-graft:
replace: add test for --graft with a mergetag
replace: check mergetags when using --graft
replace: add test for --graft with signed commit
replace: remove signature when using --graft
contrib: add convert-grafts-to-replace-refs.sh
Documentation: replace: add --graft option
replace: add test for --graft
replace: add --graft option
replace: cleanup redirection style in tests
When the user has no user-wide configuration file, it's faster to use the
newly introduced config file template than to run two commands to set
user.name and user.email. Advise this to the user.
The old advice is kept if the user already has a configuration file since
the template feature would not trigger in this case.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the user has no ~/.gitconfig file, git config --global --edit used
to launch an editor on an nonexistant file name.
Instead, create a file with a default content before launching the
editor. The template contains only commented-out entries, to save a few
keystrokes for the user. If the values are guessed properly, the user
will only have to uncomment the entries.
Advanced users teaching newbies can create a minimalistic configuration
faster for newbies. Beginners reading a tutorial advising to run "git
config --global --edit" as a first step will be slightly more guided for
their first contact with Git.
Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When parsing "index" lines from a git-diff, we look for a
space followed by the mode. If we don't have a space, then
we set our pointer to the end-of-line. However, we don't
double-check that our end-of-line pointer is valid (e.g., if
we got a truncated diff input), which could lead to some
wrap-around pointer arithmetic.
In most cases this would probably get caught by our "40 <
len" check later in the function, but to be on the safe
side, let's just use strchrnul to treat end-of-string the
same as end-of-line.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A call to "dwim_ref(name, len, flags, &ref)" will allocate a
new string in "ref" to return the exact ref we found. We do
not consistently free it in all code paths, leading to small
leaks. The worst is in get_sha1_basic, which may be called
many times (e.g., by "cat-file --batch"), though it is
relatively unlikely, as it only triggers on a bogus reflog
specification.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We used to do this so could pass a mutable string to
enter_repo. But since 1c64b48 (enter_repo: do not modify
input, 2011-10-04), this is not necessary.
The resulting code is simpler, and it fixes a minor leak.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Apply the "if cloning from a local disk, physically copy repository
using hardlinks, unless otherwise told not to with --no-local"
optimization when url.*.insteadOf mechanism rewrites a "git clone
$URL" that refers to a repository over the network to a clone from
a local disk.
* mb/local-clone-after-applying-insteadof:
use local cloning if insteadOf makes a local URL
* rs/code-cleaning:
remote-testsvn: use internal argv_array of struct child_process in cmd_import()
bundle: use internal argv_array of struct child_process in create_bundle()
fast-import: use hashcmp() for SHA1 hash comparison
transport: simplify fetch_objs_via_rsync() using argv_array
run-command: use internal argv_array of struct child_process in run_hook_ve()
use commit_list_count() to count the members of commit_lists
strbuf: use strbuf_addstr() for adding C strings
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
* kb/perf-trace:
api-trace.txt: add trace API documentation
progress: simplify performance measurement by using getnanotime()
wt-status: simplify performance measurement by using getnanotime()
git: add performance tracing for git's main() function to debug scripts
trace: add trace_performance facility to debug performance issues
trace: add high resolution timer function to debug performance issues
trace: add 'file:line' to all trace output
trace: move code around, in preparation to file:line output
trace: add current timestamp to all trace output
trace: disable additional trace output for unit tests
trace: add infrastructure to augment trace output with additional info
sha1_file: change GIT_TRACE_PACK_ACCESS logging to use trace API
Documentation/git.txt: improve documentation of 'GIT_TRACE*' variables
trace: improve trace performance
trace: remove redundant printf format attribute
trace: consistently name the format parameter
trace: move trace declarations from cache.h to new trace.h
When using --graft, with a mergetag in the original
commit, we should check that the commit pointed to by
the mergetag is still a parent of then new commit we
create, otherwise the mergetag could be misleading.
If the commit pointed to by the mergetag is no more
a parent of the new commit, we could remove the
mergetag, but in this case there is a good chance
that the title or other elements of the commit might
also be misleading. So let's just error out and
suggest to use --edit instead on the commit.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It could be misleading to keep a signature in a
replacement commit, so let's remove it.
Note that there should probably be a way to sign
the replacement commit created when using --graft,
but this can be dealt with in another commit or
patch series.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usage string for this option is:
git replace [-f] --graft <commit> [<parent>...]
First we create a new commit that is the same as <commit>
except that its parents are [<parents>...]
Then we create a replace ref that replace <commit> with
the commit we just created.
With this new option, it should be straightforward to
convert grafts to replace refs.
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* kb/hashmap-updates:
hashmap: add string interning API
hashmap: add simplified hashmap_get_from_hash() API
hashmap: improve struct hashmap member documentation
hashmap: factor out getting a hash code from a SHA1
Early part of the "ref transaction" topic.
* rs/ref-transaction-0:
refs.c: change ref_transaction_update() to do error checking and return status
refs.c: remove the onerr argument to ref_transaction_commit
update-ref: use err argument to get error from ref_transaction_commit
refs.c: make update_ref_write update a strbuf on failure
refs.c: make ref_update_reject_duplicates take a strbuf argument for errors
refs.c: log_ref_write should try to return meaningful errno
refs.c: make resolve_ref_unsafe set errno to something meaningful on error
refs.c: commit_packed_refs to return a meaningful errno on failure
refs.c: make remove_empty_directories always set errno to something sane
refs.c: verify_lock should set errno to something meaningful
refs.c: make sure log_ref_setup returns a meaningful errno
refs.c: add an err argument to repack_without_refs
lockfile.c: make lock_file return a meaningful errno on failurei
lockfile.c: add a new public function unable_to_lock_message
refs.c: add a strbuf argument to ref_transaction_commit for error logging
refs.c: allow passing NULL to ref_transaction_free
refs.c: constify the sha arguments for ref_transaction_create|delete|update
refs.c: ref_transaction_commit should not free the transaction
refs.c: remove ref_transaction_rollback
Use xmemdupz() to allocate the memory, copy the data and make sure to
NUL-terminate the result, all in one step. The resulting code is
shorter, doesn't contain the constants 1 and '\0', and avoids
duplicating function parameters.
For blame, the last copied byte (o->file.ptr[o->file.size]) is always
set to NUL by fake_working_tree_commit() or read_sha1_file(), so no
information is lost by the conversion to using xmemdupz().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use xcalloc() instead of xmalloc() followed by memset() to allocate
and zero out memory because it's shorter and avoids duplicating the
function parameters.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using memset and then manually setting values of the string-list
members is not future proof as the internal representation of
string-list may change any time.
Use `string_list_init()` or STRING_LIST_INIT_* macros instead of
memset.
Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call commit_list_count() instead of open-coding it repeatedly.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Avoid code duplication and let strbuf_addstr() call strlen() for us.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the is_local logic to the place where origin remote has been setup and
check if the remote url can be used to do local cloning.
This saves a lot of space (and time) in some of the mirroring scenarios that
involve insteadOf rewrites.
Signed-off-by: Michael Barabanov <michael.barabanov@windriver.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add support for configuring default sort ordering for git tags. Command
line option will override this configured value, using the exact same
syntax.
Cc: Jeff King <peff@peff.net>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Both refs.c and fsck.c have their own private copies of the is_branch function.
Delete the is_branch function from fsck.c and make the version in refs.c
public.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
fsck: simplify fsck_commit_buffer() by using commit_list_count()
commit: use commit_list_append() instead of duplicating its code
merge: simplify merge_trivial() by using commit_list_append()
use strbuf_addch for adding single characters
use strbuf_addbuf for adding strbufs
* jk/strip-suffix:
prepare_packed_git_one: refactor duplicate-pack check
verify-pack: use strbuf_strip_suffix
strbuf: implement strbuf_strip_suffix
index-pack: use strip_suffix to avoid magic numbers
use strip_suffix instead of ends_with in simple cases
replace has_extension with ends_with
implement ends_with via strip_suffix
add strip_suffix function
sha1_file: replace PATH_MAX buffer with strbuf in prepare_packed_git_one()
Teach "git replace --edit" mode a "--raw" option to allow
editing the bare-metal representation data of objects.
* jk/replace-edit-raw:
replace: add a --raw mode for --edit
Teach "git replace" an "--edit" mode.
* cc/replace-edit:
replace: use argv_array in export_object
avoid double close of descriptors handed to run_command
replace: replace spaces with tabs in indentation
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
"git clone -b brefs/tags/bar" would have mistakenly thought we were
following a single tag, even though it was a name of the branch,
because it incorrectly used strstr().
* jc/fix-clone-single-starting-at-a-tag:
builtin/clone.c: detect a clone starting at a tag correctly
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed. The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
* maint-1.8.5:
annotate: use argv_array
t7300: repair filesystem permissions with test_when_finished
enums: remove trailing ',' after last item in enum
Simplify the code and get rid of some magic constants by using
argv_array to build the argument list for cmd_blame. Be lazy and let
the OS release our allocated memory, as before.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
During the commit process, update the cache-tree. Write this updated
cache-tree so that it's ready for subsequent commands.
Add test code which demonstrates that git commit now writes the cache
tree. Make all tests test the entire cache-tree, not just the root
level.
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Update ref_transaction_update() do some basic error checking and return
non-zero on error. Update all callers to check ref_transaction_update() for
error. There are currently no conditions in _update that will return error but
there will be in the future. Add an err argument that will be updated on
failure. In future patches we will start doing both locking and checking
for name conflicts in _update instead of _commit at which time this function
will start returning errors for these conditions.
Also check for BUGs during update and die(BUG:...) if we are calling
_update with have_old but the old_sha1 pointer is NULL.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Since all callers now use QUIET_ON_ERR we no longer need to provide an onerr
argument any more. Remove the onerr argument from the ref_transaction_commit
signature.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Call ref_transaction_commit with QUIET_ON_ERR and use the strbuf that is
returned to print a log message if/after the transaction fails.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Update repack_without_refs to take an err argument and update it if there
is a failure. Pass the err variable from ref_transaction_commit to this
function so that callers can print a meaningful error message if _commit
fails due to this function.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Add a strbuf argument to _commit so that we can pass an error string back to
the caller. So that we can do error logging from the caller instead of from
_commit.
Longer term plan is to first convert all callers to use onerr==QUIET_ON_ERR
and craft any log messages from the callers themselves and finally remove the
onerr argument completely.
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Acked-by: Michael Haggerty <mhagger@alum.mit.edu>
The trace API currently rechecks the environment variable and reopens the
trace file on every API call. This has the ugly side effect that errors
(e.g. file cannot be opened, or the user specified a relative path) are
also reported on every call. Performance can be improved by about factor
three by remembering the environment state and keeping the file open.
Replace the 'const char *key' parameter in the API with a pointer to a
'struct trace_key' that bundles the environment variable name with
additional, trace-internal state. Change the call sites of these APIs to
use a static 'struct trace_key' instead of a string constant.
In trace.c::get_trace_fd(), save and reuse the file descriptor in 'struct
trace_key'.
Add a 'trace_disable()' API, so that packet_trace() can cleanly disable
tracing when it encounters packed data (instead of using unsetenv()).
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We generally want to avoid lookup_unknown_object, because it
results in allocating more memory for the object than may be
strictly necessary.
In this case, it is used to check whether we have an
already-parsed object before calling parse_object, to save
us from reading the object from disk. Using lookup_object
would be fine for that purpose, but we can take it a step
further. Since this code was written, parse_object already
learned the "check lookup_object" optimization, so we can
simply call parse_object directly.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Build the commit_list of parents by calling commit_list_append() twice
instead of allocating and linking the items by hand. This makes the
code shorter and simpler. Rename the commit_list from parent to parents
(plural) while at it because there are two of them.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add 'verify-commit' to be used in a way similar to 'verify-tag' is
used. Further work on verifying the mergetags might be needed.
* mg/verify-commit:
t7510: test verify-commit
t7510: exit for loop with test result
verify-commit: scriptable commit signature verification
gpg-interface: provide access to the payload
gpg-interface: provide clear helper for struct signature_check
"git clone -b brefs/tags/bar" would have mistakenly thought we were
following a single tag, even though it was a name of the branch,
because it incorrectly used strstr().
* jc/fix-clone-single-starting-at-a-tag:
builtin/clone.c: detect a clone starting at a tag correctly
* jk/repack-pack-keep-objects:
repack: s/write_bitmap/&s/ in code
repack: respect pack.writebitmaps
repack: do not accidentally pack kept objects by default
We can make the parsing of the --sort parameter a bit more
readable by having skip_prefix keep our pointer up to date.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/xstrfmt:
setup_git_env(): introduce git_path_from_env() helper
unique_path: fix unlikely heap overflow
walker_fetch: fix minor memory leak
merge: use argv_array when spawning merge strategy
sequencer: use argv_array_pushf
setup_git_env: use git_pathdup instead of xmalloc + sprintf
use xstrfmt to replace xmalloc + strcpy/strcat
use xstrfmt to replace xmalloc + sprintf
use xstrdup instead of xmalloc + strcpy
use xstrfmt in favor of manual size calculations
strbuf: add xstrfmt helper
* jk/skip-prefix:
http-push: refactor parsing of remote object names
imap-send: use skip_prefix instead of using magic numbers
use skip_prefix to avoid repeated calculations
git: avoid magic number with skip_prefix
fetch-pack: refactor parsing in get_ack
fast-import: refactor parsing of spaces
stat_opt: check extra strlen call
daemon: use skip_prefix to avoid magic numbers
fast-import: use skip_prefix for parsing input
use skip_prefix to avoid repeating strings
use skip_prefix to avoid magic numbers
transport-helper: avoid reading past end-of-string
fast-import: fix read of uninitialized argv memory
apply: use skip_prefix instead of raw addition
refactor skip_prefix to return a boolean
avoid using skip_prefix as a boolean
daemon: mark some strings as const
parse_diff_color_slot: drop ofs parameter
Hashmap entries are typically looked up by just a key. The hashmap_get()
API expects an initialized entry structure instead, to support compound
keys. This flexibility is currently only needed by find_dir_entry() in
name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
(currently five) call sites of hashmap_get() have to set up a near emtpy
entry structure, resulting in duplicate code like this:
struct hashmap_entry keyentry;
hashmap_entry_init(&keyentry, hash(key));
return hashmap_get(map, &keyentry, key);
Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:
return hashmap_get_from_hash(map, hash(key), key);
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.
Add a properly documented API for this.
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When git checkout checks out a branch, create or update the
cache-tree so that subsequent operations are faster.
update_main_cache_tree learned a new flag, WRITE_TREE_REPAIR. When
WRITE_TREE_REPAIR is set, portions of the cache-tree which do not
correspond to existing tree objects are invalidated (and portions which
do are marked as valid). No new tree objects are created.
Signed-off-by: David Turner <dturner@twitter.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
In this code, we try to convert both "foo.idx" and "foo"
into "foo.pack". By stripping the suffix, we can avoid a
confusing use of strbuf_splice, and make it clear that both
cases are adding ".pack" to the end.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We also switch to using strbufs, which lets us avoid the
potentially dangerous combination of a manual malloc
followed by a strcpy.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When stripping a suffix like:
if (ends_with(str, "foo"))
buf = xmemdupz(str, strlen(str) - 3);
we can instead use strip_suffix to avoid the constant 3,
which must match the literal "foo" (we sometimes use
strlen("foo") instead, but that means we are repeating
ourselves). The example above becomes:
if (strip_suffix(str, "foo", &len))
buf = xmemdupz(str, len);
This also saves a strlen(), since we calculate the string
length when detecting the suffix.
Note that in some cases we also switch from xstrndup to
xmemdupz, which saves a further strlen call.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These two are almost the same function, with the exception
that has_extension only matches if there is content before
the suffix. So ends_with(".exe", ".exe") is true, but
has_extension would not be.
This distinction does not matter to any of the callers,
though, and we can just replace uses of has_extension with
ends_with. We prefer the "ends_with" name because it is more
generic, and there is nothing about the function that
requires it to be used for file extensions.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the purposes of "git replace --edit" is to help a
user repair objects which are malformed or corrupted.
Usually we pretty-print trees with "ls-tree", which is much
easier to work with than the raw binary data. However, some
forms of corruption break the tree-walker, in which case our
pretty-printing fails, rendering "--edit" useless for the
user.
This patch introduces a "--raw" option, which lets you edit
the binary data in these instances.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is a little more verbose, but will make it easier to
make parts of our command-line conditional (without
resorting to magic numbers or lots of NULLs to get an
appropriately sized argv array).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When a file descriptor is given to run_command via the
"in", "out", or "err" parameters, run_command takes
ownership. The descriptor will be closed in the parent
process whether the process is spawned successfully or not,
and closing it again is wrong.
In practice this has not caused problems, because we usually
close() right after start_command returns, meaning no other
code has opened a descriptor in the meantime. So we just get
EBADF and ignore it (rather than accidentally closing
somebody else's descriptor!).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Recent updates to "git repack" started to duplicate objects that
are in packfiles marked with .keep flag into the new packfile by
mistake.
* jk/repack-pack-keep-objects:
repack: s/write_bitmap/&s/ in code
repack: respect pack.writebitmaps
repack: do not accidentally pack kept objects by default
"git status" (and "git commit") behaved as if changes in a modified
submodule are not there if submodule.*.ignore configuration is set,
which was misleading. The configuration is only to unclutter diff
output during the course of development, and should not to hide
changes in the "status" output to cause the users forget to commit
them.
* jl/status-added-submodule-is-never-ignored:
commit -m: commit staged submodules regardless of ignore config
status/commit: show staged submodules regardless of ignore config
"git status", even though it is a read-only operation, tries to
update the index with refreshed lstat(2) info to optimize future
accesses to the working tree opportunistically, but this could
race with a "read-write" operation that modify the index while it
is running. Detect such a race and avoid overwriting the index.
* ym/fix-opportunistic-index-update-race:
read-cache.c: verify index file before we opportunistically update it
wrapper.c: add xpread() similar to xread()
"git remote rm" and "git remote prune" can involve removing many
refs at once, which is not a very efficient thing to do when very
many refs exist in the packed-refs file.
* jl/remote-rm-prune:
remote prune: optimize "dangling symref" check/warning
remote: repack packed-refs once when deleting multiple refs
remote rm: delete remote configuration as the last
"git rerere forget" did not work well when merge.conflictstyle
was set to a non-default value.
* fc/rerere-conflict-style:
rerere: fix for merge.conflictstyle
On a case insensitive filesystem, merge-recursive incorrectly
deleted the file that is to be renamed to a name that is the same
except for case differences.
* dt/merge-recursive-case-insensitive:
mv: allow renaming to fix case on case insensitive filesystems
merge-recursive.c: fix case-changing merge bug
"git mailinfo" used to read beyond the end of header string while
parsing an incoming e-mail message to extract the patch.
* rs/mailinfo-header-cmp:
mailinfo: use strcmp() for string comparison
The error reporting from "git index-pack" has been improved to
distinguish missing objects from type errors.
* jk/index-pack-report-missing:
index-pack: distinguish missing objects from type errors
We used to disable threaded "git index-pack" on platforms without
thread-safe pread(); use a different workaround for such
platforms to allow threaded "git index-pack".
* nd/index-pack-one-fd-per-thread:
index-pack: work around thread-unsafe pread()
"git grep -O" to show the lines that hit in the pager did not work
well with case insensitive search. We now spawn "less" with its
"-I" option when it is used as the pager (which is the default).
* sk/spawn-less-case-insensitively-from-grep-O-i:
git grep -O -i: if the pager is 'less', pass the '-I' option
"git gc --auto" was recently changed to run in the background to
give control back early to the end-user sitting in front of the
terminal, but it forgot that housekeeping involving reflogs should
be done without other processes competing for accesses to the refs.
* nd/daemonize-gc:
gc --auto: do not lock refs in the background
"git format-patch" did not enforce the rule that the "--follow"
option from the log/diff family of commands must be used with
exactly one pathspec.
* jk/diff-follow-must-take-one-pathspec:
move "--follow needs one pathspec" rule to diff_setup_done
"git commit --allow-empty-message -C $commit" did not work when the
commit did not have any log message.
* jk/commit-C-pick-empty:
commit: do not complain of empty messages from -C
"git blame" assigned the blame to the copy in the working-tree if
the repository is set to core.autocrlf=input and the file used CRLF
line endings.
* bc/blame-crlf-test:
blame: correctly handle files regardless of autocrlf
"git blame" miscounted number of columns needed to show localized
timestamps, resulting in jaggy left-side-edge of the source code
lines in its output.
* jx/blame-align-relative-time:
blame: dynamic blame_date_width for different locales
blame: fix broken time_buf paddings in relative timestamp
"--ignore-space-change" option of "git apply" ignored the spaces
at the beginning of line too aggressively, which is inconsistent
with the option of the same name "diff" and "git diff" have.
* jc/apply-ignore-whitespace:
apply --ignore-space-change: lines with and without leading whitespaces do not match
Commit signatures can be verified using "git show -s --show-signature"
or the "%G?" pretty format and parsing the output, which is well suited
for user inspection, but not for scripting.
Provide a command "verify-commit" which is analogous to "verify-tag": It
returns 0 for good signatures and non-zero otherwise, has the gpg output
on stderr and (optionally) the commit object on stdout, sans the
signature, just like "verify-tag" does.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The struct has been growing members whose malloced memory needs to be
freed. Do this with one helper function so that no malloced memory shall
be left unfreed.
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
31b808a0 (clone --single: limit the fetch refspec to fetched branch,
2012-09-20) tried to see if the given "branch" to follow is actually
a tag at the remote repository by checking with "refs/tags/" but it
incorrectly used strstr(3); it is actively wrong to treat a "branch"
"refs/heads/refs/tags/foo" and use the logic for the "refs/tags/"
ref hierarchy. What the code really wanted to do is to see if it
starts with "refs/tags/".
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jc/fetch-pull-refmap:
docs: Explain the purpose of fetch's and pull's <refspec> parameter.
fetch: allow explicit --refmap to override configuration
fetch doc: add a section on configured remote-tracking branches
fetch doc: remove "short-cut" section
fetch doc: update refspec format description
fetch doc: on pulling multiple refspecs
fetch doc: remove notes on outdated "mixed layout"
fetch doc: update note on '+' in front of the refspec
fetch doc: move FETCH_HEAD material lower and add an example
fetch doc: update introductory part for clarity
It's a common idiom to match a prefix and then skip past it
with strlen, like:
if (starts_with(foo, "bar"))
foo += strlen("bar");
This avoids magic numbers, but means we have to repeat the
string (and there is no compiler check that we didn't make a
typo in one of the strings).
We can use skip_prefix to handle this case without repeating
ourselves.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A submodule diff generally has content like:
-Subproject commit [0-9a-f]{40}
+Subproject commit [0-9a-f]{40}
When we are using "git apply --index" with a submodule, we
first apply the textual diff, and then parse that result to
figure out the new sha1.
If the diff has bogus input like:
-Subproject commit 1234567890123456789012345678901234567890
+bogus
we will parse the "bogus" portion. Our parser assumes that
the buffer starts with "Subproject commit", and blindly
skips past it using strlen(). This can cause us to read
random memory after the buffer.
This problem was unlikely to have come up in practice (since
it requires a malformed diff), and even when it did, we
likely noticed the problem anyway as the next operation was
to call get_sha1_hex on the random memory.
However, we can easily fix it by using skip_prefix to notice
the parsing error.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:
1. When you want to conditionally skip or keep the string
as-is, you have to introduce a temporary variable.
For example:
tmp = skip_prefix(buf, "foo");
if (tmp)
buf = tmp;
2. It is verbose to check the outcome in a conditional, as
you need extra parentheses to silence compiler
warnings. For example:
if ((cp = skip_prefix(buf, "foo"))
/* do something with cp */
Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).
This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:
if (skip_prefix(arg, "foo ", &arg))
do_foo(arg);
else if (skip_prefix(arg, "bar ", &arg))
do_bar(arg);
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
It's easy to get manual allocation calculations wrong, and
the use of strcpy/strcat raise red flags for people looking
for buffer overflows (though in this case each site was
fine).
It's also shorter to use xstrfmt, and the printf-format
tends to be easier for a reader to see what the final string
will look like.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one line shorter, and makes sure the length in the
malloc and sprintf steps match.
These conversions are very straightforward; we can drop the
malloc entirely, and replace the sprintf with xstrfmt.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is one line shorter, and makes sure the length in the
malloc and copy steps match.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There's no point in using:
if (skip_prefix(buf, "foo"))
over
if (starts_with(buf, "foo"))
as the point of skip_prefix is to return a pointer to the
data after the prefix. Using starts_with is more readable,
and will make refactoring skip_prefix easier.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Allow remote-helper/fast-import based transport to rename the refs
while transferring the history.
* fc/remote-helper-refmap:
transport-helper: remove unnecessary strbuf resets
transport-helper: add support to delete branches
fast-export: add support to delete refs
fast-import: add support to delete refs
transport-helper: add support to push symbolic refs
transport-helper: add support for old:new refspec
fast-export: add new --refspec option
fast-export: improve argument parsing
"git gc --auto" was recently changed to run in the background to
give control back early to the end-user sitting in front of the
terminal, but it forgot that housekeeping involving reflogs should
be done without other processes competing for accesses to the refs.
* nd/daemonize-gc:
gc --auto: do not lock refs in the background
"git remote rm" and "git remote prune" can involve removing many
refs at once, which is not a very efficient thing to do when very
many refs exist in the packed-refs file.
* jl/remote-rm-prune:
remote prune: optimize "dangling symref" check/warning
remote: repack packed-refs once when deleting multiple refs
remote rm: delete remote configuration as the last
submodule.*.ignore and diff.ignoresubmodules are used to ignore all
submodule changes in "diff" output, but it can be confusing to
apply these configuration values to status and commit.
This is a backward-incompatible change, but should be so in a good
way (aka bugfix).
* jl/status-added-submodule-is-never-ignored:
commit -m: commit staged submodules regardless of ignore config
status/commit: show staged submodules regardless of ignore config
"git replace" learns a new "--edit" option.
* cc/replace-edit:
Documentation: replace: describe new --edit option
replace: add --edit to usage string
replace: add tests for --edit
replace: die early if replace ref already exists
replace: refactor checking ref validity
replace: make sure --edit results in a different object
replace: add --edit option
replace: factor object resolution out of replace_object
replace: use OPT_CMDMODE to handle modes
replace: refactor command-mode determination
* 'mt/patch-id-stable' (early part):
patch-id-test: test stable and unstable behaviour
patch-id: make it stable against hunk reordering
test doc: test_write_lines does not split its arguments
test: add test_write_lines helper
Changing get_next_line() to return the end pointer instead of NULL in
case no newline character is found treats allows us to treat complete
and incomplete lines the same, simplifying the code. Switching to
counting lines instead of EOLs allows us to start counting at the
first character, instead of having to call get_next_line() first.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code for finding the start of the next line into a helper
function in order to reduce duplication.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.
And as a side effect, we abstract away the final bare uses
of commit->buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.
We return either:
1. The original commit->buffer
2. A newly allocated buffer from read_sha1_file
3. A reencoded buffer (based on either 1 or 2 above).
while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some call sites check commit->buffer to see whether we have
a cached buffer, and if so, do some work with it. In the
long run we may want to switch these code paths to make
their decision on a different boolean flag (because checking
the cache may get a little more expensive in the future).
But for now, we can easily support them by converting the
calls to use get_cached_commit_buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now this is just a one-liner, but abstracting it will
make it easier to change later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This converts two lines into one at each caller. But more
importantly, it abstracts the concept of freeing the buffer,
which will make it easier to change later.
Note that we also need to provide a "detach" mechanism for a
tricky case in index-pack. We are passed a buffer for the
object generated by processing the incoming pack. If we are
not using --strict, we just calculate the sha1 on that
buffer and return, leaving the caller to free it. But if we
are using --strict, we actually attach that buffer to an
object, pass the object to the fsck functions, and then
detach the buffer from the object again (so that the caller
can free it as usual). In this case, we don't want to free
the buffer ourselves, but just make sure it is no longer
associated with the commit.
Note that we are making the assumption here that the
attach/detach process does not impact the buffer at all
(e.g., it is never reallocated or modified). That holds true
now, and we have no plans to change that. However, as we
abstract the commit_buffer code, this dependency becomes
less obvious. So when we detach, let's also make sure that
we get back the same buffer that we gave to the
commit_buffer code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Normally scripts do not have to be aware about split indexes because
all shared indexes are in $GIT_DIR. A simple "mv $tmp_index
$GIT_DIR/somewhere" is enough. Scripts that generate temporary indexes
and move them across repos must be aware about split index and copy
the shared file as well. This option enables that.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you have a large work tree but only make changes in a subset, then
$GIT_DIR/index's size should be stable after a while. If you change
branches that touch something else, $GIT_DIR/index's size may grow
large that it becomes as slow as the unified index. Do --split-index
again occasionally to force all changes back to the shared index and
keep $GIT_DIR/index small.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The large part of this patch just follows CE_ENTRY_CHANGED
marks. replace_index_entry() is updated to update
split_index->base->cache[] as well so base->cache[] does not reference
to a freed entry.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Other fill_stat_cache_info() is on new entries, which should set
CE_ENTRY_ADDED in cache_changed, so we're safe.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
cache entry additions, removals and modifications are separated
out. The rest of changes are still in the catch-all flag
SOMETHING_CHANGED, which would be more specific later.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const. We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.
We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>