More transition from "unsigned char[40]" to "struct object_id".
This needed a few merge fixups, but is mostly disentangled from other
topics.
* bc/object-id:
remote: convert functions to struct object_id
Remove get_object_hash.
Convert struct object to object_id
Add several uses of get_object_hash.
object: introduce get_object_hash macro.
ref_newer: convert to use struct object_id
push_refs_with_export: convert to struct object_id
get_remote_heads: convert to struct object_id
parse_fetch: convert to use struct object_id
add_sought_entry_mem: convert to struct object_id
Convert struct ref to use object_id.
sha1_file: introduce has_object_file helper.
Commit 1b0d400 refactored the prepare_final() function so
that it could be reused in multiple places. Originally, the
loop had two outputs: a commit to stuff into sb->final, and
the name of the commit from the rev->pending array.
After the refactor, that loop is put in its own function
with a single return value: the object_array_entry from the
rev->pending array. This contains both the name and the object,
but with one important difference: the object is the
_original_ object found by the revision parser, not the
dereferenced commit. If one feeds a tag to "git blame", we
end up casting the tag object to a "struct commit", which
causes a segfault.
Instead, let's return the commit (properly casted) directly
from the function, and take the "name" as an optional
out-parameter. This does the right thing, and actually
simplifies the callers, who no longer need to cast or
dereference the object_array_entry themselves.
[test case by Max Kirillov <max@max630.net>]
Signed-off-by: Jeff King <peff@peff.net>
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object. This provides no
functional change, as it is essentially a macro substitution.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
struct object is one of the major data structures dealing with object
IDs. Convert it to use struct object_id instead of an unsigned char
array. Convert get_object_hash to refer to the new member as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash. Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
The error message from "git blame --contents --reverse" incorrectly
talked about "--contents --children".
* mk/blame-error-message:
blame: fix option name in error message
"git blame" learnt to take "--first-parent" and "--reverse" at the
same time when it makes sense.
* mk/blame-first-parent:
blame: allow blame --reverse --first-parent when it makes sense
blame: extract find_single_final
blame: test to describe use of blame --reverse --first-parent
Allow combining --reverse and --first-parent if initial commit of
specified range is at the first-parent chain starting from the final
commit. Disable the prepare_revision_walk()'s builtin children
collection, instead picking only the ones which are along the first
parent chain.
Signed-off-by: Max Kirillov <max@max630.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The error message from "git blame --contents --reverse" incorrectly
talked about "--contents --children".
* mk/blame-error-message:
blame: fix option name in error message
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.
Macintosh-specific breakage was noticed and corrected in this
reroll.
* jk/war-on-sprintf: (70 commits)
name-rev: use strip_suffix to avoid magic numbers
use strbuf_complete to conditionally append slash
fsck: use for_each_loose_file_in_objdir
Makefile: drop D_INO_IN_DIRENT build knob
fsck: drop inode-sorting code
convert strncpy to memcpy
notes: document length of fanout path with a constant
color: add color_set helper for copying raw colors
prefer memcpy to strcpy
help: clean up kfmclient munging
receive-pack: simplify keep_arg computation
avoid sprintf and strcpy with flex arrays
use alloc_ref rather than hand-allocating "struct ref"
color: add overflow checks for parsing colors
drop strcpy in favor of raw sha1_to_hex
use sha1_to_hex_r() instead of strcpy
daemon: use cld->env_array when re-spawning
stat_tracking_info: convert to argv_array
http-push: use an argv_array for setup_revisions
fetch-pack: use argv_array for index-pack / unpack-objects
...
"git blame --first-parent v1.0..v2.0" was not rejected but did not
limit the blame to commits on the first parent chain.
* jk/blame-first-parent:
blame: handle --first-parent
"git blame --first-parent v1.0..v2.0" was not rejected but did not
limit the blame to commits on the first parent chain.
* jk/blame-first-parent:
blame: handle --first-parent
"git log --date=local" used to only show the normal (default)
format in the local timezone. The command learned to take 'local'
as an instruction to use the local timezone with other formats,
e.g. "git show --date=rfc-local".
* jk/date-local:
t6300: add tests for "-local" date formats
t6300: make UTC and local dates different
date: make "local" orthogonal to date format
date: check for "local" before anything else
t6300: add test for "raw" date format
t6300: introduce test_date() helper
fast-import: switch crash-report date to iso8601
Documentation/rev-list: don't list date formats
Documentation/git-for-each-ref: don't list date formats
Documentation/config: don't list date formats
Documentation/blame-options: don't list date formats
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.
But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.
It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:
- the call signature is a little bit unwieldy:
d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);
You need offsetof here instead of just writing to the
end of the base size, because we don't know how the
struct is packed (partially this is because FLEX_ARRAY
might not be zero, though we can account for that; but
the size of the struct may actually be rounded up for
alignment, and we can't know that).
- some sites do clever things, like over-allocating because
they know they will write larger things into the buffer
later (e.g., struct packed_git here).
So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Before sha1_to_hex_r() existed, a simple way to get hex
sha1 into a buffer was with:
strcpy(buf, sha1_to_hex(sha1));
This isn't wrong (assuming the buf is 41 characters), but it
makes auditing the code base for bad strcpy() calls harder,
as these become false positives.
Let's convert them to sha1_to_hex_r(), and likewise for
some calls to find_unique_abbrev(). While we're here, we'll
double-check that all of the buffers are correctly sized,
and use the more obvious GIT_SHA1_HEXSZ constant.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When we call into xdiff to perform a diff, we generally lose
the return code completely. Typically by ignoring the return
of our xdi_diff wrapper, but sometimes we even propagate
that return value up and then ignore it later. This can
lead to us silently producing incorrect diffs (e.g., "git
log" might produce no output at all, not even a diff header,
for a content-level diff).
In practice this does not happen very often, because the
typical reason for xdiff to report failure is that it
malloc() failed (it uses straight malloc, and not our
xmalloc wrapper). But it could also happen when xdiff
triggers one our callbacks, which returns an error (e.g.,
outf() in builtin/rerere.c tries to report a write failure
in this way). And the next patch also plans to add more
failure modes.
Let's notice an error return from xdiff and react
appropriately. In most of the diff.c code, we can simply
die(), which matches the surrounding code (e.g., that is
what we do if we fail to load a file for diffing in the
first place). This is not that elegant, but we are probably
better off dying to let the user know there was a problem,
rather than simply generating bogus output.
We could also just die() directly in xdi_diff, but the
callers typically have a bit more context, and can provide a
better message (and if we do later decide to pass errors up,
we're one step closer to doing so).
There is one interesting case, which is in diff_grep(). Here
if we cannot generate the diff, there is nothing to match,
and we silently return "no hits". This is actually what the
existing code does already, but we make it a little more
explicit.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The revision.c options-parser will parse "--first-parent"
for us, but the blame code does not actually respect it, as
we simply iterate over the whole list returned by
first_scapegoat(). We can fix this by returning a
truncated parent list.
Note that we could technically also do so by limiting the
return value of num_scapegoats(), but that is less robust.
We would rely on nobody ever looking at the "next" pointer
from the returned list.
Combining "--reverse" with "--first-parent" is more
complicated, and will probably involve cooperation from
revision.c. Since the desired semantics are not even clear,
let's punt on this for now, but explicitly disallow it to
avoid confusing users (this is not really a regression,
since it did something nonsensical before).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most of our "--date" modes are about the format of the date:
which items we show and in what order. But "--date=local" is
a bit of an oddball. It means "show the date in the normal
format, but using the local timezone". The timezone we use
is orthogonal to the actual format, and there is no reason
we could not have "localized iso8601", etc.
This patch adds a "local" boolean field to "struct
date_mode", and drops the DATE_LOCAL element from the
date_mode_type enum (it's now just DATE_NORMAL plus
local=1). The new feature is accessible to users by adding
"-local" to any date mode (e.g., "iso-local"), and we retain
"local" as an alias for "default-local" for backwards
compatibility.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: John Keeping <john@keeping.me.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
That "someday" in the comment happened two years later in
b65982b (Optimize "diff-index --cached" using cache-tree - 2009-05-20)
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
One of the most common uses of git_path() is to pass a
constant, like git_path("MERGE_MSG"). This has two
drawbacks:
1. The return value is a static buffer, and the lifetime
is dependent on other calls to git_path, etc.
2. There's no compile-time checking of the pathname. This
is OK for a one-off (after all, we have to spell it
correctly at least once), but many of these constant
strings appear throughout the code.
This patch introduces a series of functions to "memoize"
these strings, which are essentially globals for the
lifetime of the program. We compute the value once, take
ownership of the buffer, and return the cached value for
subsequent calls. cache.h provides a helper macro for
defining these functions as one-liners, and defines a few
common ones for global use.
Using a macro is a little bit gross, but it does nicely
document the purpose of the functions. If we need to touch
them all later (e.g., because we learned how to change the
git_dir variable at runtime, and need to invalidate all of
the stored values), it will be much easier to have the
complete list.
Note that the shared-global functions have separate, manual
declarations. We could do something clever with the macros
(e.g., expand it to a declaration in some places, and a
declaration _and_ a definition in path.c). But there aren't
that many, and it's probably better to stay away from
too-magical macros.
Likewise, if we abandon the C preprocessor in favor of
generating these with a script, we could get much fancier.
E.g., normalizing "FOO/BAR-BAZ" into "git_path_foo_bar_baz".
But the small amount of saved typing is probably not worth
the resulting confusion to readers who want to grep for the
function's definition.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach "git log" and friends a new "--date=format:..." option to
format timestamps using system's strftime(3).
* jk/date-mode-format:
strbuf: make strbuf_addftime more robust
introduce "format" date-mode
convert "enum date_mode" into a struct
show-branch: use DATE_RELATIVE instead of magic number
Clean up refs API and make "git clone" less intimate with the
implementation detail.
* mh/init-delete-refs-api:
delete_ref(): use the usual convention for old_sha1
cmd_update_ref(): make logic more straightforward
update_ref(): don't read old reference value before delete
check_branch_commit(): make first parameter const
refs.h: add some parameter names to function declarations
refs: move the remaining ref module declarations to refs.h
initial_ref_transaction_commit(): check for ref D/F conflicts
initial_ref_transaction_commit(): check for duplicate refs
refs: remove some functions from the module's public interface
initial_ref_transaction_commit(): function for initial ref creation
repack_without_refs(): make function private
prune_refs(): use delete_refs()
prune_remote(): use delete_refs()
delete_refs(): bail early if the packed-refs file cannot be rewritten
delete_refs(): make error message more generic
delete_refs(): new function for the refs API
delete_ref(): handle special case more explicitly
remove_branches(): remove temporary
delete_ref(): move declaration to refs.h
This feeds the format directly to strftime. Besides being a
little more flexible, the main advantage is that your system
strftime may know more about your locale's preferred format
(e.g., how to spell the days of the week).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In preparation for adding date modes that may carry extra
information beyond the mode itself, this patch converts the
date_mode enum into a struct.
Most of the conversion is fairly straightforward; we pass
the struct as a pointer and dereference the type field where
necessary. Locations that declare a date_mode can use a "{}"
constructor. However, the tricky case is where we use the
enum labels as constants, like:
show_date(t, tz, DATE_NORMAL);
Ideally we could say:
show_date(t, tz, &{ DATE_NORMAL });
but of course C does not allow that. Likewise, we cannot
cast the constant to a struct, because we need to pass an
actual address. Our options are basically:
1. Manually add a "struct date_mode d = { DATE_NORMAL }"
definition to each caller, and pass "&d". This makes
the callers uglier, because they sometimes do not even
have their own scope (e.g., they are inside a switch
statement).
2. Provide a pre-made global "date_normal" struct that can
be passed by address. We'd also need "date_rfc2822",
"date_iso8601", and so forth. But at least the ugliness
is defined in one place.
3. Provide a wrapper that generates the correct struct on
the fly. The big downside is that we end up pointing to
a single global, which makes our wrapper non-reentrant.
But show_date is already not reentrant, so it does not
matter.
This patch implements 3, along with a minor macro to keep
the size of the callers sane.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some functions from the refs module were still declared in cache.h.
Move them to refs.h.
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/janitorial:
dir: remove unused variable sb
clean: remove unused variable buf
use file_exists() to check if a file exists in the worktree
Some time ago, "git blame" (incorrectly) lost the convert_to_git()
call when synthesizing a fake "tip" commit that represents the
state in the working tree, which broke folks who record the history
with LF line ending to make their project portabile across
platforms while terminating lines in their working tree files with
CRLF for their platform.
* tb/blame-resurrect-convert-to-git:
blame: CRLF in the working tree and LF in the repo
Complement existing --show-email option with fallback
configuration variable, with tests.
Signed-off-by: Quentin Neill <quentin.neill@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* rs/janitorial:
dir: remove unused variable sb
clean: remove unused variable buf
use file_exists() to check if a file exists in the worktree
Call file_exists() instead of open-coding it. That's shorter, simpler
and the intent becomes clearer.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Some time ago, "git blame" (incorrectly) lost the convert_to_git()
call when synthesizing a fake "tip" commit that represents the
state in the working tree, which broke folks who record the history
with LF line ending to make their project portabile across
platforms while terminating lines in their working tree files with
CRLF for their platform.
* tb/blame-resurrect-convert-to-git:
blame: CRLF in the working tree and LF in the repo
Earlier, 9c9b4f2f (standardize usage info string format, 2015-01-13)
tried to make usage-string in line with the documentation by
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
but it missed a few places.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
A typical setup under Windows is to set core.eol to CRLF, and text
files are marked as "text" in .gitattributes, or core.autocrlf is
set to true.
After 4d4813a5 "git blame" no longer works as expected for such a
set-up. Every line is annotated as "Not Committed Yet", even though
the working directory is clean. This is because the commit removed
the conversion in blame.c for all files, with or without CRLF in the
repo.
Having files with CRLF in the repo and core.autocrlf=input is a
temporary situation, and the files, if committed as is, will be
normalized in the repo, which _will_ be a notable change. Blaming
them with "Not Committed Yet" is the right result. Revert commit
4d4813a5 which was a misguided attempt to "solve" a non-problem.
Add two test cases in t8003 to verify the correct CRLF conversion.
Suggested-By: Stepan Kasal <kasal@ucw.cz>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
"git blame HEAD -- missing" failed to correctly say "HEAD" when it
tried to say "No such path 'missing' in HEAD".
* jk/blame-commit-label:
blame.c: fix garbled error message
use xstrdup_or_null to replace ternary conditionals
builtin/commit.c: use xstrdup_or_null instead of envdup
builtin/apply.c: use xstrdup_or_null instead of null_strdup
git-compat-util: add xstrdup_or_null helper
Since ea02ffa3 (mailmap: simplify map_user() interface, 2013-01-05),
find_alignment() has been invoking commit_info_destroy() on an
uninitialized auto 'struct commit_info' (when METAINFO_SHOWN is not
set). commit_info_destroy() calls strbuf_release() for each
'commit_info' strbuf member, which randomly invokes free() on
whatever random stack value happens to reside in strbuf.buf, thus
leading to periodic crashes.
Reported-by: Dilyan Palauzov <dilyan.palauzov@aegee.org>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:
- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The helper functions prepare_final() and prepare_initial() return a
pointer to a string that is a member of an object in the revs->pending
array. This array is later rebuilt when running prepare_revision_walk()
which potentially transforms the pointer target into a bogus string. Fix
this by maintaining a copy of the original string.
Signed-off-by: Lukas Fleischer <git@cryptocrack.de>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading). Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.
While at it, swap two of the arguments in the function to put output
arguments at the end. As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.
Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.
Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"log --date=iso" uses a slight variant of ISO 8601 format that is
made more human readable. A new "--date=iso-strict" option gives
datetime output that is more strictly conformant.
* bb/date-iso-strict:
pretty: provide a strict ISO 8601 date format
Git's "ISO" date format does not really conform to the ISO 8601
standard due to small differences, and it cannot be parsed by ISO
8601-only parsers, e.g. those of XML toolchains.
The output from "--date=iso" deviates from ISO 8601 in these ways:
- a space instead of the `T` date/time delimiter
- a space between time and time zone
- no colon between hours and minutes of the time zone
Add a strict ISO 8601 date format for displaying committer and
author dates. Use the '%aI' and '%cI' format specifiers and add
'--date=iso-strict' or '--date=iso8601-strict' date format names.
See http://thread.gmane.org/gmane.comp.version-control.git/255879 and
http://thread.gmane.org/gmane.comp.version-control.git/52414/focus=52585
for discussion.
Signed-off-by: Beat Bolli <bbolli@ewanet.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* rs/code-cleaning:
remote-testsvn: use internal argv_array of struct child_process in cmd_import()
bundle: use internal argv_array of struct child_process in create_bundle()
fast-import: use hashcmp() for SHA1 hash comparison
transport: simplify fetch_objs_via_rsync() using argv_array
run-command: use internal argv_array of struct child_process in run_hook_ve()
use commit_list_count() to count the members of commit_lists
strbuf: use strbuf_addstr() for adding C strings
Make sure all in-core commit objects are assigned a unique number
so that they can be annotated using the commit-slab API.
* jk/alloc-commit-id:
diff-tree: avoid lookup_unknown_object
object_as_type: set commit index
alloc: factor out commit index
add object_as_type helper for casting objects
parse_object_buffer: do not set object type
move setting of object->type to alloc_* functions
alloc: write out allocator definitions
alloc.c: remove the alloc_raw_commit_node() function
Use xmemdupz() to allocate the memory, copy the data and make sure to
NUL-terminate the result, all in one step. The resulting code is
shorter, doesn't contain the constants 1 and '\0', and avoids
duplicating function parameters.
For blame, the last copied byte (o->file.ptr[o->file.size]) is always
set to NUL by fake_working_tree_commit() or read_sha1_file(), so no
information is lost by the conversion to using xmemdupz().
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Call commit_list_count() instead of open-coding it repeatedly.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
An experiment to use two files (the base file and incremental
changes relative to it) to represent the index to reduce I/O cost
of rewriting a large index when only small part of the working tree
changes.
* nd/split-index: (32 commits)
t1700: new tests for split-index mode
t2104: make sure split index mode is off for the version test
read-cache: force split index mode with GIT_TEST_SPLIT_INDEX
read-tree: note about dropping split-index mode or index version
read-tree: force split-index mode off on --index-output
rev-parse: add --shared-index-path to get shared index path
update-index --split-index: do not split if $GIT_DIR is read only
update-index: new options to enable/disable split index mode
split-index: strip pathname of on-disk replaced entries
split-index: do not invalidate cache-tree at read time
split-index: the reading part
split-index: the writing part
read-cache: mark updated entries for split index
read-cache: save deleted entries in split index
read-cache: mark new entries for split index
read-cache: split-index mode
read-cache: save index SHA-1 after reading
entry.c: update cache_changed if refresh_cache is set in checkout_entry()
cache-tree: mark istate->cache_changed on prime_cache_tree()
cache-tree: mark istate->cache_changed on cache tree update
...
A handful of code paths had to read the commit object more than
once when showing header fields that are usually not parsed. The
internal data structure to keep track of the contents of the commit
object has been updated to reduce the need for this double-reading,
and to allow the caller find the length of the object.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
The "struct object" type implements basic object
polymorphism. Individual instances are allocated as
concrete types (or as a union type that can store any
object), and a "struct object *" can be cast into its real
type after examining its "type" enum. This means it is
dangerous to have a type field that does not match the
allocation (e.g., setting the type field of a "struct blob"
to "OBJ_COMMIT" would mean that a reader might read past the
allocated memory).
In most of the current code this is not a problem; the first
thing we do after allocating an object is usually to set its
type field by passing it to create_object. However, the
virtual commits we create in merge-recursive.c do not ever
get their type set. This does not seem to have caused
problems in practice, though (presumably because we always
pass around a "struct commit" pointer and never even look at
the type).
We can fix this oversight and also make it harder for future
code to get it wrong by setting the type directly in the
object allocation functions.
This will also make it easier to fix problems with commit
index allocation, as we know that any object allocated by
alloc_commit_node will meet the invariant that an object
with an OBJ_COMMIT type field will have a unique index
number.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move "commit->buffer" out of the in-core commit object and keep
track of their lengths. Use this to optimize the code paths to
validate GPG signatures in commit objects.
* jk/commit-buffer-length:
reuse cached commit buffer when parsing signatures
commit: record buffer length in cache
commit: convert commit->buffer to a slab
commit-slab: provide a static initializer
use get_commit_buffer everywhere
convert logmsg_reencode to get_commit_buffer
use get_commit_buffer to avoid duplicate code
use get_cached_commit_buffer where appropriate
provide helpers to access the commit buffer
provide a helper to set the commit buffer
provide a helper to free commit buffer
sequencer: use logmsg_reencode in get_message
logmsg_reencode: return const buffer
do not create "struct commit" with xcalloc
commit: push commit_index update into alloc_commit_node
alloc: include any-object allocations in alloc_report
replace dangerous uses of strbuf_attach
commit_tree: take a pointer/len pair rather than a const strbuf
"git blame" assigned the blame to the copy in the working-tree if
the repository is set to core.autocrlf=input and the file used CRLF
line endings.
* bc/blame-crlf-test:
blame: correctly handle files regardless of autocrlf
Changing get_next_line() to return the end pointer instead of NULL in
case no newline character is found treats allows us to treat complete
and incomplete lines the same, simplifying the code. Switching to
counting lines instead of EOLs allows us to start counting at the
first character, instead of having to call get_next_line() first.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Move the code for finding the start of the next line into a helper
function in order to reduce duplication.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.
For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.
This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Like the callsites in the previous commit, logmsg_reencode
already falls back to read_sha1_file when necessary.
However, I split its conversion out into its own commit
because it's a bit more complex.
We return either:
1. The original commit->buffer
2. A newly allocated buffer from read_sha1_file
3. A reencoded buffer (based on either 1 or 2 above).
while trying to do as few extra reads/allocations as
possible. Callers currently free the result with
logmsg_free, but we can simplify this by pointing them
straight to unuse_commit_buffer. This is a slight layering
violation, in that we may be passing a buffer from (3).
However, since the end result is to free() anything except
(1), which is unlikely to change, and because this makes the
interface much simpler, it's a reasonable bending of the
rules.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Right now this is just a one-liner, but abstracting it will
make it easier to change later.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The return value from logmsg_reencode may be either a newly
allocated buffer or a pointer to the existing commit->buffer.
We would not want the caller to accidentally free() or
modify the latter, so let's mark it as const. We can cast
away the constness in logmsg_free, but only once we have
determined that it is a free-able buffer.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In both blame and merge-recursive, we sometimes create a
"fake" commit struct for convenience (e.g., to represent the
HEAD state as if we would commit it). By allocating
ourselves rather than using alloc_commit_node, we do not
properly set the "index" field of the commit. This can
produce subtle bugs if we then use commit-slab on the
resulting commit, as we will share the "0" index with
another commit.
We can fix this by using alloc_commit_node() to allocate.
Note that we cannot free the result, as it is part of our
commit allocator. However, both cases were already leaking
the allocated commit anyway, so there's nothing to fix up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If a file contained CRLF line endings in a repository with
core.autocrlf=input, then blame always marked lines as "Not
Committed Yet", even if they were unmodified.
* bc/blame-crlf-test:
blame: correctly handle files regardless of autocrlf
"git blame" has been optimized greatly by reorganising the data
structure that is used to keep track of the work to be done, thanks
to David Karstrup <dak@gnu.org>.
* dk/blame-reorg:
blame: large-scale performance rewrite
If a file contained CRLF line endings in a repository with
core.autocrlf=input, then blame always marked lines as "Not
Committed Yet", even if they were unmodified. Don't attempt to
convert the line endings when creating the fake commit so that blame
works correctly regardless of the autocrlf setting.
Reported-by: Ephrim Khong <dr.khong@gmail.com>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous implementation used a single sorted linear list of blame
entries for organizing all partial or completed work. Every subtask had
to scan the whole list, with most entries not being relevant to the
task. The resulting run-time was quadratic to the number of separate
chunks.
This change gives every subtask its own data to work with. Subtasks are
organized into "struct origin" chains hanging off particular commits.
Commits are organized into a priority queue, processing them in commit
date order in order to keep most of the work affecting a particular blob
collated even in the presence of an extensive merge history.
For large files with a diversified history, a speedup by a factor of 3
or more is not unusual.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When show date in relative date format for git-blame, the max display
width of datetime is set as the length of the string "Thu Oct 19
16:00:04 2006 -0700" (30 characters long). But actually the max width
for C locale is only 22 (the length of string "x years, xx months ago").
And for other locale, it maybe smaller. E.g. For Chinese locale, only
needs a half (16-character width).
Set blame_date_width as the display width of _("4 years, 11 months
ago"), so that translators can make the choice.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Command `git blame --date relative` aligns the date field with a
fixed-width (defined by blame_date_width), and if time_str is shorter
than that, it adds spaces for padding. But there are two bugs in the
following codes:
time_len = strlen(time_str);
...
memset(time_buf + time_len, ' ', blame_date_width - time_len);
1. The type of blame_date_width is size_t, which is unsigned. If
time_len is greater than blame_date_width, the result of
"blame_date_width - time_len" will never be a negative number, but a
really big positive number, and will cause memory overwrite.
This bug can be triggered if either l10n message for function
show_date_relative() in date.c is longer than 30 characters, then
`git blame --date relative` may exit abnormally.
2. When show blame information with relative time, the UTF-8 characters
in time_str will break the alignment of columns after the date field.
This is because the time_buf padding with spaces should have a
constant display width, not a fixed strlen size. So we should call
utf8_strwidth() instead of strlen() for width calibration.
Helped-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Attempts to show where a single-strand-of-pearls break in "git log"
output.
* nd/log-show-linear-break:
log: add --show-linear-break to help see non-linear history
object.h: centralize object flag allocation
While the field "flags" is mainly used by the revision walker, it is
also used in many other places. Centralize the whole flag allocation to
one place for a better overview (and easier to move flags if we have
too).
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up.
* dk/blame-janitorial:
builtin/blame.c::find_copy_in_blob: no need to scan for region end
blame.c: prepare_lines should not call xrealloc for every line
builtin/blame.c::prepare_lines: fix allocation size of sb->lineno
builtin/blame.c: eliminate same_suspect()
builtin/blame.c: struct blame_entry does not need a prev link
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.
* ep/varscope:
builtin/gc.c: reduce scope of variables
builtin/fetch.c: reduce scope of variable
builtin/commit.c: reduce scope of variables
builtin/clean.c: reduce scope of variable
builtin/blame.c: reduce scope of variables
builtin/apply.c: reduce scope of variables
bisect.c: reduce scope of variable
Making a single preparation run for counting the lines will avoid memory
fragmentation. Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *) < sizeof(int), admittedly unlikely.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If we are calling xrealloc on every single line, the least we can do
is get the right allocation size.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since the origin pointers are "interned" and reference-counted, comparing
the pointers rather than the content is enough. The only uninterned
origins are cached values kept in commit->util, but same_suspect is not
called on them.
Signed-off-by: David Kastrup <dak@gnu.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There is no reason to have a hardcoded upper limit of the number of
parents for an octopus merge, created via the graft mechanism.
* js/lift-parent-count-limit:
Remove the line length limit for graft files
Support for grafts predates Git's strbuf, and hence it is understandable
that there was a hard-coded line length limit of 1023 characters (which
was chosen a bit awkwardly, given that it is *exactly* one byte short of
aligning with the 41 bytes occupied by a commit name and the following
space or new-line character).
While regular commit histories hardly win comprehensibility in general
if they merge more than twenty-two branches in one go, it is not Git's
business to limit grafts in such a way.
In this particular developer's case, the use case that requires
substantially longer graft lines to be supported is the visualization of
the commits' order implied by their changes: commits are considered to
have an implicit relationship iff exchanging them in an interactive
rebase would result in merge conflicts.
Thusly implied branches tend to be very shallow in general, and the
resulting thicket of implied branches is usually very wide; It is
actually quite common that *most* of the commits in a topic branch have
not even one implied parent, so that a final merge commit has about as
many implied parents as there are commits in said branch.
[jc: squashed in tests by Jonathan]
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* jk/robustify-parse-commit:
checkout: do not die when leaving broken detached HEAD
use parse_commit_or_die instead of custom message
use parse_commit_or_die instead of segfaulting
assume parse_commit checks for NULL commit
assume parse_commit checks commit->object.parsed
log_tree_diff: die when we fail to parse a commit
Normally parse_pathspec() is used on command line arguments where it
can do fancy thing like parsing magic on each argument or adding magic
for all pathspecs based on --*-pathspecs options.
There's another use of parse_pathspec(), where pathspec is needed, but
the input is known to be pure paths. In this case we usually don't
want --*-pathspecs to interfere. And we definitely do not want to
parse magic in these paths, regardless of --literal-pathspecs.
Add new flag PATHSPEC_LITERAL_PATH for this purpose. When it's set,
--*-pathspecs are ignored, no magic is parsed. And if the caller
allows PATHSPEC_LITERAL (i.e. the next calls can take literal magic),
then PATHSPEC_LITERAL will be set.
This fixes cases where git chokes when GIT_*_PATHSPECS are set because
parse_pathspec() indicates it won't take any magic. But
GIT_*_PATHSPECS add them anyway. These are
export GIT_LITERAL_PATHSPECS=1
git blame -- something
git log --follow something
git log --merge
"git ls-files --with-tree=path" (aka parse_pathspec() in
overlay_tree_on_cache()) is safe because the input is empty, and
producing one pathspec due to PATHSPEC_PREFER_CWD does not take any
magic into account.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The parse_commit function will check the "parsed" flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git mv A B" when moving a submodule A does "the right thing",
inclusing relocating its working tree and adjusting the paths in
the .gitmodules file.
* jl/submodule-mv: (53 commits)
rm: delete .gitmodules entry of submodules removed from the work tree
mv: update the path entry in .gitmodules for moved submodules
submodule.c: add .gitmodules staging helper functions
mv: move submodules using a gitfile
mv: move submodules together with their work trees
rm: do not set a variable twice without intermediate reading.
t6131 - skip tests if on case-insensitive file system
parse_pathspec: accept :(icase)path syntax
pathspec: support :(glob) syntax
pathspec: make --literal-pathspecs disable pathspec magic
pathspec: support :(literal) syntax for noglob pathspec
kill limit_pathspec_to_literal() as it's only used by parse_pathspec()
parse_pathspec: preserve prefix length via PATHSPEC_PREFIX_ORIGIN
parse_pathspec: make sure the prefix part is wildcard-free
rename field "raw" to "_raw" in struct pathspec
tree-diff: remove the use of pathspec's raw[] in follow-rename codepath
remove match_pathspec() in favor of match_pathspec_depth()
remove init_pathspec() in favor of parse_pathspec()
remove diff_tree_{setup,release}_paths
convert common_prefix() to use struct pathspec
...
Teaches "git blame" to take more than one -L ranges.
* es/blame-L-twice:
line-range: reject -L line numbers less than 1
t8001/t8002: blame: add tests of -L line numbers less than 1
line-range: teach -L^:RE to search from start of file
line-range: teach -L:RE to search from end of previous -L range
line-range: teach -L^/RE/ to search from start of file
line-range-format.txt: document -L/RE/ relative search
log: teach -L/RE/ to search from end of previous -L range
blame: teach -L/RE/ to search from end of previous -L range
line-range: teach -L/RE/ to search relative to anchor point
blame: document multiple -L support
t8001/t8002: blame: add tests of multiple -L options
blame: accept multiple -L ranges
blame: inline one-line function into its lone caller
range-set: publish API for re-use by git-blame -L
line-range-format.txt: clarify -L:regex usage form
git-log.txt: place each -L option variation on its own line
Range specification -L/RE/ for blame/log unconditionally begins
searching at line one. Mailing list discussion [1] suggests that, in the
presence of multiple -L options, -L/RE/ should search relative to the
endpoint of the previous -L range, if any.
Teach the parsing machinery underlying blame's and log's -L options to
accept a start point for -L/RE/ searches. Follow-up patches will upgrade
blame and log to take advantage of this ability.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/229755/focus=229966
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git-blame accepts only a single -L option or none. Clients requiring
blame information for multiple disjoint ranges are therefore forced
either to invoke git-blame multiple times, once for each range, or only
once with no -L option to cover the entire file, both of which can be
costly. Teach git-blame to accept multiple -L ranges. Overlapping and
out-of-order ranges are accepted.
In this patch, the X in -LX,Y is absolute (for instance, /RE/ patterns
search from line 1), and Y is relative to X. Follow-up patches provide
more flexibility over how X is anchored.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
As of 25ed3412 (Refactor parse_loc; 2013-03-28),
blame.c:prepare_blame_range() became effectively a one-line function
which merely passes its arguments along to another function. This
indirection does not bring clarity to the code. Simplify by inlining
prepare_blame_range() into its lone caller.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since inception, -LX,Y has correctly reported an out-of-range error when
Y is beyond end of file, however, X was not checked, and an out-of-range
X would cause a crash. 92f9e273 (blame: prevent a segv when -L given
start > EOF; 2010-02-08) attempted to rectify this shortcoming but has
its own off-by-one error which allows X to extend one line past end of
file. For example, given a file with 5 lines:
git blame -L5 foo # OK, blames line 5
git blame -L6 foo # accepted, no error, no output, huh?
git blame -L7 foo # error "fatal: file foo has only 5 lines"
Fix this bug.
In order to avoid regressing "blame foo" when foo is an empty file, the
fix is slightly more complicated than changing '<' to '<='.
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.
This patch does not change semantics of any command intentionally.
Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
While at there, move free_pathspec() to pathspec.c
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* tr/line-log:
git-log(1): remove --full-line-diff description
line-log: fix documentation formatting
log -L: improve comments in process_all_files()
log -L: store the path instead of a diff_filespec
log -L: test merge of parallel modify/rename
t4211: pass -M to 'git log -M -L...' test
log -L: fix overlapping input ranges
log -L: check range set invariants when we look it up
Speed up log -L... -M
log -L: :pattern:file syntax to find by funcname
Implement line-history search (git log -L)
Export rewrite_parents() for 'log -L'
Refactor parse_loc
pretty-printing body of the commit that is stored in non UTF-8
encoding did not work well. The early part of this series fixes
it. And then it adds %C(auto) specifier that turns the coloring on
when we are emitting to the terminal, and adds column-aligning
format directives.
* nd/pretty-formats:
pretty: support %>> that steal trailing spaces
pretty: support truncating in %>, %< and %><
pretty: support padding placeholders, %< %> and %><
pretty: add %C(auto) for auto-coloring
pretty: split color parsing into a separate function
pretty: two phase conversion for non utf-8 commits
utf8.c: add reencode_string_len() that can handle NULs in string
utf8.c: add utf8_strnwidth() with the ability to skip ansi sequences
utf8.c: move display_mode_esc_sequence_len() for use by other functions
pretty: share code between format_decoration and show_decorations
pretty-formats.txt: wrap long lines
pretty: get the correct encoding for --pretty:format=%e
pretty: save commit encoding from logmsg_reencode if the caller needs it
The commit encoding is parsed by logmsg_reencode, there's no need for
the caller to re-parse it again. The reencoded message now has the new
encoding, not the original one. The caller would need to read commit
object again before parsing.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
split_ident_line() can leave us with the pointers date_begin, date_end,
tz_begin and tz_end all set to NULL. Check them before use and supply
the same fallback values as in the case of a negative return code from
split_ident_line().
The "(unknown)" is not actually shown in the output, though, because it
will be converted to a number (zero) eventually.
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This new syntax finds a funcname matching /pattern/, and then takes from there
up to (but not including) the next funcname. So you can say
git log -L:main:main.c
and it will dig up the main() function and show its line-log, provided
there are no other funcnames matching 'main'.
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We want to use the same style of -L n,m argument for 'git log -L' as
for git-blame. Refactor the argument parsing of the range arguments
from builtin/blame.c to the (new) file that will hold the 'git log -L'
logic.
To accommodate different data structures in blame and log -L, the file
contents are abstracted away; parse_range_arg takes a callback that it
uses to get the contents of a line of the (notional) file.
The new test is for a case that made me pause during debugging: the
'blame -L with invalid end' test was the only one that noticed an
outright failure to parse the end *at all*. So make a more explicit
test for that.
Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Usually a commit that makes it to logmsg_reencode will have
been parsed, and the commit->buffer struct member will be
valid. However, some code paths will free commit buffers
after having used them (for example, the log traversal
machinery will do so to keep memory usage down).
Most of the time this is fine; log should only show a commit
once, and then exits. However, there are some code paths
where this does not work. At least two are known:
1. A commit may be shown as part of a regular ref, and
then it may be shown again as part of a submodule diff
(e.g., if a repo contains refs to both the superproject
and subproject).
2. A notes-cache commit may be shown during "log --all",
and then later used to access a textconv cache during a
diff.
Lazily loading in logmsg_reencode does not necessarily catch
all such cases, but it should catch most of them. Users of
the commit buffer tend to be either parsing for structure
(in which they will call parse_commit, and either we will
already have parsed, or we will load commit->buffer lazily
there), or outputting (either to the user, or fetching a
part of the commit message via format_commit_message). In
the latter case, we should always be using logmsg_reencode
anyway (and typically we do so via the pretty-print
machinery).
If there are any cases that this misses, we can fix them up
to use logmsg_reencode (or handle them on a case-by-case
basis if that is inappropriate).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The logmsg_reencode function will return the reencoded
commit buffer, or NULL if reencoding failed or no reencoding
was necessary. Since every caller then ends up checking for NULL
and just using the commit's original buffer, anyway, we can
be a bit more helpful and just return that buffer when we
would have returned NULL.
Since the resulting string may or may not need to be freed,
we introduce a logmsg_free, which checks whether the buffer
came from the commit object or not (callers either
implemented the same check already, or kept two separate
pointers, one to mark the buffer to be used, and one for the
to-be-freed string).
Pushing this logic into logmsg_* simplifies the callers, and
will let future patches lazily load the commit buffer in a
single place.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Teach commands in the "log" family to optionally pay attention to
the mailmap.
* ap/log-mailmap:
log --use-mailmap: optimize for cases without --author/--committer search
log: add log.mailmap configuration option
log: grep author/committer using mailmap
test: add test for --use-mailmap option
log: add --use-mailmap option
pretty: use mailmap to display username and email
mailmap: add mailmap structure to rev_info and pp
mailmap: simplify map_user() interface
mailmap: remove email copy and length limitation
Use split_ident_line to parse author and committer
string-list: allow case-insensitive string list
Simplify map_user(), mostly to avoid copies of string buffers. It
also simplifies caller functions.
map_user() directly receive pointers and length from the commit buffer
as mail and name. If mapping of the user and mail can be done, the
pointer is updated to a new location. Lengths are also updated if
necessary.
The caller of map_user() can then copy the new email and name if
necessary.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Currently blame.c::get_acline(), pretty.c::pp_user_info() and
shortlog.c::insert_one_record() are parsing author name, email, time
and tz themselves.
Use ident.c::split_ident_line() for better code reuse.
Signed-off-by: Antoine Pelisse <apelisse@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This function has only two callsites, and is a thin wrapper whose
usefulness is dubious. When the caller needs to learn the log
output encoding, it should be able to do so by directly calling
get_log_output_encoding() and calling the underlying
logmsg_reencode() with it.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If you know your history did not have renames, or if you care only
about the history after a large rename that happened some time ago,
"git blame --no-follow $path" is a way to tell the command not to
bother about renames.
When you use -C, the lines that came from the renamed file will
still be found without the whole-file rename detection, so it is not
all that interesting either way, though.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame MAKEFILE" run in a history that has "Makefile" but not
"MAKEFILE" should say "No such file MAKEFILE in HEAD", but got
confused on a case insensitive filesystem and failed to do so.
Even during a conflicted merge, "git blame $path" always meant to
blame uncommitted changes to the "working tree" version; make it
more useful by showing cleanly merged parts as coming from the other
branch that is being merged.
* jc/maint-blame-no-such-path:
blame: allow "blame file" in the middle of a conflicted merge
blame $path: avoid getting fooled by case insensitive filesystems
"git blame file" has always meant "find the origin of each line of
the file in the history leading to HEAD, oh by the way, blame the
lines that are modified locally to the working tree".
This teaches "git blame" that during a conflicted merge, some
uncommitted changes may have come from the other history that is
being merged.
The verify_working_tree_path() function introduced in the previous
patch to notice a typo in the filename (primarily on case insensitive
filesystems) has been updated to allow a filename that does not exist
in HEAD (i.e. the tip of our history) as long as it exists one of the
commits being merged, so that a "we deleted, the other side modified"
case tracks the history of the file in the history of the other side.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame MAKEFILE" run in a history that has "Makefile" but not
MAKEFILE can get confused on a case insensitive filesystem, because
the check we run to see if there is a corresponding file in the
working tree with lstat("MAKEFILE") succeeds. In addition to that
check, we have to make sure that the given path also exists in the
commit we start digging history from (i.e. "HEAD").
Note that this reveals the breakage in a test added in cd8ae20
(git-blame shouldn't crash if run in an unmerged tree, 2007-10-18),
which expects the entire merge-in-progress path to be blamed to the
working tree when it did not exist in our tree. As it is clear in
the log message of that commit, the old breakage was that it was
causing an internal error and the fix was about avoiding it.
Just check that the command does not die an uncontrolled death. For
this particular case, the blame should fail, as the history for the
file in that contents has not been committed yet at the point in the
test.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
A lot of i18n mark-up for the help text from "git <cmd> -h".
* nd/i18n-parseopt-help: (66 commits)
Use imperative form in help usage to describe an action
Reduce translations by using same terminologies
i18n: write-tree: mark parseopt strings for translation
i18n: verify-tag: mark parseopt strings for translation
i18n: verify-pack: mark parseopt strings for translation
i18n: update-server-info: mark parseopt strings for translation
i18n: update-ref: mark parseopt strings for translation
i18n: update-index: mark parseopt strings for translation
i18n: tag: mark parseopt strings for translation
i18n: symbolic-ref: mark parseopt strings for translation
i18n: show-ref: mark parseopt strings for translation
i18n: show-branch: mark parseopt strings for translation
i18n: shortlog: mark parseopt strings for translation
i18n: rm: mark parseopt strings for translation
i18n: revert, cherry-pick: mark parseopt strings for translation
i18n: rev-parse: mark parseopt strings for translation
i18n: reset: mark parseopt strings for translation
i18n: rerere: mark parseopt strings for translation
i18n: status: mark parseopt strings for translation
i18n: replace: mark parseopt strings for translation
...
We do not want a link to 0{40} object stored anywhere in our objects.
* jk/maint-null-in-trees:
fsck: detect null sha1 in tree entries
do not write null sha1s to on-disk index
diff: do not use null sha1 as a sentinel value
diff_setup_done() has historically returned an error code, but lost
the last nonzero return in 943d5b7 (allow diff.renamelimit to be set
regardless of -M/-C, 2006-08-09). The callers were in a pretty
confused state: some actually checked for the return code, and some
did not.
Let it return void, and patch all callers to take this into account.
This conveniently also gets rid of a handful of different(!) error
messages that could never be triggered anyway.
Note that the function can still die().
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The diff code represents paths using the diff_filespec
struct. This struct has a sha1 to represent the sha1 of the
content at that path, as well as a sha1_valid member which
indicates whether its sha1 field is actually useful. If
sha1_valid is not true, then the filespec represents a
working tree file (e.g., for the no-index case, or for when
the index is not up-to-date).
The diff_filespec is only used internally, though. At the
interfaces to the diff subsystem, callers feed the sha1
directly, and we create a diff_filespec from it. It's at
that point that we look at the sha1 and decide whether it is
valid or not; callers may pass the null sha1 as a sentinel
value to indicate that it is not.
We should not typically see the null sha1 coming from any
other source (e.g., in the index itself, or from a tree).
However, a corrupt tree might have a null sha1, which would
cause "diff --patch" to accidentally diff the working tree
version of a file instead of treating it as a blob.
This patch extends the edges of the diff interface to accept
a "sha1_valid" flag whenever we accept a sha1, and to use
that flag when creating a filespec. In some cases, this
means passing the flag through several layers, making the
code change larger than would be desirable.
One alternative would be to simply die() upon seeing
corrupted trees with null sha1s. However, this fix more
directly addresses the problem (while bogus sha1s in a tree
are probably a bad thing, it is really the sentinel
confusion sending us down the wrong code path that is what
makes it devastating). And it means that git is more capable
of examining and debugging these corrupted trees. For
example, you can still "diff --raw" such a tree to find out
when the bogus entry was introduced; you just cannot do a
"--patch" diff (just as you could not with any other
corrupted tree, as we do not have any content to diff).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame" did not try to make sure that the abbreviated commit
object names in its output are unique.
* jc/maint-blame-unique-abbrev:
blame: compute abbreviation width that ensures uniqueness
Strip the name length from the ce_flags field and move it
into its own ce_namelen field in struct cache_entry. This
will both give us a tiny bit of a performance enhancement
when working with long pathnames and is a refactoring for
more readability of the code.
It enhances readability, by making it more clear what
is a flag, and where the length is stored and make it clear
which functions use stages in comparisions and which only
use the length.
It also makes CE_NAMEMASK private, so that users don't
mistakenly write the name length in the flags.
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame" did not try to make sure the abbreviated commit object
names in its output are unique.
* jc/maint-blame-unique-abbrev:
blame: compute abbreviation width that ensures uniqueness
Julia Lawall noticed that in linux-next repository the commit object
60d5c9f5 (shown with the default abbreviation width baked into "git
blame") in output from
$ git blame -L 3675,3675 60d5c9f5b -- \
drivers/staging/brcm80211/brcmfmac/wl_iw.c
is no longer unique in the repository, which results in "short SHA1
60d5c9f5 is ambiguous".
Compute the minimum abbreviation width that ensures uniqueness when
the user did not specify the --abbrev option to avoid this.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Plain gcc may not but sparse catches and complains about this sort of
stuff.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use handle_split_cb() directly as hunk_func() callback, without going
through xdi_diff_hunks().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Use blame_chunk_cb() directly as hunk_func() callback, without detour
through xdi_diff_hunks().
Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git blame" started missing quite a few changes from the origin since we
stopped using the diff minimalization by default in v1.7.2 era.
Teach "--minimal" option to "git blame" to work around this regression.
* jc/maint-blame-minimal:
blame: accept --need-minimal
"git blame" started missing quite a few changes from the origin since we
stopped using the diff minimalization by default in v1.7.2 era.
* jc/maint-blame-minimal:
blame: accept --need-minimal
Between v1.7.1 and v1.7.2, 582aa00bdf switched the default "diff"
invocation not to use XDF_NEED_MINIMAL, but this breaks "git blame"
rather badly.
Allow the command line option to ask for an extra careful matching.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
builtin/blame.c has a helper function to compute how many columns
we need to show a line-number, whose implementation is reusable as
a more generic helper function to count the number of columns
necessary to show any cardinal number.
Rename it to decimal_width(), move it to pager.c and export it for
use by future callers.
Signed-off-by: Zbigniew Jędrzejewski-Szmek <zbyszek@in.waw.pl>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the userdiff_config function was introduced in be58e70
(diff: unify external diff and funcname parsing code,
2008-10-05), it used a return value convention unlike any
other config callback. Like other callbacks, it used "-1" to
signal error. But it returned "1" to indicate that it found
something, and "0" otherwise; other callbacks simply
returned "0" to indicate that no error occurred.
This distinction was necessary at the time, because the
userdiff namespace overlapped slightly with the color
configuration namespace. So "diff.color.foo" could mean "the
'foo' slot of diff coloring" or "the 'foo' component of the
"color" userdiff driver". Because the color-parsing code
would die on an unknown color slot, we needed the userdiff
code to indicate that it had matched the variable, letting
us bypass the color-parsing code entirely.
Later, in 8b8e862 (ignore unknown color configuration,
2009-12-12), the color-parsing code learned to silently
ignore unknown slots. This means we no longer need to
protect userdiff-matched variables from reaching the
color-parsing code.
We can therefore change the userdiff_config calling
convention to a more normal one. This drops some code from
each caller, which is nice. But more importantly, it reduces
the cognitive load for readers who may wonder why
userdiff_config is unlike every other config callback.
There's no need to add a new test confirming that this
works; t4020 already contains a test that sets
diff.color.external.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
* maint-1.7.7:
Git 1.7.7.5
Git 1.7.6.5
blame: don't overflow time buffer
fetch: create status table using strbuf
checkout,merge: loosen overwriting untracked file check based on info/exclude
cast variable in call to free() in builtin/diff.c and submodule.c
apply: get rid of useless x < 0 comparison on a size_t type
Conflicts:
Documentation/git.txt
GIT-VERSION-GEN
RelNotes
builtin/fetch.c
When showing the raw timestamp, we format the numeric
seconds-since-epoch into a buffer, followed by the timezone
string. This string has come straight from the commit
object. A well-formed object should have a timezone string
of only a few bytes, but we could be operating on data
pushed by a malicious user.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>