Commit Graph

11245 Commits

Author SHA1 Message Date
Junio C Hamano
4e34e20c9f Merge branch 'dt/tree-fsck'
The codepath in "git fsck" to detect malformed tree objects has
been updated not to die but keep going after detecting them.

* dt/tree-fsck:
  fsck: handle bad trees like other errors
  tree-walk: be more specific about corrupt tree errors
2016-10-03 13:30:38 -07:00
Junio C Hamano
fe252ef81a Merge branch 'kd/mailinfo-quoted-string'
An author name, that spelled a backslash-quoted double quote in the
human readable part "My \"double quoted\" name", was not unquoted
correctly while applying a patch from a piece of e-mail.

* kd/mailinfo-quoted-string:
  mailinfo: unescape quoted-pair in header fields
  t5100-mailinfo: replace common path prefix with variable
2016-10-03 13:30:38 -07:00
Junio C Hamano
53eb85e623 Merge branch 'nd/init-core-worktree-in-multi-worktree-world'
"git init" tried to record core.worktree in the repository's
'config' file when GIT_WORK_TREE environment variable was set and
it was different from where GIT_DIR appears as ".git" at its top,
but the logic was faulty when .git is a "gitdir:" file that points
at the real place, causing trouble in working trees that are
managed by "git worktree".  This has been corrected.

* nd/init-core-worktree-in-multi-worktree-world:
  init: kill git_link variable
  init: do not set unnecessary core.worktree
  init: kill set_git_dir_init()
  init: call set_git_dir_init() from within init_db()
  init: correct re-initialization from a linked worktree
2016-10-03 13:30:35 -07:00
Junio C Hamano
347408496a Merge branch 'ik/gitweb-force-highlight'
"gitweb" can spawn "highlight" to show blob contents with
(programming) language-specific syntax highlighting, but only
when the language is known.  "highlight" can however be told
to make the guess itself by giving it "--force" option, which
has been enabled.

* ik/gitweb-force-highlight:
  gitweb: use highlight's shebang detection
  gitweb: remove unused guess_file_syntax() parameter
2016-10-03 13:30:34 -07:00
Junio C Hamano
4cff50b3fb Merge branch 'jt/mailinfo-fold-in-body-headers'
When "git format-patch --stdout" output is placed as an in-body
header and it uses the RFC2822 header folding, "git am" failed to
put the header line back into a single logical line.  The
underlying "git mailinfo" was taught to handle this properly.

* jt/mailinfo-fold-in-body-headers:
  mailinfo: handle in-body header continuations
  mailinfo: make is_scissors_line take plain char *
  mailinfo: separate in-body header processing
2016-09-29 16:57:12 -07:00
Kevin Daudt
f357e5de31 mailinfo: unescape quoted-pair in header fields
rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Remove exterior quotes and remove escape characters so that they don't
show up in the author field.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-28 13:21:18 -07:00
Kevin Daudt
ee4d679f57 t5100-mailinfo: replace common path prefix with variable
Many tests need to store data in a file, and repeat the same pattern to
refer to that path:

    "$TEST_DIRECTORY"/t5100/

Create a variable that contains this path, and use that instead.

While we're making this change, make sure the quotes are not just around
the variable, but around the entire string to not give the impression
we want shell splitting to affect the other variables.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-28 13:16:59 -07:00
David Turner
8354fa3d4c fsck: handle bad trees like other errors
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue.  Now fsck can tell the user which tree is
bad, too.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:09:10 -07:00
Jeff King
2edffef233 tree-walk: be more specific about corrupt tree errors
When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.

Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:08:30 -07:00
Junio C Hamano
104a93a329 Merge branch 'rt/rebase-i-broken-insn-advise'
When "git rebase -i" is given a broken instruction, it told the
user to fix it with "--edit-todo", but didn't say what the step
after that was (i.e. "--continue").

* rt/rebase-i-broken-insn-advise:
  rebase -i: improve advice on bad instruction lines
2016-09-26 16:09:21 -07:00
Junio C Hamano
ebc63580a1 Merge branch 'tg/add-chmod+x-fix'
"git add --chmod=+x <pathspec>" added recently only toggled the
executable bit for paths that are either new or modified. This has
been corrected to flip the executable bit for all paths that match
the given pathspec.

* tg/add-chmod+x-fix:
  t3700-add: do not check working tree file mode without POSIXPERM
  t3700-add: create subdirectory gently
  add: modify already added files when --chmod is given
  read-cache: introduce chmod_index_entry
  update-index: add test for chmod flags
2016-09-26 16:09:20 -07:00
Junio C Hamano
6a67695268 Merge branch 'js/regexec-buf'
Some codepaths in "git diff" used regexec(3) on a buffer that was
mmap(2)ed, which may not have a terminating NUL, leading to a read
beyond the end of the mapped region.  This was fixed by introducing
a regexec_buf() helper that takes a <ptr,len> pair with REG_STARTEND
extension.

* js/regexec-buf:
  regex: use regexec_buf()
  regex: add regexec_buf() that can work on a non NUL-terminated string
  regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
2016-09-26 16:09:19 -07:00
Junio C Hamano
31b83f361b Merge branch 'nd/checkout-disambiguation'
"git checkout <word>" does not follow the usual disambiguation
rules when the <word> can be both a rev and a path, to allow
checking out a branch 'foo' in a project that happens to have a
file 'foo' in the working tree without having to disambiguate.
This was poorly documented and the check was incorrect when the
command was run from a subdirectory.

* nd/checkout-disambiguation:
  checkout: fix ambiguity check in subdir
  checkout.txt: document a common case that ignores ambiguation rules
  checkout: add some spaces between code and comment
2016-09-26 16:09:18 -07:00
Junio C Hamano
8969feac7e Merge branch 'va/i18n-more'
Even more i18n.

* va/i18n-more:
  i18n: stash: mark messages for translation
  i18n: notes-merge: mark die messages for translation
  i18n: ident: mark hint for translation
  i18n: i18n: diff: mark die messages for translation
  i18n: connect: mark die messages for translation
  i18n: commit: mark message for translation
2016-09-26 16:09:18 -07:00
Junio C Hamano
e447d3182c Merge branch 'jt/format-patch-rfc'
In some projects, it is common to use "[RFC PATCH]" as the subject
prefix for a patch meant for discussion rather than application.  A
new option "--rfc" was a short-hand for "--subject-prefix=RFC PATCH"
to help the participants of such projects.

* jt/format-patch-rfc:
  format-patch: add "--rfc" for the common case of [RFC PATCH]
2016-09-26 16:09:17 -07:00
Junio C Hamano
b7af6ae5cf Merge branch 'mh/diff-indent-heuristic'
Output from "git diff" can be made easier to read by selecting
which lines are common and which lines are added/deleted
intelligently when the lines before and after the changed section
are the same.  A command line option is added to help with the
experiment to find a good heuristics.

* mh/diff-indent-heuristic:
  blame: honor the diff heuristic options and config
  parse-options: add parse_opt_unknown_cb()
  diff: improve positioning of add/delete blocks in diffs
  xdl_change_compact(): introduce the concept of a change group
  recs_match(): take two xrecord_t pointers as arguments
  is_blank_line(): take a single xrecord_t as argument
  xdl_change_compact(): only use heuristic if group can't be matched
  xdl_change_compact(): fix compaction heuristic to adjust ixo
2016-09-26 16:09:16 -07:00
Junio C Hamano
b3e588a48a Merge branch 'rs/c-auto-resets-attributes'
The pretty-format specifier "%C(auto)" used by the "log" family of
commands to enable coloring of the output is taught to also issue a
color-reset sequence to the output.

* rs/c-auto-resets-attributes:
  pretty: let %C(auto) reset all attributes
2016-09-26 16:09:15 -07:00
Ian Kelling
779a206632 gitweb: use highlight's shebang detection
The "highlight" binary can, in some cases, determine the language type
by the means of file contents, for example the shebang in the first line
for some scripting languages.  Make use of this autodetection for files
which syntax is not known by gitweb.  In that case, pass the blob
contents to "highlight --force"; the parameter is needed to make it
always generate HTML output (which includes HTML-escaping).

Although we now run highlight on files which do not end up highlighted,
performance is virtually unaffected because when we call highlight, it
is used for escaping HTML.  In the case that highlight is used, gitweb
calls sanitize() instead of esc_html(), and the latter is significantly
slower (it does more, being roughly a superset of sanitize()).  Simple
benchmark comparing performance of 'blob' view of files without syntax
highlighting in gitweb before and after this change indicates ±1%
difference in request time for all file types.  Benchmark was performed
on local instance on Debian, using Apache/2.4.23 web server and CGI.

Document the feature and improve syntax highlight documentation, add
test to ensure gitweb doesn't crash when language detection is used.

Signed-off-by: Ian Kelling <ian@iankelling.org>
Acked-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-25 16:39:11 -07:00
Nguyễn Thái Ngọc Duy
6311cfaf93 init: do not set unnecessary core.worktree
The function needs_work_tree_config() that is called from
create_default_files() is supposed to be fed the path to ".git" that
looks as if it is at the top of the working tree, and decide if that
location matches the actual worktree being used.  This comparison allows
"git init" to decide if core.worktree needs to be recorded in the
working tree.

In the current code, however, we feed the return value from
get_git_dir(), which can be totally different from what the function
expects when "gitdir" file is involved.  Instead of giving the path to
the ".git" at the top of the working tree, we end up feeding the actual
path that the file points at.

This original location of ".git" however is only known to init_db().
Make init_db() save it and have it passed to create_default_files() as a
new parameter, which passes the correct location down to
needs_work_tree_config() to fix this.

Noticed-by: Max Nordlund <max.nordlund@sqore.com>
Helped-by: Michael J Gruber <git@drmicha.warpmail.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-25 16:32:35 -07:00
Nguyễn Thái Ngọc Duy
fe9aa0b22e init: correct re-initialization from a linked worktree
When 'git init' is called from a linked worktree, we treat '.git'
dir (which is $GIT_COMMON_DIR/worktrees/something) as the main
'.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton
in there. It does not harm anything (*) but it is still wrong.

Since 'git init' calls set_git_dir() at preparation time, which
indirectly calls get_common_dir() and correctly detects multiple
worktree setup, all git_path_buf() calls in create_default_files() will
return correct paths in both single and multiple worktree setups. The
only thing left is copy_templates(), which targets $GIT_DIR, not
$GIT_COMMON_DIR.

Fix that with get_git_common_dir(). This function will return $GIT_DIR
in single-worktree setup, so we don't have to make a special case for
multiple-worktree here.

(*) It does in fact, thanks to another bug. More on that later.

Noticed-by: Max Nordlund <max.nordlund@sqore.com>
Helped-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-25 16:32:35 -07:00
Junio C Hamano
ae1ae600db Merge branch 'jk/rebase-i-drop-ident-check'
Even when "git pull --rebase=preserve" (and the underlying "git
rebase --preserve") can complete without creating any new commit
(i.e. fast-forwards), it still insisted on having a usable ident
information (read: user.email is set correctly), which was less
than nice.  As the underlying commands used inside "git rebase"
would fail with a more meaningful error message and advice text
when the bogus ident matters, this extra check was removed.

* jk/rebase-i-drop-ident-check:
  rebase-interactive: drop early check for valid ident
2016-09-21 15:15:28 -07:00
Junio C Hamano
1fe6f5fb0a Merge branch 'va/i18n'
More i18n.

* va/i18n:
  i18n: update-index: mark warnings for translation
  i18n: show-branch: mark plural strings for translation
  i18n: show-branch: mark error messages for translation
  i18n: receive-pack: mark messages for translation
  notes: spell first word of error messages in lowercase
  i18n: notes: mark error messages for translation
  i18n: merge-recursive: mark verbose message for translation
  i18n: merge-recursive: mark error messages for translation
  i18n: config: mark error message for translation
  i18n: branch: mark option description for translation
  i18n: blame: mark error messages for translation
2016-09-21 15:15:28 -07:00
Junio C Hamano
e8f871a9ce Merge branch 'jt/format-patch-base-info-above-sig'
"git format-patch --base=..." feature that was recently added
showed the base commit information after "-- " e-mail signature
line, which turned out to be inconvenient.  The base information
has been moved above the signature line.

* jt/format-patch-base-info-above-sig:
  format-patch: show base info before email signature
2016-09-21 15:15:27 -07:00
Junio C Hamano
0c5ff91639 Merge branch 'ks/perf-build-with-autoconf'
Performance tests done via "t/perf" did not use the same set of
build configuration if the user relied on autoconf generated
configuration.

* ks/perf-build-with-autoconf:
  t/perf/run: copy config.mak.autogen & friends to build area
2016-09-21 15:15:27 -07:00
Junio C Hamano
4ed38637ec Merge branch 'rs/xdiff-merge-overlapping-hunks-for-W-context'
"git diff -W" output needs to extend the context backward to
include the header line of the current function and also forward to
include the body of the entire current function up to the header
line of the next one.  This process may have to merge to adjacent
hunks, but the code forgot to do so in some cases.

* rs/xdiff-merge-overlapping-hunks-for-W-context:
  xdiff: fix merging of hunks with -W context and -u context
2016-09-21 15:15:26 -07:00
Junio C Hamano
d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano
7f109ef54e Merge branch 'ks/pack-objects-bitmap'
Some codepaths in "git pack-objects" were not ready to use an
existing pack bitmap; now they are and as the result they have
become faster.

* ks/pack-objects-bitmap:
  pack-objects: use reachability bitmap index when generating non-stdout pack
  pack-objects: respect --local/--honor-pack-keep/--incremental when bitmap is in use
2016-09-21 15:15:21 -07:00
Junio C Hamano
7889ed25ac Merge branch 'js/cat-file-filters'
Even though "git hash-objects", which is a tool to take an
on-filesystem data stream and put it into the Git object store,
allowed to perform the "outside-world-to-Git" conversions (e.g.
end-of-line conversions and application of the clean-filter), and
it had the feature on by default from very early days, its reverse
operation "git cat-file", which takes an object from the Git object
store and externalize for the consumption by the outside world,
lacked an equivalent mechanism to run the "Git-to-outside-world"
conversion.  The command learned the "--filters" option to do so.

* js/cat-file-filters:
  cat-file: support --textconv/--filters in batch mode
  cat-file --textconv/--filters: allow specifying the path separately
  cat-file: introduce the --filters option
  cat-file: fix a grammo in the man page
2016-09-21 15:15:19 -07:00
Junio C Hamano
07d872434d Merge branch 'jt/accept-capability-advertisement-when-fetching-from-void'
JGit can show a fake ref "capabilities^{}" to "git fetch" when it
does not advertise any refs, but "git fetch" was not prepared to
see such an advertisement.  When the other side disconnects without
giving any ref advertisement, we used to say "there may not be a
repository at that URL", but we may have seen other advertisement
like "shallow" and ".have" in which case we definitely know that a
repository is there.  The code to detect this case has also been
updated.

* jt/accept-capability-advertisement-when-fetching-from-void:
  connect: advertized capability is not a ref
  connect: tighten check for unexpected early hang up
  tests: move test_lazy_prereq JGIT to test-lib.sh
2016-09-21 15:15:18 -07:00
Johannes Sixt
40e0dc17ce t3700-add: do not check working tree file mode without POSIXPERM
A recently introduced test checks the result of 'git status' after
setting the executable bit on a file. This check does not yield the
expected result when the filesystem does not support the executable
bit.

What we care about is that a file added with "--chmod=+x" has
executable bit in the index and that "--chmod=+x" (or any other
options for that matter) does not muck with working tree files.
The former is tested by other existing tests, so let's check the
latter more explicitly and only under POSIXPERM prerequisite.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 14:09:54 -07:00
Johannes Schindelin
b7d36ffca0 regex: use regexec_buf()
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec(3) may have been mmap(2)ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G <regex>` would crash. This patch converts more callers,
though, some of which allocated to construct NUL-terminated strings,
or worse, modified buffers to temporarily insert NULs while calling
regexec(3).  By converting them to use regexec_buf(), the code has
become much cleaner.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 13:56:15 -07:00
Johannes Schindelin
db5dfa3314 regex: -G<pattern> feeds a non NUL-terminated string to regexec() and fails
When our pickaxe code feeds file contents to regexec(), it implicitly
assumes that the file contents are read into implicitly NUL-terminated
buffers (i.e. that we overallocate by 1, appending a single '\0').

This is not so.

In particular when the file contents are simply mmap()ed, we can be
virtually certain that the buffer is preceding uninitialized bytes, or
invalid pages.

Note that the test we add here is known to be flakey: we simply cannot
know whether the byte following the mmap()ed ones is a NUL or not.

Typically, on Linux the test passes. On Windows, it fails virtually
every time due to an access violation (that's a segmentation fault for
you Unix-y people out there). And Windows would be correct: the
regexec() call wants to operate on a regular, NUL-terminated string,
there is no NUL in the mmap()ed memory range, and it is undefined
whether the next byte is even legal to access.

When run with --valgrind it demonstrates quite clearly the breakage, of
course.

Being marked with `test_expect_failure`, this test will sometimes be
declare "TODO fixed", even if it only passes by mistake.

This test case represents a Minimal, Complete and Verifiable Example of
a breakage reported by Chris Sidi.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 13:56:15 -07:00
Johannes Sixt
b07ad46432 t3700-add: create subdirectory gently
The subdirectory 'sub' is created early in the test file. Later, a test
case removes it during its clean-up actions. However, this test case is
protected by POSIXPERM. Consequently, 'sub' remains when the POSIXPERM
prerequisite is not satisfied. Later, a recently introduced test case
creates 'sub' again. Use -p with mkdir so that it does not fail if 'sub'
already exists.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 11:05:35 -07:00
Jonathan Tan
6b4b013f18 mailinfo: handle in-body header continuations
Mailinfo currently handles multi-line headers, but it does not handle
multi-line in-body headers. Teach it to handle such headers, for
example, for this input:

  From: author <author@example.com>
  Date: Fri, 9 Jun 2006 00:44:16 -0700
  Subject: a very long
   broken line

  Subject: another very long
   broken line

interpret the in-body subject to be "another very long broken line"
instead of "another very long".

An existing test (t/t5100/msg0015) has an indented line immediately
after an in-body header - it has been modified to reflect the new
functionality.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 10:23:11 -07:00
Vasco Almeida
c041c6d06a i18n: notes-merge: mark die messages for translation
Update test to reflect changes.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 10:20:43 -07:00
Josh Triplett
68e83a5b82 format-patch: add "--rfc" for the common case of [RFC PATCH]
Add an alias for --subject-prefix='RFC PATCH', which is used
commonly in some development communities to deserve such a
short-hand.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 08:58:10 -07:00
Nguyễn Thái Ngọc Duy
b829b9439a checkout: fix ambiguity check in subdir
The two functions in parse_branchname_arg(), verify_non_filename and
check_filename, need correct prefix in order to reconstruct the paths
and check for their existence. With NULL prefix, they just check paths
at top dir instead.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-21 08:44:41 -07:00
Junio C Hamano
294573e6d7 Merge branch 'js/t9903-chaining' into maint
Test fix.

* js/t9903-chaining:
  t9903: fix broken && chain
2016-09-19 13:51:44 -07:00
Junio C Hamano
1e28677e5b Merge branch 'ep/use-git-trace-curl-in-tests' into maint
Update a few tests that used to use GIT_CURL_VERBOSE to use the
newer GIT_TRACE_CURL.

* ep/use-git-trace-curl-in-tests:
  t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
  t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
  test-lib.sh: preserve GIT_TRACE_CURL from the environment
  t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var
2016-09-19 13:51:41 -07:00
Junio C Hamano
8e26535866 Merge branch 'js/t6026-clean-up' into maint
A test spawned a short-lived background process, which sometimes
prevented the test directory from getting removed at the end of the
script on some platforms.

* js/t6026-clean-up:
  t6026-merge-attr: clean up background process at end of test case
2016-09-19 13:51:41 -07:00
Junio C Hamano
d6645312ff Merge branch 'jc/forbid-symbolic-ref-d-HEAD' into maint
"git symbolic-ref -d HEAD" happily removes the symbolic ref, but
the resulting repository becomes an invalid one.  Teach the command
to forbid removal of HEAD.

* jc/forbid-symbolic-ref-d-HEAD:
  symbolic-ref -d: do not allow removal of HEAD
2016-09-19 13:51:41 -07:00
Junio C Hamano
4c10c31137 Merge branch 'jc/submodule-anchor-git-dir' into maint
Having a submodule whose ".git" repository is somehow corrupt
caused a few commands that recurse into submodules loop forever.

* jc/submodule-anchor-git-dir:
  submodule: avoid auto-discovery in prepare_submodule_repo_env()
2016-09-19 13:51:40 -07:00
Junio C Hamano
79b51ebf6f Merge branch 'jk/test-lib-drop-pid-from-results' into maint
The test framework left the number of tests and success/failure
count in the t/test-results directory, keyed by the name of the
test script plus the process ID.  The latter however turned out not
to serve any useful purpose.  The process ID part of the filename
has been removed.

* jk/test-lib-drop-pid-from-results:
  test-lib: drop PID from test-results/*.count
2016-09-19 13:51:39 -07:00
Junio C Hamano
4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Junio C Hamano
81358dc238 Merge branch 'cc/apply-am'
"git am" has been taught to make an internal call to "git apply"'s
innards without spawning the latter as a separate process.

* cc/apply-am: (41 commits)
  builtin/am: use apply API in run_apply()
  apply: learn to use a different index file
  apply: pass apply state to build_fake_ancestor()
  apply: refactor `git apply` option parsing
  apply: change error_routine when silent
  usage: add get_error_routine() and get_warn_routine()
  usage: add set_warn_routine()
  apply: don't print on stdout in verbosity_silent mode
  apply: make it possible to silently apply
  apply: use error_errno() where possible
  apply: make some parsing functions static again
  apply: move libified code from builtin/apply.c to apply.{c,h}
  apply: rename and move opt constants to apply.h
  builtin/apply: rename option parsing functions
  builtin/apply: make create_one_file() return -1 on error
  builtin/apply: make try_create_file() return -1 on error
  builtin/apply: make write_out_results() return -1 on error
  builtin/apply: make write_out_one_result() return -1 on error
  builtin/apply: make create_file() return -1 on error
  builtin/apply: make add_index_file() return -1 on error
  ...
2016-09-19 13:47:18 -07:00
Vasco Almeida
f2b93b388c i18n: connect: mark die messages for translation
Mark messages passed to die() in die_initial_contact().

Update test to reflect changes.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:55:36 -07:00
Vasco Almeida
4fa4b31507 i18n: commit: mark message for translation
Mark message commit_utf8_warn for translation.

Update tests to reflect changes.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:55:36 -07:00
René Scharfe
c99ad274b1 pretty: let %C(auto) reset all attributes
Reset colors and attributes upon %C(auto) to enable full automatic
control over them; otherwise attributes like bold or reverse could
still be in effect from previous %C placeholders.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:50:32 -07:00
Michael Haggerty
5b162879e9 blame: honor the diff heuristic options and config
Teach "git blame" and "git annotate" the --compaction-heuristic and
--indent-heuristic options that are now supported by "git diff".

Also teach them to honor the `diff.compactionHeuristic` and
`diff.indentHeuristic` configuration options.

It would be conceivable to introduce separate configuration options for
"blame" and "annotate"; for example `blame.compactionHeuristic` and
`blame.indentHeuristic`. But it would be confusing to users if blame
output is inconsistent with diff output, so it makes more sense for them
to respect the same configuration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00
Michael Haggerty
433860f3d0 diff: improve positioning of add/delete blocks in diffs
Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
     }

     if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
    +if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {
                            $smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

    --- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
    +++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
    @@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
            $initial_reply_to =~ s/(^\s+|\s+$)//g;
     }

    +if (!$smtp_server) {
    +       $smtp_server = $repo->config('sendemail.smtpserver');
    +}
     if (!$smtp_server) {
            foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
                    if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository            | count |      Git 2.9.0 |     compaction | compaction-fixed |       indent-1 |       indent-2 |
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| afnetworking          |   109 |    89  (81.7%) |    37  (33.9%) |      37  (33.9%) |     2   (1.8%) |     2   (1.8%) |
| alamofire             |    30 |    18  (60.0%) |    14  (46.7%) |      15  (50.0%) |     0   (0.0%) |     0   (0.0%) |
| angular               |   184 |   127  (69.0%) |    39  (21.2%) |      23  (12.5%) |     5   (2.7%) |     5   (2.7%) |
| animate               |   313 |     2   (0.6%) |     2   (0.6%) |       2   (0.6%) |     2   (0.6%) |     2   (0.6%) |
| ant                   |   380 |   356  (93.7%) |   152  (40.0%) |     148  (38.9%) |    15   (3.9%) |    15   (3.9%) | *
| bugzilla              |   306 |   263  (85.9%) |   109  (35.6%) |      99  (32.4%) |    14   (4.6%) |    15   (4.9%) | *
| corefx                |   126 |    91  (72.2%) |    22  (17.5%) |      21  (16.7%) |     6   (4.8%) |     6   (4.8%) |
| couchdb               |    78 |    44  (56.4%) |    26  (33.3%) |      28  (35.9%) |     6   (7.7%) |     6   (7.7%) | *
| cpython               |   937 |   158  (16.9%) |    50   (5.3%) |      49   (5.2%) |     5   (0.5%) |     5   (0.5%) | *
| discourse             |   160 |    95  (59.4%) |    42  (26.2%) |      36  (22.5%) |    18  (11.2%) |    13   (8.1%) |
| docker                |   307 |   194  (63.2%) |   198  (64.5%) |     253  (82.4%) |     8   (2.6%) |     8   (2.6%) | *
| electron              |   163 |   132  (81.0%) |    38  (23.3%) |      39  (23.9%) |     6   (3.7%) |     6   (3.7%) |
| git                   |   536 |   470  (87.7%) |    73  (13.6%) |      78  (14.6%) |    16   (3.0%) |    16   (3.0%) | *
| gitflow               |   127 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| ionic                 |   133 |    89  (66.9%) |    29  (21.8%) |      38  (28.6%) |     1   (0.8%) |     1   (0.8%) |
| ipython               |   482 |   362  (75.1%) |   167  (34.6%) |     169  (35.1%) |    11   (2.3%) |    11   (2.3%) | *
| junit                 |   161 |   147  (91.3%) |    67  (41.6%) |      66  (41.0%) |     1   (0.6%) |     1   (0.6%) | *
| lighttable            |    15 |     5  (33.3%) |     0   (0.0%) |       2  (13.3%) |     0   (0.0%) |     0   (0.0%) |
| magit                 |    88 |    75  (85.2%) |    11  (12.5%) |       9  (10.2%) |     1   (1.1%) |     0   (0.0%) |
| neural-style          |    28 |     0   (0.0%) |     0   (0.0%) |       0   (0.0%) |     0   (0.0%) |     0   (0.0%) |
| nodejs                |   781 |   649  (83.1%) |   118  (15.1%) |     111  (14.2%) |     4   (0.5%) |     5   (0.6%) | *
| phpmyadmin            |   491 |   481  (98.0%) |    75  (15.3%) |      48   (9.8%) |     2   (0.4%) |     2   (0.4%) | *
| react-native          |   168 |   130  (77.4%) |    79  (47.0%) |      81  (48.2%) |     0   (0.0%) |     0   (0.0%) |
| rust                  |   171 |   128  (74.9%) |    30  (17.5%) |      27  (15.8%) |    16   (9.4%) |    14   (8.2%) |
| spark                 |   186 |   149  (80.1%) |    52  (28.0%) |      52  (28.0%) |     2   (1.1%) |     2   (1.1%) |
| tensorflow            |   115 |    66  (57.4%) |    48  (41.7%) |      48  (41.7%) |     5   (4.3%) |     5   (4.3%) |
| test-more             |    19 |    15  (78.9%) |     2  (10.5%) |       2  (10.5%) |     1   (5.3%) |     1   (5.3%) | *
| test-unit             |    51 |    34  (66.7%) |    14  (27.5%) |       8  (15.7%) |     2   (3.9%) |     2   (3.9%) | *
| xmonad                |    23 |    22  (95.7%) |     2   (8.7%) |       2   (8.7%) |     1   (4.3%) |     1   (4.3%) | *
| --------------------- | ----- | -------------- | -------------- | ---------------- | -------------- | -------------- |
| totals                |  6668 |  4391  (65.9%) |  1496  (22.4%) |    1491  (22.4%) |   150   (2.2%) |   144   (2.2%) |
| totals (training set) |  4552 |  3195  (70.2%) |  1053  (23.1%) |    1061  (23.3%) |    86   (1.9%) |    88   (1.9%) |
| totals (test set)     |  2116 |  1196  (56.5%) |   443  (20.9%) |     430  (20.3%) |    64   (3.0%) |    56   (2.6%) |

In this table, the numbers are the count and percentage of human-rated
sliders that the corresponding algorithm got *wrong*. The columns are

* "repository" - the name of the repository used. I used the diffs
  between successive non-merge commits on the HEAD branch of the
  corresponding repository.

* "count" - the number of sliders that were human-rated. I chose most,
  but not all, sliders to rate from those among which the various
  algorithms gave different answers.

* "Git 2.9.0" - the default algorithm used by `git diff` in Git 2.9.0.

* "compaction" - the heuristic used by `git diff --compaction-heuristic`
  in Git 2.9.0.

* "compaction-fixed" - the heuristic used by `git diff
  --compaction-heuristic` after the fixes from earlier in this patch
  series. Note that the results are not dramatically different than
  those for "compaction". Both produce non-ideal diffs only about 1/3 as
  often as the default `git diff`.

* "indent-1" - the new `--indent-heuristic` algorithm, using the first
  set of weighting factors, determined as described above.

* "indent-2" - the new `--indent-heuristic` algorithm, using the final
  set of weighting factors, determined as described below.

* `*` - indicates that repo was part of training set used to determine
  the first set of weighting factors.

The fact that the heuristic performed nearly as well on the test set as
on the training set in column "indent-1" is a good indication that the
heuristic was not over-trained. Given that fact, I ran a second round of
optimization, using the entire corpus as the training set. The resulting
set of weights gave the results in column "indent-2". These are the
weights included in this patch.

The final result gives consistently and significantly better results
across the whole corpus than either `git diff` or `git diff
--compaction-heuristic`. It makes only about 1/30 as many errors as the
former and about 1/10 as many errors as the latter. (And a good fraction
of the remaining errors are for diffs that involve weirdly-formatted
code, sometimes apparently machine-generated.)

The tools that were used to do this optimization and analysis, along
with the human-generated data values, are recorded in a separate project
[1].

This patch adds a new command-line option `--indent-heuristic`, and a
new configuration setting `diff.indentHeuristic`, that activate this
heuristic. This interface is only meant for testing purposes, and should
be finalized before including this change in any release.

[1] https://github.com/mhagger/diff-slider-tools

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-19 10:25:11 -07:00