Commit Graph

1456 Commits

Author SHA1 Message Date
Junio C Hamano
d845d727cb Merge branch 'jk/setup-sequence-update'
There were numerous corner cases in which the configuration files
are read and used or not read at all depending on the directory a
Git command was run, leading to inconsistent behaviour.  The code
to set-up repository access at the beginning of a Git process has
been updated to fix them.

* jk/setup-sequence-update:
  t1007: factor out repeated setup
  init: reset cached config when entering new repo
  init: expand comments explaining config trickery
  config: only read .git/config from configured repos
  test-config: setup git directory
  t1302: use "git -C"
  pager: handle early config
  pager: use callbacks instead of configset
  pager: make pager_program a file-local static
  pager: stop loading git_default_config()
  pager: remove obsolete comment
  diff: always try to set up the repository
  diff: handle --no-index prefixes consistently
  diff: skip implicit no-index check when given --no-index
  patch-id: use RUN_SETUP_GENTLY
  hash-object: always try to set up the git repository
2016-09-21 15:15:24 -07:00
Junio C Hamano
4af9a7d344 Merge branch 'bc/object-id'
The "unsigned char sha1[20]" to "struct object_id" conversion
continues.  Notable changes in this round includes that ce->sha1,
i.e. the object name recorded in the cache_entry, turns into an
object_id.

It had merge conflicts with a few topics in flight (Christian's
"apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
"status --porcelain-v2").  Extra sets of eyes double-checking for
mismerges are highly appreciated.

* bc/object-id:
  builtin/reset: convert to use struct object_id
  builtin/commit-tree: convert to struct object_id
  builtin/am: convert to struct object_id
  refs: add an update_ref_oid function.
  sha1_name: convert get_sha1_mb to struct object_id
  builtin/update-index: convert file to struct object_id
  notes: convert init_notes to use struct object_id
  builtin/rm: convert to use struct object_id
  builtin/blame: convert file to use struct object_id
  Convert read_mmblob to take struct object_id.
  notes-merge: convert struct notes_merge_pair to struct object_id
  builtin/checkout: convert some static functions to struct object_id
  streaming: make stream_blob_to_fd take struct object_id
  builtin: convert textconv_object to use struct object_id
  builtin/cat-file: convert some static functions to struct object_id
  builtin/cat-file: convert struct expand_data to use struct object_id
  builtin/log: convert some static functions to use struct object_id
  builtin/blame: convert struct origin to use struct object_id
  builtin/apply: convert static functions to struct object_id
  cache: convert struct cache_entry to use struct object_id
2016-09-19 13:47:19 -07:00
Jeff King
4543926ba8 init: reset cached config when entering new repo
After we copy the templates into place, we re-read the
config in case we copied in a default config file. But since
git_config() is backed by a cache these days, it's possible
that the call will not actually touch the filesystem at all;
we need to tell it that something has changed behind the
scenes.

Note that we also need to reset the shared_repository
config. At first glance, it seems like this should probably
just be folded into git_config_clear(). But unfortunately
that is not quite right. The shared repository value may
come from config, _or_ it may have been set manually. So
only the caller who knows whether or not they set it is the
one who can clear it (and indeed, if you _do_ put it into
git_config_clear(), then many tests fail, as we have to
clear the config cache any time we set a new config
variable).

There are three tests here. The first two actually pass
already, though it's largely luck: they just don't happen to
actually read any config before we enter the new repo.

But the third one does fail without this patch; we look at
core.sharedrepository while creating the directory, but need
to make sure the value from the template config overrides
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King
b9605bc4f2 config: only read .git/config from configured repos
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config".  This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:

  - it's not sufficient, and therefore hides bugs in a
    confusing way. Config will be respected if commands are
    run from the top-level of the working tree, but not from
    a subdirectory.

  - it's not always true that we haven't set up the
    repository _yet_; we may not want to do it at all. For
    instance, if you run "git init /some/path" from inside
    another repository, it should not load config from the
    existing repository.

  - there might be a path ".git/config", but it is not the
    actual repository we would find via setup_git_directory().
    This may happen, e.g., if you are storing a git
    repository inside another git repository, but have
    munged one of the files in such a way that the
    inner repository is not valid (e.g., by removing HEAD).

We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any).  This shows up in two ways:

  1. If we happen to be in an existing repository directory,
     we'll read and respect core.sharedrepository from it,
     even though it should have no bearing on the new
     repository. A new test in t1301 covers this.

  2. Similarly, if we're in an existing repo that sets
     core.logallrefupdates, that will cause init to fail to
     set it in a newly created repository (because it thinks
     that the user's templates already did so). A new test
     in t0001 covers this.

We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.

That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.

After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice.  But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Jeff King
c0c08897c4 pager: make pager_program a file-local static
This variable is only ever used by the routines in pager.c,
and other parts of the code should always use those routines
(like git_pager()) to make decisions about which pager to
use. Let's reduce its scope to prevent accidents.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 15:45:45 -07:00
Junio C Hamano
e4ec05ed93 Merge branch 'rs/hex2chr'
* rs/hex2chr:
  introduce hex2chr() for converting two hexadecimal digits to a character
2016-09-12 15:34:36 -07:00
Junio C Hamano
305d7f1339 Merge branch 'jk/diff-submodule-diff-inline'
The "git diff --submodule={short,log}" mechanism has been enhanced
to allow "--submodule=diff" to show the patch between the submodule
commits bound to the superproject.

* jk/diff-submodule-diff-inline:
  diff: teach diff to display submodule difference with an inline diff
  submodule: refactor show_submodule_summary with helper function
  submodule: convert show_submodule_summary to use struct object_id *
  allow do_submodule_path to work even if submodule isn't checked out
  diff: prepare for additional submodule formats
  graph: add support for --line-prefix on all graph-aware output
  diff.c: remove output_prefix_length field
  cache: add empty_tree_oid object and helper function
2016-09-12 15:34:31 -07:00
Junio C Hamano
02c6c14d6c Merge branch 'sb/submodule-clone-rr'
"git clone --resurse-submodules --reference $path $URL" is a way to
reduce network transfer cost by borrowing objects in an existing
$path repository when cloning the superproject from $URL; it
learned to also peek into $path for presense of corresponding
repositories of submodules and borrow objects from there when able.

* sb/submodule-clone-rr:
  clone: recursive and reference option triggers submodule alternates
  clone: implement optional references
  clone: clarify option_reference as required
  clone: factor out checking for an alternate path
  submodule--helper update-clone: allow multiple references
  submodule--helper module-clone: allow multiple references
  t7408: merge short tests, factor out testing method
  t7408: modernize style
2016-09-08 21:49:50 -07:00
brian m. carlson
151b2911c1 sha1_name: convert get_sha1_mb to struct object_id
All of the callers of this function use struct object_id, so rename it
to get_oid_mb and make it take struct object_id instead of
unsigned char *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:43 -07:00
brian m. carlson
99d1a9861a cache: convert struct cache_entry to use struct object_id
Convert struct cache_entry to use struct object_id by applying the
following semantic patch and the object_id transforms from contrib, plus
the actual change to the struct:

@@
struct cache_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct cache_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 12:59:42 -07:00
René Scharfe
d23309733a introduce hex2chr() for converting two hexadecimal digits to a character
Add and use a helper function that decodes the char value of two
hexadecimal digits.  It returns a negative number on error, avoids
running over the end of the given string and doesn't shift negative
values.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-07 10:42:46 -07:00
Jacob Keller
99b43a61f2 allow do_submodule_path to work even if submodule isn't checked out
Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on <path>/.git. If this fails it just assumes the
<path>/.git is actually a git directory.

This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the superproject.

Unfortunately this fails if the <path> is not actually checked out any
longer, such as by removing the directory.

Fix this by checking if the directory we found is actually a gitdir. In
the case it is not, attempt to lookup the submodule configuration and
find the name of where it is stored in the .git/modules/ directory of
the superproject.

If we can't locate the submodule configuration, this might occur because
for example a submodule gitlink was added but the corresponding
.gitmodules file was not properly updated.  A die() here would not be
pleasant to the users of submodule diff formats, so instead, modify
do_submodule_path() to return an error code:

 - git_pathdup_submodule() returns NULL when we fail to find a path.
 - strbuf_git_path_submodule() propagates the error code to the caller.

Modify the callers of these functions to check the error code and fail
properly. This ensures we don't attempt to use a bad path that doesn't
match the corresponding submodule.

Because this change fixes add_submodule_odb() to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out"

Add tests to ensure this change works as expected.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:10 -07:00
Jacob Keller
8576fde6cb cache: add empty_tree_oid object and helper function
Similar to is_null_oid(), and is_empty_blob_sha1() add an
empty_tree_oid along with helper function is_empty_tree_oid(). For
completeness, also add an "is_empty_tree_sha1()",
"is_empty_blob_sha1()", "is_empty_tree_oid()" and "is_empty_blob_oid()"
helpers.

To ensure we only get one singleton, implement EMPTY_BLOB_SHA1_BIN as
simply getting the hash of empty_blob_oid structure.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-31 18:07:08 -07:00
Stefan Beller
9eeea7d2bc clone: factor out checking for an alternate path
In a later patch we want to determine if a path is suitable as an
alternate from other commands than builtin/clone. Move the checking
functionality of `add_one_reference` to `compute_alternate_path` that is
defined in cache.h.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-15 15:28:01 -07:00
Junio C Hamano
6d4960ac7d Merge branch 'jk/trace-fixup'
Various small fixups to the "GIT_TRACE" facility.

* jk/trace-fixup:
  trace: do not fall back to stderr
  write_or_die: drop write_or_whine_pipe()
  trace: disable key after write error
  trace: correct variable name in write() error message
  trace: cosmetic fixes for error messages
  trace: use warning() for printing trace errors
  trace: stop using write_or_whine_pipe()
  trace: handle NULL argument in trace_disable()
2016-08-12 09:47:36 -07:00
Junio C Hamano
f4fd627661 Merge branch 'jk/reset-ident-time-per-commit' into maint
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-12 09:16:56 -07:00
Junio C Hamano
24fbe00490 Merge branch 'jk/reset-ident-time-per-commit'
Not-so-recent rewrite of "git am" that started making internal
calls into the commit machinery had an unintended regression, in
that no matter how many seconds it took to apply many patches, the
resulting committer timestamp for the resulting commits were all
the same.

* jk/reset-ident-time-per-commit:
  am: reset cached ident date for each patch
2016-08-10 12:33:17 -07:00
Junio C Hamano
78849622ec Merge branch 'jk/pack-objects-optim'
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs.  The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.

* jk/pack-objects-optim:
  pack-objects: compute local/ignore_pack_keep early
  pack-objects: break out of want_object loop early
  find_pack_entry: replace last_found_pack with MRU cache
  add generic most-recently-used list
  sha1_file: drop free_pack_by_name
  t/perf: add tests for many-pack scenarios
2016-08-08 14:48:39 -07:00
Junio C Hamano
768ededa9c Merge branch 'va/i18n'
More i18n marking.

* va/i18n:
  i18n: config: unfold error messages marked for translation
  i18n: notes: mark comment for translation
2016-08-08 14:48:38 -07:00
Junio C Hamano
0d3279962a Merge branch 'jk/reflog-date'
The reflog output format is documented better, and a new format
--date=unix to report the seconds-since-epoch (without timezone)
has been added.

* jk/reflog-date:
  date: clarify --date=raw description
  date: add "unix" format
  date: document and test "raw-local" mode
  doc/pretty-formats: explain shortening of %gd
  doc/pretty-formats: describe index/time formats for %gd
  doc/rev-list-options: explain "-g" output formats
  doc/rev-list-options: clarify "commit@{Nth}" for "-g" option
2016-08-08 14:48:37 -07:00
Junio C Hamano
1e274ef2ba Merge branch 'jk/send-pack-stdio' into maint
Code clean-up.

* jk/send-pack-stdio:
  write_or_die: remove the unused write_or_whine() function
  send-pack: use buffered I/O to talk to pack-objects
2016-08-08 14:21:39 -07:00
Junio C Hamano
aa9136a87e Merge branch 'nd/pack-ofs-4gb-limit' into maint
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-08-08 14:21:36 -07:00
Jeff King
ca5c701ca5 write_or_die: drop write_or_whine_pipe()
This function has no callers, and is not likely to gain any
because it's confusing to use.

It unconditionally complains to stderr, but _doesn't_ die.
Yet any caller which wants a "gentle" write would generally
want to suppress the error message, because presumably
they're going to write a better one, and/or try the
operation again.

And the check_pipe() call leads to confusing behaviors. It
means we die for EPIPE, but not for other errors, which is
confusing and pointless.

On top of all that, it has unusual error return semantics,
which makes it easy for callers to get it wrong.

Let's drop the function, and if somebody ever needs to
resurrect something like it, they can fix these warts.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-05 09:28:17 -07:00
Jeff King
4d9c7e6f45 am: reset cached ident date for each patch
When we compute the date to go in author/committer lines of
commits, or tagger lines of tags, we get the current date
once and then cache it for the rest of the program.  This is
a good thing in some cases, like "git commit", because it
means we do not racily assign different times to the
author/committer fields of a single commit object.

But as more programs start to make many commits in a single
process (e.g., the recently builtin "git am"), it means that
you'll get long strings of commits with identical committer
timestamps (whereas before, we invoked "git commit" many
times and got true timestamps).

This patch addresses it by letting callers reset the cached
time, which means they'll get a fresh time on their next
call to git_committer_info() or git_author_info(). The first
caller to do so is "git am", which resets the time for each
patch it applies.

It would be nice if we could just do this automatically
before filling in the ident fields of commit and tag
objects. Unfortunately, it's hard to know where a particular
logical operation begins and ends.

For instance, if commit_tree_extended() were to call
reset_ident_date() before getting the committer/author
ident, that doesn't quite work; sometimes the author info is
passed in to us as a parameter, and it may or may not have
come from a previous call to ident_default_date(). So in
those cases, we lose the property that the committer and the
author timestamp always match.

You could similarly put a date-reset at the end of
commit_tree_extended(). That actually works in the current
code base, but it's fragile. It makes the assumption that
after commit_tree_extended() finishes, the caller has no
other operations that would logically want to fall into the
same timestamp.

So instead we provide the tool to easily do the reset, and
let the high-level callers use it to annotate their own
logical operations.

There's no automated test, because it would be inherently
racy (it depends on whether the program takes multiple
seconds to run). But you can see the effect with something
like:

  # make a fake 100-patch series
  top=$(git rev-parse HEAD)
  bottom=$(git rev-list --first-parent -100 HEAD | tail -n 1)
  git log --format=email --reverse --first-parent \
          --binary -m -p $bottom..$top >patch

  # now apply it; this presumably takes multiple seconds
  git checkout --detach $bottom
  git am <patch

  # now count the number of distinct committer times;
  # prior to this patch, there would only be one, but
  # now we'd typically see several.
  git log --format=%ct $bottom.. | sort -u

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Helped-by: Paul Tan <pyokagan@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-01 14:49:41 -07:00
Jeff King
a73cdd21c4 find_pack_entry: replace last_found_pack with MRU cache
Each pack has an index for looking up entries in O(log n)
time, but if we have multiple packs, we have to scan through
them linearly. This can produce a measurable overhead for
some operations.

We dealt with this long ago in f7c22cc (always start looking
up objects in the last used pack first, 2007-05-30), which
keeps what is essentially a 1-element most-recently-used
cache. In theory, we should be able to do better by keeping
a similar but longer cache, that is the same length as the
pack-list itself.

Since we now have a convenient generic MRU structure, we can
plug it in and measure. Here are the numbers for running
p5303 against linux.git:

Test                      HEAD^                HEAD
------------------------------------------------------------------------
5303.3: rev-list (1)      31.56(31.28+0.27)    31.30(31.08+0.20) -0.8%
5303.4: repack (1)        40.62(39.35+2.36)    40.60(39.27+2.44) -0.0%
5303.6: rev-list (50)     31.31(31.06+0.23)    31.23(31.00+0.22) -0.3%
5303.7: repack (50)       58.65(69.12+1.94)    58.27(68.64+2.05) -0.6%
5303.9: rev-list (1000)   38.74(38.40+0.33)    31.87(31.62+0.24) -17.7%
5303.10: repack (1000)    367.20(441.80+4.62)  342.00(414.04+3.72) -6.9%

The main numbers of interest here are the rev-list ones
(since that is exercising the normal object lookup code
path).  The single-pack case shouldn't improve at all; the
260ms speedup there is just part of the run-to-run noise
(but it's important to note that we didn't make anything
worse with the overhead of maintaining our cache). In the
50-pack case, we see similar results. There may be a slight
improvement, but it's mostly within the noise.

The 1000-pack case does show a big improvement, though. That
carries over to the repack case, as well. Even though we
haven't touched its pack-search loop yet, it does still do a
lot of normal object lookups (e.g., for the internal
revision walk), and so improves.

As a point of reference, I also ran the 1000-pack test
against a version of HEAD^ with the last_found_pack
optimization disabled. It takes ~60s, so that gives an
indication of how much even the single-element cache is
helping.

For comparison, here's a smaller repository, git.git:

Test                      HEAD^               HEAD
---------------------------------------------------------------------
5303.3: rev-list (1)      1.56(1.54+0.01)    1.54(1.51+0.02) -1.3%
5303.4: repack (1)        1.84(1.80+0.10)    1.82(1.80+0.09) -1.1%
5303.6: rev-list (50)     1.58(1.55+0.02)    1.59(1.57+0.01) +0.6%
5303.7: repack (50)       2.50(3.18+0.04)    2.50(3.14+0.04) +0.0%
5303.9: rev-list (1000)   2.76(2.71+0.04)    2.24(2.21+0.02) -18.8%
5303.10: repack (1000)    13.21(19.56+0.25)  11.66(18.01+0.21) -11.7%

You can see that the percentage improvement is similar.
That's because the lookup we are optimizing is roughly
O(nr_objects * nr_packs). Since the number of packs is
constant in both tests, we'd expect the improvement to be
linear in the number of objects. But the whole process is
also linear in the number of objects, so the improvement
is a constant factor.

The exact improvement does also depend on the contents of
the packs. In p5303, the extra packs all have 5 first-parent
commits in them, which is a reasonable simulation of a
pushed-to repository. But it also means that only 250
first-parent commits are in those packs (compared to almost
50,000 total in linux.git), and the rest are in the huge
"base" pack. So once we start looking at history in taht big
pack, that's where we'll find most everything, and even the
1-element cache gets close to 100% cache hits.  You could
almost certainly show better numbers with a more
pathological case (e.g., distributing the objects more
evenly across the packs). But that's simply not that
realistic a scenario, so it makes more sense to focus on
these numbers.

The implementation itself is a straightforward application
of the MRU code. We provide an MRU-ordered list of packs
that shadows the packed_git list. This is easy to do because
we only create and revise the pack list in one place. The
"reprepare" code path actually drops the whole MRU and
replaces it for simplicity. It would be more efficient to
just add new entries, but there's not much point in
optimizing here; repreparing happens rarely, and only after
doing a lot of other expensive work.  The key things to keep
optimized are traversal (which is just a normal linked list,
albeit with one extra level of indirection over the regular
packed_git list), and marking (which is a constant number of
pointer assignments, though slightly more than the old
last_found_pack was; it doesn't seem to create a measurable
slowdown, though).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:07 -07:00
Jeff King
3157c880f6 sha1_file: drop free_pack_by_name
The point of this function is to drop an entry from the
"packed_git" cache that points to a file we might be
overwriting, because our contents may not be the same (and
hence the only caller was pack-objects as it moved a
temporary packfile into place).

In older versions of git, this could happen because the
names of packfiles were derived from the set of objects they
contained, not the actual bits on disk. But since 1190a1a
(pack-objects: name pack files after trailer hash,
2013-12-05), the name reflects the actual bits on disk, and
any two packfiles with the same name can be used
interchangeably.

Dropping this function not only saves a few lines of code,
it makes the lifetime of "struct packed_git" much easier to
reason about: namely, we now do not ever free these structs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:06 -07:00
Junio C Hamano
ad2d777604 Merge branch 'nd/pack-ofs-4gb-limit'
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-07-28 10:34:42 -07:00
Vasco Almeida
1b8132d99d i18n: config: unfold error messages marked for translation
Introduced in 473166b ("config: add 'origin_type' to config_source
struct", 2016-02-19), Git can inform the user about the origin of a
config error, but the implementation does not allow translators to
translate the keywords 'file', 'blob, 'standard input', and
'submodule-blob'. Moreover, for the second message, a reason for the
error is appended to the message, not allowing translators to translate
that reason either.

Unfold the message into several templates for each known origin_type.
That would result in better translation at the expense of code
verbosity.

Add enum config_oringin_type to ease management of the various
configuration origin types (blob, file, etc).  Previously origin type
was considered from command line if cf->origin_type == NULL, i.e.,
uninitialized. Now we set origin_type to CONFIG_ORIGIN_CMDLINE in
git_config_from_parameters() and configset_add_value().

For error message in git_parse_source(), use xstrfmt() function to
prepare the message string, instead of doing something like it's done
for die_bad_number(), because intelligibility and code conciseness are
improved for that instance.

Signed-off-by: Vasco Almeida <vascomalmeida@sapo.pt>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-28 09:11:09 -07:00
Jeff King
642833db78 date: add "unix" format
We already have "--date=raw", which is a Unix epoch
timestamp plus a contextual timezone (either the author's or
the local). But one may not care about the timezone and just
want the epoch timestamp by itself. It's not hard to parse
the two apart, but if you are using a pretty-print format,
you may want git to show the "finished" form that the user
will see.

We can accomodate this by adding a new date format, "unix",
which is basically "raw" without the timezone.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-27 14:15:51 -07:00
Junio C Hamano
21bed620cd Merge branch 'jc/renormalize-merge-kill-safer-crlf'
"git merge" with renormalization did not work well with
merge-recursive, due to "safer crlf" conversion kicking in when it
shouldn't.

* jc/renormalize-merge-kill-safer-crlf:
  merge: avoid "safer crlf" during recording of merge results
  convert: unify the "auto" handling of CRLF
2016-07-25 14:13:39 -07:00
Junio C Hamano
6b34ce90a7 Merge branch 'mh/split-under-lock'
Further preparatory work on the refs API before the pluggable
backend series can land.

* mh/split-under-lock: (33 commits)
  lock_ref_sha1_basic(): only handle REF_NODEREF mode
  commit_ref_update(): remove the flags parameter
  lock_ref_for_update(): don't resolve symrefs
  lock_ref_for_update(): don't re-read non-symbolic references
  refs: resolve symbolic refs first
  ref_transaction_update(): check refname_is_safe() at a minimum
  unlock_ref(): move definition higher in the file
  lock_ref_for_update(): new function
  add_update(): initialize the whole ref_update
  verify_refname_available(): adjust constness in declaration
  refs: don't dereference on rename
  refs: allow log-only updates
  delete_branches(): use resolve_refdup()
  ref_transaction_commit(): correctly report close_ref() failure
  ref_transaction_create(): disallow recursive pruning
  refs: make error messages more consistent
  lock_ref_sha1_basic(): remove unneeded local variable
  read_raw_ref(): move docstring to header file
  read_raw_ref(): improve docstring
  read_raw_ref(): rename symref argument to referent
  ...
2016-07-25 14:13:32 -07:00
Junio C Hamano
2b6456b808 Merge branch 'jk/write-file'
General code clean-up around a helper function to write a
single-liner to a file.

* jk/write-file:
  branch: use write_file_buf instead of write_file
  use write_file_buf where applicable
  write_file: add format attribute
  write_file: add pointer+len variant
  write_file: use xopen
  write_file: drop "gently" form
  branch: use non-gentle write_file for branch description
  am: ignore return value of write_file()
  config: fix bogus fd check when setting up default config
2016-07-19 13:22:23 -07:00
Junio C Hamano
a63d31b4d3 Merge branch 'bc/cocci'
Conversion from unsigned char sha1[20] to struct object_id
continues.

* bc/cocci:
  diff: convert prep_temp_blob() to struct object_id
  merge-recursive: convert merge_recursive_generic() to object_id
  merge-recursive: convert leaf functions to use struct object_id
  merge-recursive: convert struct merge_file_info to object_id
  merge-recursive: convert struct stage_data to use object_id
  diff: rename struct diff_filespec's sha1_valid member
  diff: convert struct diff_filespec to struct object_id
  coccinelle: apply object_id Coccinelle transformations
  coccinelle: convert hashcpy() with null_sha1 to hashclr()
  contrib/coccinelle: add basic Coccinelle transforms
  hex: add oid_to_hex_r()
2016-07-19 13:22:16 -07:00
Nguyễn Thái Ngọc Duy
166df26f28 sha1_file.c: use type off_t* for object_info->disk_sizep
This field, filled by sha1_object_info() contains the on-disk size of
an object, which could go over 4GB limit of unsigned long on 32-bit
systems. Use off_t for it instead and update all callers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-13 09:14:20 -07:00
Junio C Hamano
1335d76e45 merge: avoid "safer crlf" during recording of merge results
When merge_recursive() decides what the correct blob object merge
result for a path should be, it uses update_file_flags() helper
function to write it out to a working tree file and then calls
add_cacheinfo().  The add_cacheinfo() function in turn calls
make_cache_entry() to create a new cache entry to replace the
higher-stage entries for the path that represents the conflict.

The make_cache_entry() function calls refresh_cache_entry() to fill
in the cached stat information.  To mark a cache entry as
up-to-date, the data is re-read from the file in the working tree,
and goes through convert_to_git() conversion to be compared with the
blob object name the new cache entry records.

It is important to note that this happens while the higher-stage
entries, which are going to be replaced with the new entry, are
still in the index.  Unfortunately, the convert_to_git() conversion
has a misguided "safer crlf" mechanism baked in, and looks at the
existing cache entry for the path to decide how to convert the
contents in the working tree file.  If our side (i.e. stage#2)
records a text blob with CRLF in it, even when the system is
configured to record LF in blobs and convert them to CRLF upon
checkout (and back to LF upon checkin), the "safer crlf" mechanism
stops us doing so.

This especially poses a problem during a renormalizing merge, where
the merge result for the path is computed by first "normalizing" the
blobs involved in the merge by using convert_to_working_tree()
followed by convert_to_git() with "safer crlf" disabled.  The merge
result that is computed correctly and fed to add_cacheinfo() via
update_file_flags() does _not_ match what refresh_cache_entry() sees
by converting the working tree file via convert_to_git().

We can work this around by not refreshing the new cache entry in
make_cache_entry() called by add_cacheinfo().  After add_cacheinfo()
adds the new entry, we can call refresh_cache_entry() on that,
knowing that addition of this new cache entry would have removed the
stale cache entries that had CRLF in stage #2 that were carried over
before the renormalizing merge started and will not interfere with
the correct recording of the result.

The test update was taken from a series by Torsten Bögershausen
that attempted to fix this with a different approach.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reviewed-by: Torsten Bögershausen <tboegi@web.de>
2016-07-12 13:06:43 -07:00
Jeff King
e04d08a4b3 write_file: add format attribute
This gives us compile-time checking of our format strings,
which is a good thing.

I had also hoped it would help with confusing write_file()
and write_file_buf(), since the former's "..." can make it
match the signature of the latter. But given that the buffer
for write_file_buf() is generally not a string literal, the
compiler won't complain unless -Wformat-nonliteral is on,
and that creates a ton of false positives elsewhere in the
code base.

While we're there, let's also give the function a docstring,
which it never had.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-08 09:47:29 -07:00
Jeff King
52563d7ecc write_file: add pointer+len variant
There are many callsites which could use write_file, but for
which it is a little awkward because they have a strbuf or
other pointer/len combo. Specifically:

 1. write_file() takes a format string, so we have to use
    "%s" or "%.*s", which are ugly.

 2. Using any form of "%s" does not handle embedded NULs in
    the output. That probably doesn't matter for our
    call-sites, but it's nicer not to have to worry.

 3. It's less efficient; we format into another strbuf
    just to do the write. That's probably not measurably
    slow for our uses, but it's simply inelegant.

We can fix this by providing a helper to write out the
formatted buffer, and just calling it from write_file().

Note that we don't do the usual "complete with a newline"
that write_file does. If the caller has their own buffer,
there's a reasonable chance they're doing something more
complicated than a single line, and they can call
strbuf_complete_line() themselves.

We could go even further and add strbuf_write_file(), but it
doesn't save much:

  -  write_file_buf(path, sb.buf, sb.len);
  +  strbuf_write_file(&sb, path);

It would also be somewhat asymmetric with strbuf_read_file,
which actually returns errors rather than dying (and the
error handling is most of the benefit of write_file() in the
first place).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-08 09:47:29 -07:00
Jeff King
ef22318cff write_file: drop "gently" form
There are no callers left of write_file_gently(). Let's drop
it, as it doesn't seem likely for new callers to be added
(since its inception, the only callers who wanted the gentle
form generally just died immediately themselves, and have
since been converted).

While we're there, let's also drop the "int" return from
write_file, as it is never meaningful (in the non-gentle
form, we always either die or return 0).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-08 09:47:29 -07:00
Junio C Hamano
1e4bf90789 Merge branch 'jk/upload-pack-hook'
"upload-pack" allows a custom "git pack-objects" replacement when
responding to "fetch/clone" via the uploadpack.packObjectsHook.

* jk/upload-pack-hook:
  upload-pack: provide a hook for running pack-objects
  t1308: do not get fooled by symbolic links to the source tree
  config: add a notion of "scope"
  config: return configset value for current_config_ functions
  config: set up config_source for command-line config
  git_config_parse_parameter: refactor cleanup code
  git_config_with_options: drop "found" counting
2016-07-06 13:38:11 -07:00
Junio C Hamano
7e58b8166e Merge branch 'jk/send-pack-stdio'
Code clean-up.

* jk/send-pack-stdio:
  write_or_die: remove the unused write_or_whine() function
  send-pack: use buffered I/O to talk to pack-objects
2016-07-06 13:38:07 -07:00
brian m. carlson
55c529a700 hex: add oid_to_hex_r()
This function works just like sha1_to_hex_r, except that it takes a
pointer to struct object_id instead of a pointer to unsigned char.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-28 11:39:02 -07:00
Ramsay Jones
b333d0d6f4 write_or_die: remove the unused write_or_whine() function
Now the last caller of this function is gone, and new ones are
unlikely to appear, because this function is doing very little that
a regular if() does not besides obfuscating the error message (and
if we ever did want something like it, we would probably prefer the
function to come back with more "normal" return value semantics).

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-10 10:54:27 -07:00
Edward Thomson
4e55ed32db add: add --chmod=+x / --chmod=-x options
The executable bit will not be detected (and therefore will not be
set) for paths in a repository with `core.filemode` set to false,
though the users may still wish to add files as executable for
compatibility with other users who _do_ have `core.filemode`
functionality.  For example, Windows users adding shell scripts may
wish to add them as executable for compatibility with users on
non-Windows.

Although this can be done with a plumbing command
(`git update-index --add --chmod=+x foo`), teaching the `git-add`
command allows users to set a file executable with a command that
they're already familiar with.

Signed-off-by: Edward Thomson <ethomson@edwardthomson.com>
Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-07 17:43:39 -07:00
Jeff King
9acc591111 config: add a notion of "scope"
A config callback passed to git_config() doesn't know very
much about the context in which it sees a variable. It can
ask whether the variable comes from a file, and get the file
name. But without analyzing the filename (which is hard to
do accurately), it cannot tell whether it is in system-level
config, user-level config, or repo-specific config.

Generally this doesn't matter; the point of not passing this
to the callback is that it should treat the config the same
no matter where it comes from. But some programs, like
upload-pack, are a special case: we should be able to run
them in an untrusted repository, which means we cannot use
any "dangerous" config from the repository config file (but
it is OK to use it from system or user config).

This patch teaches the config code to record the "scope" of
each variable, and make it available inside config
callbacks, similar to how we give access to the filename.
The scope is the starting source for a particular parsing
operation, and remains the same even if we include other
files (so a .git/config which includes another file will
remain CONFIG_SCOPE_REPO, as it would be similarly
untrusted).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-27 10:45:40 -07:00
Jeff King
0d44a2dacc config: return configset value for current_config_ functions
When 473166b (config: add 'origin_type' to config_source
struct, 2016-02-19) added accessor functions for the origin
type and name, it taught them only to look at the "cf"
struct that is filled in while we are parsing the config.
This is sufficient to make it work with git-config, which
uses git_config_with_options() under the hood. That function
freshly parses the config files and triggers the callback
when it parses each key.

Most git programs, however, use git_config(). This interface
will populate a cache during the actual parse, and then
serve values from the cache. Calling current_config_filename()
in a callback here will find a NULL cf and produce an error.
There are no such callers right now, but let's prepare for
adding some by making this work.

We already record source information in a struct attached to
each value. We just need to make it globally available and
then consult it from the accessor functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-27 10:44:54 -07:00
Junio C Hamano
e29300d69f Merge branch 'js/windows-dotgit' into maint
On Windows, .git and optionally any files whose name starts with a
dot are now marked as hidden, with a core.hideDotFiles knob to
customize this behaviour.

* js/windows-dotgit:
  mingw: remove unnecessary definition
  mingw: introduce the 'core.hideDotFiles' setting
2016-05-26 13:17:23 -07:00
Junio C Hamano
352d72a30e Merge branch 'nd/worktree-various-heads'
The experimental "multiple worktree" feature gains more safety to
forbid operations on a branch that is checked out or being actively
worked on elsewhere, by noticing that e.g. it is being rebased.

* nd/worktree-various-heads:
  branch: do not rename a branch under bisect or rebase
  worktree.c: check whether branch is bisected in another worktree
  wt-status.c: split bisect detection out of wt_status_get_state()
  worktree.c: check whether branch is rebased in another worktree
  worktree.c: avoid referencing to worktrees[i] multiple times
  wt-status.c: make wt_status_check_rebase() work on any worktree
  wt-status.c: split rebase detection out of wt_status_get_state()
  path.c: refactor and add worktree_git_path()
  worktree.c: mark current worktree
  worktree.c: make find_shared_symref() return struct worktree *
  worktree.c: store "id" instead of "git_dir"
  path.c: add git_common_path() and strbuf_git_common_path()
  dir.c: rename str(n)cmp_icase to fspath(n)cmp
2016-05-23 14:54:29 -07:00
Junio C Hamano
14af79b93d Merge branch 'nd/remove-unused' into HEAD
Code cleanup.

* nd/remove-unused:
  wrapper.c: delete dead function git_mkstemps()
  dir.c: remove dead function fnmatch_icase()
2016-05-18 14:40:11 -07:00
Junio C Hamano
bfc99b63fe Merge branch 'js/windows-dotgit'
On Windows, .git and optionally any files whose name starts with a
dot are now marked as hidden, with a core.hideDotFiles knob to
customize this behaviour.

* js/windows-dotgit:
  mingw: remove unnecessary definition
  mingw: introduce the 'core.hideDotFiles' setting
2016-05-17 14:38:39 -07:00
Junio C Hamano
6675f501f6 Merge branch 'ab/hooks'
A new configuration variable core.hooksPath allows customizing
where the hook directory is.

* ab/hooks:
  hooks: allow customizing where the hook directory is
  githooks.txt: minor improvements to the grammar & phrasing
  githooks.txt: amend dangerous advice about 'update' hook ACL
  githooks.txt: improve the intro section
2016-05-17 14:38:17 -07:00