Commit Graph

38175 Commits

Author SHA1 Message Date
Stefan Beller
8b148bf085 .mailmap: add Stefan Bellers corporate mail address
Note that despite the private address being first and primary,
Google owns the copyright on this patch as any other patch I'll be
sending signed off by the sbeller@google.com address.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-21 11:01:27 -07:00
Stefan Beller
dc76c7f783 transport: free leaking head in transport_print_push_status()
Found by scan.coverity.com (ID: 1248110)

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-21 11:01:18 -07:00
Junio C Hamano
13da0fc092 Update draft release notes to 2.2
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-20 13:07:32 -07:00
Junio C Hamano
9d04401ffe Merge branch 'cc/interpret-trailers'
A new filter to programatically edit the tail end of the commit log
messages.

* cc/interpret-trailers:
  Documentation: add documentation for 'git interpret-trailers'
  trailer: add tests for commands in config file
  trailer: execute command from 'trailer.<name>.command'
  trailer: add tests for "git interpret-trailers"
  trailer: add interpret-trailers command
  trailer: put all the processing together and print
  trailer: parse trailers from file or stdin
  trailer: process command line trailer arguments
  trailer: read and process config information
  trailer: process trailers from input message and arguments
  trailer: add data structures and basic functions
2014-10-20 12:25:32 -07:00
Junio C Hamano
7df3b072a9 Merge branch 'rm/gitweb-start-form'
* rm/gitweb-start-form:
  gitweb: use start_form, not startform that was removed in CGI.pm 4.04
2014-10-20 12:25:27 -07:00
Junio C Hamano
9c6be8b5ab Merge branch 'ss/contrib-subtree-contacts'
* ss/contrib-subtree-contacts:
  contacts: add a Makefile to generate docs and install
  subtree: add an install-html target
2014-10-20 12:25:16 -07:00
Junio C Hamano
b946576839 Merge branch 'jn/parse-config-slot'
Code cleanup.

* jn/parse-config-slot:
  color_parse: do not mention variable name in error message
  pass config slots as pointers instead of offsets
2014-10-20 12:23:48 -07:00
Junio C Hamano
b67588d018 Merge branch 'rs/receive-pack-argv-leak-fix'
* rs/receive-pack-argv-leak-fix:
  receive-pack: plug minor memory leak in unpack()
2014-10-20 12:23:45 -07:00
Junio C Hamano
713ee7fe46 Merge branch 'ta/config-set'
* ta/config-set:
  t1308: fix broken here document in test script
2014-10-20 12:23:43 -07:00
Junio C Hamano
f9a2fd3616 Merge branch 'jk/test-shell-trace'
Test scripts were taught to notice "-x" option to show shell trace,
as if the tests were run under "sh -x".

* jk/test-shell-trace:
  test-lib.sh: support -x option for shell-tracing
  t5304: use helper to report failure of "test foo = bar"
  t5304: use test_path_is_* instead of "test -f"
2014-10-20 12:23:40 -07:00
Junio C Hamano
6459cf8cdf Merge branch 'bc/asciidoc'
Formatting nitpicks to help a (pickier) reimplementation of
AsciiDoc to grok our documentation.

* bc/asciidoc:
  Documentation: fix mismatched delimiters in git-imap-send
  Documentation: adjust document title underlining
2014-10-20 12:23:30 -07:00
Junio C Hamano
15c6ef7b06 Revert "archive: honor tar.umask even for pax headers"
This reverts commit 10f343ea81, whose
output is no longer bit-for-bit equivalent from the older versions
of Git, which the infrastructure to (pretend to) upload tarballs
kernel.org uses depends on.
2014-10-20 12:04:46 -07:00
Torsten Bögershausen
ecdab41267 core.filemode may need manual action
core.filemode is set automatically when a repo is created.
But when a repo is exported via CIFS or cygwin is mixed with Git for Windows
or Eclipse core.filemode may better be set manually to false.
Update and improve the documentation

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 20:47:40 -07:00
Philip Oakley
7c45cee6bb doc: fix 'git status --help' character quoting
Correct backtick quoting for some of the modification states to give
consistent web rendering.

Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 20:45:16 -07:00
W. Trevor King
7544b2e2da t1304: Set LOGNAME even if USER is unset or null
Avoid:

  # ./t1304-default-acl.sh
  ok 1 - checking for a working acl setup
  ok 2 - Setup test repo
  not ok 3 - Objects creation does not break ACLs with restrictive umask
  #
  #               # SHA1 for empty blob
  #               check_perms_and_acl .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
  #
  not ok 4 - git gc does not break ACLs with restrictive umask
  #
  #               git gc &&
  #               check_perms_and_acl .git/objects/pack/*.pack
  #
  # failed 2 among 4 test(s)
  1..4

on systems where USER isn't set.  It's usually set by the login
process, but it isn't set when launching some Docker images.  For
example:

  $ docker run --rm debian env
  HOME=/
  PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  HOSTNAME=b2dfdfe797ed

'id -u -n' has been in POSIX from Issue 2 through 2013 [1], so I don't
expect compatibility issues.

[1]: http://pubs.opengroup.org/onlinepubs/9699919799/utilities/id.html

Signed-off-by: W. Trevor King <wking@tremily.us>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:47:20 -07:00
Etienne Buira
0f4b6db3ba Handle atexit list internaly for unthreaded builds
Wrap atexit()s calls on unthreaded builds to handle callback list
internally.

This is needed because on unthreaded builds, asyncs inherits parent's
atexit() list, that gets run as soon as the async exit()s (and again at
the end of async's parent process). That led to remove temporary files
too early.

Also remove a by-atexit-callback guard against this kind of issue in
clone.c, as this patch makes it redundant.

Fixes test 5537 (temporary shallow file vanished before unpack-objects
could open it)

BTW remove an unused variable in shallow.c.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Andreas Schwab <schwab@linux-m68k.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:38:30 -07:00
Jeff King
189a122249 drop add_object_array_with_mode
This is a thin compatibility wrapper around
add_pending_object_with_path. But the only caller is
add_object_array, which is itself just a thin compatibility
wrapper. There are no external callers, so we can just
remove this middle wrapper.

Noticed-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:28:30 -07:00
Ramsay Jones
d7702be1e1 revision: remove definition of unused 'add_object' function
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:27:24 -07:00
René Scharfe
a915459097 use env_array member of struct child_process
Convert users of struct child_process to using the managed env_array for
specifying environment variables instead of supplying an array on the
stack or bringing their own argv_array.  This shortens and simplifies
the code and ensures automatically that the allocated memory is freed
after use.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:26:34 -07:00
René Scharfe
19a583dc39 run-command: add env_array, an optional argv_array for env
Similar to args, add a struct argv_array member to struct child_process
that simplifies specifying the environment for children.  It is freed
automatically by finish_command() or if start_command() encounters an
error.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:26:31 -07:00
Jeff King
2113471478 pack-objects: turn off bitmaps when we split packs
If a pack.packSizeLimit is set, we may split the pack data
across multiple packfiles. This means we cannot generate
.bitmap files, as they require that all of the reachable
objects are in the same pack. We check that condition when
we are generating the list of objects to pack (and disable
bitmaps if we are not packing everything), but we forgot to
update it when we notice that we needed to split (which
doesn't happen until the actual write phase).

The resulting bitmaps are quite bogus (they mention entries
that do not exist in the pack!) and can cause a fetch or
push to send insufficient objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:08:38 -07:00
Jeff King
b1e757f363 pack-objects: double-check options before discarding objects
When we are given an expiration time like
--unpack-unreachable=2.weeks.ago, we avoid writing out old,
unreachable loose objects entirely, under the assumption
that running "prune" would simply delete them immediately
anyway. However, this is only valid if we computed the same
set of reachable objects as prune would.

In practice, this is the case, because only git-repack uses
the --unpack-unreachable option with an expiration, and it
always feeds as many objects into the pack as possible. But
we can double-check at runtime just to be sure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
c90f9e13ab repack: pack objects mentioned by the index
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.

  2. You repack, moving X into the pack.

  3. You delete the branch (and its reflog), so that X is
     unreferenced.

  4. You "git add" blob X so that it is now referenced only
     by the index.

  5. You repack again with git-gc. The pack-objects we
     invoke will see that X is neither referenced nor
     recent and not bother loosening it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
edfbb2aa53 pack-objects: use argv_array
This saves us from having to bump the rp_av count when we
add new traversal options.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
1be111d88f reachable: use revision machinery's --indexed-objects code
This does the same thing as our custom code, so let's not
repeat ourselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
4fe10219bc rev-list: add --indexed-objects option
There is currently no easy way to ask the revision traversal
machinery to include objects reachable from the index (e.g.,
blobs and trees that have not yet been committed). This
patch adds an option to do so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
41d018d146 rev-list: document --reflog option
This is mostly used internally, but it does not hurt to
explain it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
458a7e508c t5516: test pushing a tag of an otherwise unreferenced blob
It's not unreasonable to have a tag that points to a blob
that is not part of the normal history. We do this in
git.git to distribute gpg keys. However, we never explicitly
checked in our test suite that this actually works (i.e.,
that pack-objects actually sends the blob because of the tag
mentioning it).

It does in fact work fine, but a recent patch under
discussion broke this, and the test suite didn't notice.
Let's make the test suite more complete.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:06 -07:00
Jeff King
207394908e traverse_commit_list: support pending blobs/trees with paths
When we call traverse_commit_list, we may have trees and
blobs in the pending array. As we process these, we pass the
"name" field from the pending entry as the path of the
object within the tree (which then becomes the root path if
we recurse into subtrees).

When we set up the traversal in prepare_revision_walk,
though, the "name" field of any pending trees and blobs is
likely to be the ref at which we found the object. We would
not want to make this part of the path (e.g., doing so would
make "git rev-list --objects v2.6.11-tree" in linux.git show
paths like "v2.6.11-tree/Makefile", which is nonsensical).
Therefore prepare_revision_walk sets the name field of each
pending tree and blobs to the empty string.

However, this leaves no room for a caller who does know the
correct path of a pending object to propagate that
information to the revision walker. We can fix this by
making two related changes:

  1. Use the "path" field as the path instead of the "name"
     field in traverse_commit_list. If the path is not set,
     default to "" (which is what we always ended up with in
     the current code, because of prepare_revision_walk).

  2. In prepare_revision_walk, make a complete copy of the
     entry. This makes the path field available to the
     walker (if there is one), solving our problem.
     Leaving the name field intact is now OK, as we do not
     use it as a path due to point (1) above (and we can use
     it to make more meaningful error messages if we want).
     We also make the original "mode" field available to the
     walker, though it does not actually use it.

Note that we still re-add the pending objects and free the
old ones (so we may strdup the path and name only to free
the old ones). This could be made more efficient by simply
copying the object_array entries that we are keeping.
However, that would require more restructuring of the code,
and is not done here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:06:31 -07:00
Junio C Hamano
98349e5364 Merge branch 'jc/completion-no-chdir'
* jc/completion-no-chdir:
  completion: use "git -C $there" instead of (cd $there && git ...)
2014-10-16 14:16:49 -07:00
Junio C Hamano
c11dc64722 Merge branch 'bw/trace-no-inline-getnanotime'
No file-scope static variables in an inlined function, please.

* bw/trace-no-inline-getnanotime:
  trace.c: do not mark getnanotime() as "inline"
2014-10-16 14:16:45 -07:00
Junio C Hamano
1cb3324e61 Merge branch 'po/everyday-doc'
"git help everyday" to show the Everyday Git document.

* po/everyday-doc:
  doc: add 'everyday' to 'git help'
  doc: Makefile regularise OBSOLETE_HTML list building
  doc: modernise everyday.txt wording and format in man page style
2014-10-16 14:16:42 -07:00
Roland Mas
4750f4b962 gitweb: use start_form, not startform that was removed in CGI.pm 4.04
CGI.pm 4.04 removed the startform method, which had previously been
deprecated in favour of start_form.  Changes file for CGI.pm says:

    4.04 2014-09-04
     [ REMOVED / DEPRECATIONS ]
	- startform and endform methods removed (previously deprecated,
	  you should be using the start_form and end_form methods)

Signed-off-by: Roland Mas <lolando@debian.org>
Reviewed-by: Jakub Narębski <jnareb@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 13:12:34 -07:00
David Aguilar
688684eba4 t7610-mergetool: add test cases for mergetool.writeToTemp
Add tests to ensure that filenames start with "./" when
mergetool.writeToTemp is false and do not start with "./" when
mergetool.writeToTemp is true.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 12:09:51 -07:00
David Aguilar
8f0cb41da2 mergetool: add an option for writing to a temporary directory
Teach mergetool to write files in a temporary directory when
'mergetool.writeToTemp' is true.

This is helpful for tools such as Eclipse which cannot cope with
multiple copies of the same file in the worktree.

Suggested-by: Charles Bailey <charles@hashpling.org>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 12:09:51 -07:00
David Aguilar
eab335c46d mergetool: use more conservative temporary filenames
Avoid filenames with multiple dots so that overly-picky tools do
not misinterpret their extension.

Previously, foo/bar.ext in the worktree would result in e.g.

	./foo/bar.ext.BASE.1234.ext

This can be improved by having only a single .ext and using
underscore instead of dot so that the extension cannot be
misinterpreted.  The resulting path becomes:

	./foo/bar_BASE_1234.ext

Suggested-by: Sergio Ferrero <sferrero@ensoftcorp.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 12:07:48 -07:00
David Aguilar
9e8f8dea46 test-lib-functions: adjust style to match CodingGuidelines
Prefer "test" over "[ ]" for conditionals.
Prefer "$()" over backticks for command substitutions.
Avoid control structures on a single line with semicolons.

Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 12:04:05 -07:00
David Aguilar
d7d300ea59 t7610-mergetool: use test_config to isolate tests
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 12:03:21 -07:00
David Aguilar
b12d04503b mergetools/meld: make usage of --output configurable and more robust
Older versions of meld listed --output in `meld --help`.
Newer versions only mention `meld [OPTIONS...]`.
Improve the checks to catch these newer versions.

Add a `mergetool.meld.hasOutput` configuration to allow
overriding the heuristic.

Reported-by: Andrey Novoseltsev <novoselt@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: David Aguilar <davvid@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 11:58:11 -07:00
Jeff King
9e0c3c4fcd make add_object_array_with_context interface more sane
When you resolve a sha1, you can optionally keep any context
found during the resolution, including the path and mode of
a tree entry (e.g., when looking up "HEAD:subdir/file.c").

The add_object_array_with_context function lets you then
attach that context to an entry in a list. Unfortunately,
the interface for doing so is horrible. The object_context
structure is large and most object_array users do not use
it. Therefore we keep a pointer to the structure to avoid
burdening other users too much. But that means when we do
use it that we must allocate the struct ourselves. And the
struct contains a fixed PATH_MAX-sized buffer, which makes
this wholly unsuitable for any large arrays.

We can observe that there is only a single user of the
"with_context" variant: builtin/grep.c. And in that use
case, the only element we care about is the path. We can
therefore store only the path as a pointer (the context's
mode field was redundant with the object_array_entry itself,
and nobody actually cared about the surrounding tree). This
still requires a strdup of the pathname, but at least we are
only consuming the minimum amount of memory for each string.

We can also handle the copying ourselves in
add_object_array_*, and free it as appropriate in
object_array_release_entry.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:44 -07:00
Jeff King
33d4221c79 write_sha1_file: freshen existing objects
When we try to write a loose object file, we first check
whether that object already exists. If so, we skip the
write as an optimization. However, this can interfere with
prune's strategy of using mtimes to mark files in progress.

For example, if a branch contains a particular tree object
and is deleted, that tree object may become unreachable, and
have an old mtime. If a new operation then tries to write
the same tree, this ends up as a noop; we notice we
already have the object and do nothing. A prune running
simultaneously with this operation will see the object as
old, and may delete it.

We can solve this by "freshening" objects that we avoid
writing by updating their mtime. The algorithm for doing so
is essentially the same as that of has_sha1_file. Therefore
we provide a new (static) interface "check_and_freshen",
which finds and optionally freshens the object. It's trivial
to implement freshening and simple checking by tweaking a
single parameter.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:43 -07:00
Jeff King
abcb86553d pack-objects: match prune logic for discarding objects
A recent commit taught git-prune to keep non-recent objects
that are reachable from recent ones. However, pack-objects,
when loosening unreachable objects, tries to optimize out
the write in the case that the object will be immediately
pruned. It now gets this wrong, since its rule does not
reflect the new prune code (and this can be seen by running
t6501 with a strategically placed repack).

Let's teach pack-objects similar logic.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:43 -07:00
Jeff King
d0d46abc16 pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.

Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime.  But the new rules will depend on the exact
object, so we need to perform the check even for old packs.

Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Jeff King
d3038d22f9 prune: keep objects reachable from recent objects
Our current strategy with prune is that an object falls into
one of three categories:

  1. Reachable (from ref tips, reflogs, index, etc).

  2. Not reachable, but recent (based on the --expire time).

  3. Not reachable and not recent.

We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.

However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.

One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have objects it refers to, and so forth
recursively. The problem is that this is very expensive.
Creating a parent link would require traversing the entire
object graph!

Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.

This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent. We can make a simple inductive argument that
with this patch, this property holds (that there are no
objects with missing referents in the repository):

  0. When we have no objects, we have nothing to refer or be
     referred to, so the property holds.

  1. If we add objects to the repository, their direct
     referents must generally exist (e.g., if you create a
     tree, the blobs it references must exist; if you create
     a commit to point at the tree, the tree must exist).
     This is already the case before this patch. And it is
     not 100% foolproof (you can make bogus objects using
     `git hash-object`, for example), but it should be the
     case for normal usage.

     Therefore for any sequence of object additions, the
     property will continue to hold.

  2. If we remove objects from the repository, then we will
     not remove a child object (like a blob) if an object
     that refers to it is being kept. That is the part
     implemented by this patch.

     Note, however, that our reachability check and the
     actual pruning are not atomic. So it _is_ still
     possible to violate the property (e.g., an object
     becomes referenced just as we are deleting it). This
     patch is shooting for eliminating problems where the
     mtimes of dependent objects differ by hours or days,
     and one is dropped without the other. It does nothing
     to help with short races.

Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to look at each object twice. This patch
instead performs the reachability traversal, then follows up
with a second traversal for recent objects, skipping any
that have already been marked.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Jeff King
660c889e46 sha1_file: add for_each iterators for loose and packed objects
We typically iterate over the reachable objects in a
repository by starting at the tips and walking the graph.
There's no easy way to iterate over all of the objects,
including unreachable ones. Let's provide a way of doing so.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
4a1e693a30 count-objects: use for_each_loose_file_in_objdir
This drops our line count considerably, and should make
things more readable by keeping the counting logic separate
from the traversal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
cac05d4dfd count-objects: do not use xsize_t when counting object size
The point of xsize_t is to safely cast an off_t into a size_t
(because we are about to mmap). But in count-objects, we are
summing the sizes in an off_t. Using xsize_t means that
count-objects could fail on a 32-bit system with a 4G
object (not likely, as other parts of git would fail, but
we should at least be correct here).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
0d3b729680 prune-packed: use for_each_loose_file_in_objdir
This saves us from manually traversing the directory
structure ourselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:40 -07:00
Jeff King
3725427945 reachable: mark index blobs as SEEN
When we mark all reachable objects for pruning, that
includes blobs mentioned by the index. However, we do not
mark these with the SEEN flag, as we do for objects that we
find by traversing (we also do not add them to the pending
list, but that is because there is nothing further to
traverse with them).

This doesn't cause any problems with prune, because it
checks only that the object exists in the global object
hash, and not its flags. However, let's mark these objects
to be consistent and avoid any later surprises.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:40 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00