Commit Graph

3455 Commits

Author SHA1 Message Date
Nguyễn Thái Ngọc Duy
77a6d84045 count-objects: report unused files in $GIT_DIR/worktrees/...
In linked checkouts, borrowed parts like config is taken from
$GIT_COMMON_DIR. $GIT_DIR/config is never used. Report them as
garbage.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy
e3df33bb1b gc: support prune --worktrees
Helped-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy
09dbb90b09 gc: factor out gc.pruneexpire parsing code
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
2cfe2a7878 gc: style change -- no SP before closing parenthesis
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
3b8925c78b checkout: clean up half-prepared directories in --to mode
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
5883034c61 checkout: reject if the branch is already checked out elsewhere
One branch obviously can't be checked out at two places (but detached
heads are ok). Give the user a choice in this case: --detach, -b
new-branch, switch branch in the other checkout first or simply 'cd'
and continue to work there.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
23af91d102 prune: strategies for linked checkouts
(alias R=$GIT_COMMON_DIR/worktrees/<id>)

 - linked checkouts are supposed to keep its location in $R/gitdir up
   to date. The use case is auto fixup after a manual checkout move.

 - linked checkouts are supposed to update mtime of $R/gitdir. If
   $R/gitdir's mtime is older than a limit, and it points to nowhere,
   worktrees/<id> is to be pruned.

 - If $R/locked exists, worktrees/<id> is not supposed to be pruned. If
   $R/locked exists and $R/gitdir's mtime is older than a really long
   limit, warn about old unused repo.

 - "git checkout --to" is supposed to make a hard link named $R/link
   pointing to the .git file on supported file systems to help detect
   the user manually deleting the checkout. If $R/link exists and its
   link count is greated than 1, the repo is kept.

Helped-by: Marc Branchaud <marcnarc@xiplink.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy
529fef20cf checkout: support checking out into a new working directory
"git checkout --to" sets up a new working directory with a .git file
pointing to $GIT_DIR/worktrees/<id>. It then executes "git checkout"
again on the new worktree with the same arguments except "--to" is
taken out. The second checkout execution, which is not contaminated
with any info from the current repository, will actually check out and
everything that normal "git checkout" does.

Helped-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:16 -08:00
Nguyễn Thái Ngọc Duy
91aacda85a use new wrapper write_file() for simple file writing
This fixes common problems in these code about error handling,
forgetting to close the file handle after fprintf() fails, or not
printing out the error string..

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:16 -08:00
Nguyễn Thái Ngọc Duy
31e26ebcb5 setup.c: support multi-checkout repo setup
The repo setup procedure is updated to detect $GIT_DIR/commondir and
set $GIT_COMMON_DIR properly.

The core.worktree is ignored when $GIT_COMMON_DIR is set. This is
because the config file is shared in multi-checkout setup, but
checkout directories _are_ different. Making core.worktree effective
in all checkouts mean it's back to a single checkout.

Helped-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:15 -08:00
Nguyễn Thái Ngọc Duy
af07b20d51 commit: use SEQ_DIR instead of hardcoding "sequencer"
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:12 -08:00
Nguyễn Thái Ngọc Duy
1fdc2abf1b reflog: avoid constructing .lock path with git_path
Among pathnames in $GIT_DIR, e.g. "index" or "packed-refs", we want to
automatically and silently map some of them to the $GIT_DIR of the
repository we are borrowing from via $GIT_COMMON_DIR mechanism.  When
we formulate the pathname for its lockfile, we want it to be in the
same location as its final destination.  "index" is not shared and
needs to remain in the borrowing repository, while "packed-refs" is
shared and needs to go to the borrowed repository.

git_path() could be taught about the ".lock" suffix and map
"index.lock" and "packed-refs.lock" the same way their basenames are
mapped, but instead the caller can help by asking where the basename
(e.g. "index") is mapped to git_path() and then appending ".lock"
after the mapping is done.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:12 -08:00
Nguyễn Thái Ngọc Duy
557bd833bb git_path(): be aware of file relocation in $GIT_DIR
We allow the user to relocate certain paths out of $GIT_DIR via
environment variables, e.g. GIT_OBJECT_DIRECTORY, GIT_INDEX_FILE and
GIT_GRAFT_FILE. Callers are not supposed to use git_path() or
git_pathdup() to get those paths. Instead they must use
get_object_directory(), get_index_file() and get_graft_file()
respectively. This is inconvenient and could be missed in review (for
example, there's git_path("objects/info/alternates") somewhere in
sha1_file.c).

This patch makes git_path() and git_pathdup() understand those
environment variables. So if you set GIT_OBJECT_DIRECTORY to /foo/bar,
git_path("objects/abc") should return /foo/bar/abc. The same is done
for the two remaining env variables.

"git rev-parse --git-path" is the wrapper for script use.

This patch kinda reverts a0279e1 (setup_git_env: use git_pathdup
instead of xmalloc + sprintf - 2014-06-19) because using git_pathdup
here would result in infinite recursion:

  setup_git_env() -> git_pathdup("objects") -> .. -> adjust_git_path()
  -> get_object_directory() -> oops, git_object_directory is NOT set
  yet -> setup_git_env()

I wanted to make git_pathdup_literal() that skips adjust_git_path().
But that won't work because later on when $GIT_COMMON_DIR is
introduced, git_pathdup_literal("objects") needs adjust_git_path() to
replace $GIT_DIR with $GIT_COMMON_DIR.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
1a83c240f2 git_snpath(): retire and replace with strbuf_git_path()
In the previous patch, git_snpath() is modified to allocate a new
strbuf buffer because vsnpath() needs that. But that makes it
awkward because git_snpath() receives a pre-allocated buffer from
outside and has to copy data back. Rename it to strbuf_git_path()
and make it receive strbuf directly.

Using git_path() in update_refs_for_switch() which used to call
git_snpath() is safe because that function and all of its callers do
not keep any pointer to the round-robin buffer pool allocated by
get_pathname().

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:11 -08:00
Nguyễn Thái Ngọc Duy
dcf692625a path.c: make get_pathname() call sites return const char *
Before the previous commit, get_pathname returns an array of PATH_MAX
length. Even if git_path() and similar functions does not use the
whole array, git_path() caller can, in theory.

After the commit, get_pathname() may return a buffer that has just
enough room for the returned string and git_path() caller should never
write beyond that.

Make git_path(), mkpath() and git_path_submodule() return a const
buffer to make sure callers do not write in it at all.

This could have been part of the previous commit, but the "const"
conversion is too much distraction from the core changes in path.c.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:10 -08:00
Junio C Hamano
a1671dd82b Merge branch 'jk/fetch-reflog-df-conflict'
Corner-case bugfixes for "git fetch" around reflog handling.

* jk/fetch-reflog-df-conflict:
  ignore stale directories when checking reflog existence
  fetch: load all default config at startup
2014-11-06 10:52:32 -08:00
Jeff King
72549dfd5d fetch: load all default config at startup
When we start the git-fetch program, we call git_config to
load all config, but our callback only processes the
fetch.prune option; we do not chain to git_default_config at
all.

This means that we may not load some core configuration
which will have an effect. For instance, we do not load
core.logAllRefUpdates, which impacts whether or not we
create reflogs in a bare repository.

Note that I said "may" above. It gets even more exciting. If
we have to transfer actual objects as part of the fetch,
then we call fetch_pack as part of the same process. That
function loads its own config, which does chain to
git_default_config, impacting global variables which are
used by the rest of fetch. But if the fetch is a pure ref
update (e.g., a new ref which is a copy of an old one), we
skip fetch_pack entirely. So we get inconsistent results
depending on whether or not we have actual objects to
transfer or not!

Let's just load the core config at the start of fetch, so we
know we have it (we may also load it again as part of
fetch_pack, but that's OK; it's designed to be idempotent).

Our tests check both cases (with and without a pack). We
also check similar behavior for push for good measure, but
it already works as expected.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-04 12:13:46 -08:00
Junio C Hamano
1d42cf3c6c Merge branch 'jc/push-cert'
* jc/push-cert:
  receive-pack: avoid minor leak in case start_async() fails
2014-10-31 11:49:55 -07:00
Junio C Hamano
d70e331c0e Merge branch 'jk/prune-mtime'
Tighten the logic to decide that an unreachable cruft is
sufficiently old by covering corner cases such as an ancient object
becoming reachable and then going unreachable again, in which case
its retention period should be prolonged.

* jk/prune-mtime: (28 commits)
  drop add_object_array_with_mode
  revision: remove definition of unused 'add_object' function
  pack-objects: double-check options before discarding objects
  repack: pack objects mentioned by the index
  pack-objects: use argv_array
  reachable: use revision machinery's --indexed-objects code
  rev-list: add --indexed-objects option
  rev-list: document --reflog option
  t5516: test pushing a tag of an otherwise unreferenced blob
  traverse_commit_list: support pending blobs/trees with paths
  make add_object_array_with_context interface more sane
  write_sha1_file: freshen existing objects
  pack-objects: match prune logic for discarding objects
  pack-objects: refactor unpack-unreachable expiration check
  prune: keep objects reachable from recent objects
  sha1_file: add for_each iterators for loose and packed objects
  count-objects: use for_each_loose_file_in_objdir
  count-objects: do not use xsize_t when counting object size
  prune-packed: use for_each_loose_file_in_objdir
  reachable: mark index blobs as SEEN
  ...
2014-10-29 10:07:56 -07:00
René Scharfe
5d222c099e receive-pack: avoid minor leak in case start_async() fails
If the asynchronous start of copy_to_sideband() fails, then any
env_array entries added to struct child_process proc by
prepare_push_cert_sha1() are leaked.  Call the latter function only
after start_async() succeeded so that the allocated entries are
cleaned up automatically by start_command() or finish_command().

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-28 14:55:15 -07:00
Junio C Hamano
a33043f639 Merge branch 'jc/push-cert'
* jc/push-cert:
  push: heed user.signingkey for signed pushes
2014-10-24 15:01:32 -07:00
Junio C Hamano
e4da4fbe0e Merge branch 'eb/no-pthreads'
Allow us build with NO_PTHREADS=NoThanks compilation option.

* eb/no-pthreads:
  Handle atexit list internaly for unthreaded builds
  pack-objects: set number of threads before checking and warning
  index-pack: fix compilation with NO_PTHREADS
2014-10-24 14:59:10 -07:00
Junio C Hamano
217610d7d6 Merge branch 'rs/run-command-env-array'
Add managed "env" array to child_process to clarify the lifetime
rules.

* rs/run-command-env-array:
  use env_array member of struct child_process
  run-command: add env_array, an optional argv_array for env
2014-10-24 14:57:54 -07:00
Junio C Hamano
26a22d8d00 Merge branch 'jk/pack-objects-no-bitmap-when-splitting'
Splitting pack-objects output into multiple packs is incompatible
with the use of reachability bitmap.

* jk/pack-objects-no-bitmap-when-splitting:
  pack-objects: turn off bitmaps when we split packs
2014-10-24 14:56:10 -07:00
Michael J Gruber
b9459019bb push: heed user.signingkey for signed pushes
push --signed promises to take user.signingkey as the signing key but
fails to read the config.

Make it do so.

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-24 10:50:05 -07:00
Junio C Hamano
3c85452bb0 Merge branch 'rs/ref-transaction'
The API to update refs have been restructured to allow introducing
a true transactional updates later.  We would even allow storing
refs in backends other than the traditional filesystem-based one.

* rs/ref-transaction: (25 commits)
  ref_transaction_commit: bail out on failure to remove a ref
  lockfile: remove unable_to_lock_error
  refs.c: do not permit err == NULL
  remote rm/prune: print a message when writing packed-refs fails
  for-each-ref: skip and warn about broken ref names
  refs.c: allow listing and deleting badly named refs
  test: put tests for handling of bad ref names in one place
  packed-ref cache: forbid dot-components in refnames
  branch -d: simplify by using RESOLVE_REF_READING
  branch -d: avoid repeated symref resolution
  reflog test: test interaction with detached HEAD
  refs.c: change resolve_ref_unsafe reading argument to be a flags field
  refs.c: make write_ref_sha1 static
  fetch.c: change s_update_ref to use a ref transaction
  refs.c: ref_transaction_commit: distinguish name conflicts from other errors
  refs.c: pass a list of names to skip to is_refname_available
  refs.c: call lock_ref_sha1_basic directly from commit
  refs.c: refuse to lock badly named refs in lock_ref_sha1_basic
  rename_ref: don't ask read_ref_full where the ref came from
  refs.c: pass the ref log message to _create/delete/update instead of _commit
  ...
2014-10-21 13:28:10 -07:00
Junio C Hamano
9d04401ffe Merge branch 'cc/interpret-trailers'
A new filter to programatically edit the tail end of the commit log
messages.

* cc/interpret-trailers:
  Documentation: add documentation for 'git interpret-trailers'
  trailer: add tests for commands in config file
  trailer: execute command from 'trailer.<name>.command'
  trailer: add tests for "git interpret-trailers"
  trailer: add interpret-trailers command
  trailer: put all the processing together and print
  trailer: parse trailers from file or stdin
  trailer: process command line trailer arguments
  trailer: read and process config information
  trailer: process trailers from input message and arguments
  trailer: add data structures and basic functions
2014-10-20 12:25:32 -07:00
Junio C Hamano
b946576839 Merge branch 'jn/parse-config-slot'
Code cleanup.

* jn/parse-config-slot:
  color_parse: do not mention variable name in error message
  pass config slots as pointers instead of offsets
2014-10-20 12:23:48 -07:00
Junio C Hamano
b67588d018 Merge branch 'rs/receive-pack-argv-leak-fix'
* rs/receive-pack-argv-leak-fix:
  receive-pack: plug minor memory leak in unpack()
2014-10-20 12:23:45 -07:00
Etienne Buira
0f4b6db3ba Handle atexit list internaly for unthreaded builds
Wrap atexit()s calls on unthreaded builds to handle callback list
internally.

This is needed because on unthreaded builds, asyncs inherits parent's
atexit() list, that gets run as soon as the async exit()s (and again at
the end of async's parent process). That led to remove temporary files
too early.

Also remove a by-atexit-callback guard against this kind of issue in
clone.c, as this patch makes it redundant.

Fixes test 5537 (temporary shallow file vanished before unpack-objects
could open it)

BTW remove an unused variable in shallow.c.

Helped-by: Duy Nguyen <pclouds@gmail.com>
Helped-by: Andreas Schwab <schwab@linux-m68k.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Etienne Buira <etienne.buira@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:38:30 -07:00
René Scharfe
a915459097 use env_array member of struct child_process
Convert users of struct child_process to using the managed env_array for
specifying environment variables instead of supplying an array on the
stack or bringing their own argv_array.  This shortens and simplifies
the code and ensures automatically that the allocated memory is freed
after use.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:26:34 -07:00
Jeff King
2113471478 pack-objects: turn off bitmaps when we split packs
If a pack.packSizeLimit is set, we may split the pack data
across multiple packfiles. This means we cannot generate
.bitmap files, as they require that all of the reachable
objects are in the same pack. We check that condition when
we are generating the list of objects to pack (and disable
bitmaps if we are not packing everything), but we forgot to
update it when we notice that we needed to split (which
doesn't happen until the actual write phase).

The resulting bitmaps are quite bogus (they mention entries
that do not exist in the pack!) and can cause a fetch or
push to send insufficient objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:08:38 -07:00
Jeff King
b1e757f363 pack-objects: double-check options before discarding objects
When we are given an expiration time like
--unpack-unreachable=2.weeks.ago, we avoid writing out old,
unreachable loose objects entirely, under the assumption
that running "prune" would simply delete them immediately
anyway. However, this is only valid if we computed the same
set of reachable objects as prune would.

In practice, this is the case, because only git-repack uses
the --unpack-unreachable option with an expiration, and it
always feeds as many objects into the pack as possible. But
we can double-check at runtime just to be sure.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
c90f9e13ab repack: pack objects mentioned by the index
When we pack all objects, we use only the objects reachable
from references and reflogs. This misses any objects which
are reachable from the index, but not yet referenced.

By itself this isn't a big deal; the objects can remain
loose until they are actually used in a commit. However, it
does create a problem when we drop packed but unreachable
objects. We try to optimize out the writing of objects that
we will immediately prune, which means we must follow the
same rules as prune in determining what is reachable. And
prune uses the index for this purpose.

This is rather uncommon in practice, as objects in the index
would not usually have been packed in the first place. But
it could happen in a sequence like:

  1. You make a commit on a branch that references blob X.

  2. You repack, moving X into the pack.

  3. You delete the branch (and its reflog), so that X is
     unreferenced.

  4. You "git add" blob X so that it is now referenced only
     by the index.

  5. You repack again with git-gc. The pack-objects we
     invoke will see that X is neither referenced nor
     recent and not bother loosening it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Jeff King
edfbb2aa53 pack-objects: use argv_array
This saves us from having to bump the rp_av count when we
add new traversal options.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-19 15:07:07 -07:00
Junio C Hamano
1cb3324e61 Merge branch 'po/everyday-doc'
"git help everyday" to show the Everyday Git document.

* po/everyday-doc:
  doc: add 'everyday' to 'git help'
  doc: Makefile regularise OBSOLETE_HTML list building
  doc: modernise everyday.txt wording and format in man page style
2014-10-16 14:16:42 -07:00
Jeff King
9e0c3c4fcd make add_object_array_with_context interface more sane
When you resolve a sha1, you can optionally keep any context
found during the resolution, including the path and mode of
a tree entry (e.g., when looking up "HEAD:subdir/file.c").

The add_object_array_with_context function lets you then
attach that context to an entry in a list. Unfortunately,
the interface for doing so is horrible. The object_context
structure is large and most object_array users do not use
it. Therefore we keep a pointer to the structure to avoid
burdening other users too much. But that means when we do
use it that we must allocate the struct ourselves. And the
struct contains a fixed PATH_MAX-sized buffer, which makes
this wholly unsuitable for any large arrays.

We can observe that there is only a single user of the
"with_context" variant: builtin/grep.c. And in that use
case, the only element we care about is the path. We can
therefore store only the path as a pointer (the context's
mode field was redundant with the object_array_entry itself,
and nobody actually cared about the surrounding tree). This
still requires a strdup of the pathname, but at least we are
only consuming the minimum amount of memory for each string.

We can also handle the copying ourselves in
add_object_array_*, and free it as appropriate in
object_array_release_entry.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:44 -07:00
Jeff King
abcb86553d pack-objects: match prune logic for discarding objects
A recent commit taught git-prune to keep non-recent objects
that are reachable from recent ones. However, pack-objects,
when loosening unreachable objects, tries to optimize out
the write in the case that the object will be immediately
pruned. It now gets this wrong, since its rule does not
reflect the new prune code (and this can be seen by running
t6501 with a strategically placed repack).

Let's teach pack-objects similar logic.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:43 -07:00
Jeff King
d0d46abc16 pack-objects: refactor unpack-unreachable expiration check
When we are loosening unreachable packed objects, we do not
bother to process objects that would simply be pruned
immediately anyway. The "would be pruned" check is a simple
comparison, but is about to get more complicated. Let's pull
it out into a separate function.

Note that this is slightly less efficient than the original,
which avoided even opening old packs, since no object in
them could pass the current check, which cares only about
the pack mtime.  But the new rules will depend on the exact
object, so we need to perform the check even for old packs.

Note also that we fix a minor buglet when the pack mtime is
exactly the same as the expiration time. The prune code
considers that worth pruning, whereas our check here
considered it worth keeping. This wasn't a big deal. Besides
being unlikely to happen, the result was simply that the
object was loosened and then pruned, missing the
optimization. Still, we can easily fix it while we are here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Jeff King
d3038d22f9 prune: keep objects reachable from recent objects
Our current strategy with prune is that an object falls into
one of three categories:

  1. Reachable (from ref tips, reflogs, index, etc).

  2. Not reachable, but recent (based on the --expire time).

  3. Not reachable and not recent.

We keep objects from (1) and (2), but prune objects in (3).
The point of (2) is that these objects may be part of an
in-progress operation that has not yet updated any refs.

However, it is not always the case that objects for an
in-progress operation will have a recent mtime. For example,
the object database may have an old copy of a blob (from an
abandoned operation, a branch that was deleted, etc). If we
create a new tree that points to it, a simultaneous prune
will leave our tree, but delete the blob. Referencing that
tree with a commit will then work (we check that the tree is
in the object database, but not that all of its referred
objects are), as will mentioning the commit in a ref. But
the resulting repo is corrupt; we are missing the blob
reachable from a ref.

One way to solve this is to be more thorough when
referencing a sha1: make sure that not only do we have that
sha1, but that we have objects it refers to, and so forth
recursively. The problem is that this is very expensive.
Creating a parent link would require traversing the entire
object graph!

Instead, this patch pushes the extra work onto prune, which
runs less frequently (and has to look at the whole object
graph anyway). It creates a new category of objects: objects
which are not recent, but which are reachable from a recent
object. We do not prune these objects, just like the
reachable and recent ones.

This lets us avoid the recursive check above, because if we
have an object, even if it is unreachable, we should have
its referent. We can make a simple inductive argument that
with this patch, this property holds (that there are no
objects with missing referents in the repository):

  0. When we have no objects, we have nothing to refer or be
     referred to, so the property holds.

  1. If we add objects to the repository, their direct
     referents must generally exist (e.g., if you create a
     tree, the blobs it references must exist; if you create
     a commit to point at the tree, the tree must exist).
     This is already the case before this patch. And it is
     not 100% foolproof (you can make bogus objects using
     `git hash-object`, for example), but it should be the
     case for normal usage.

     Therefore for any sequence of object additions, the
     property will continue to hold.

  2. If we remove objects from the repository, then we will
     not remove a child object (like a blob) if an object
     that refers to it is being kept. That is the part
     implemented by this patch.

     Note, however, that our reachability check and the
     actual pruning are not atomic. So it _is_ still
     possible to violate the property (e.g., an object
     becomes referenced just as we are deleting it). This
     patch is shooting for eliminating problems where the
     mtimes of dependent objects differ by hours or days,
     and one is dropped without the other. It does nothing
     to help with short races.

Naively, the simplest way to implement this would be to add
all recent objects as tips to the reachability traversal.
However, this does not perform well. In a recently-packed
repository, all reachable objects will also be recent, and
therefore we have to look at each object twice. This patch
instead performs the reachability traversal, then follows up
with a second traversal for recent objects, skipping any
that have already been marked.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:42 -07:00
Jeff King
4a1e693a30 count-objects: use for_each_loose_file_in_objdir
This drops our line count considerably, and should make
things more readable by keeping the counting logic separate
from the traversal.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
cac05d4dfd count-objects: do not use xsize_t when counting object size
The point of xsize_t is to safely cast an off_t into a size_t
(because we are about to mmap). But in count-objects, we are
summing the sizes in an off_t. Using xsize_t means that
count-objects could fail on a 32-bit system with a 4G
object (not likely, as other parts of git would fail, but
we should at least be correct here).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:41 -07:00
Jeff King
0d3b729680 prune-packed: use for_each_loose_file_in_objdir
This saves us from manually traversing the directory
structure ourselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:40 -07:00
Jeff King
27e1e22d5e prune: factor out loose-object directory traversal
Prune has to walk $GIT_DIR/objects/?? in order to find the
set of loose objects to prune. Other parts of the code
(e.g., count-objects) want to do the same. Let's factor it
out into a reusable for_each-style function.

Note that this is not quite a straight code movement. The
original code had strange behavior when it found a file of
the form "[0-9a-f]{2}/.{38}" that did _not_ contain all hex
digits. It executed a "break" from the loop, meaning that we
stopped pruning in that directory (but still pruned other
directories!). This was probably a bug; we do not want to
process the file as an object, but we should keep going
otherwise (and that is how the new code handles it).

We are also a little more careful with loose object
directories which fail to open. The original code silently
ignored any failures, but the new code will complain about
any problems besides ENOENT.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-16 10:10:39 -07:00
Ronnie Sahlberg
2ebb49ca8a remote rm/prune: print a message when writing packed-refs fails
Until v2.1.0-rc0~22^2~11 (refs.c: add an err argument to
repack_without_refs, 2014-06-20), repack_without_refs forgot to
provide an error message when commit_packed_refs fails.  Even today,
it only provides a message for callers that pass a non-NULL err
parameter.  Internal callers in refs.c pass non-NULL err but
"git remote" does not.

That means that "git remote rm" and "git remote prune" can fail
without printing a message about why.  Fix them by passing in a
non-NULL err parameter and printing the returned message.

This is the last caller to a ref handling function passing err ==
NULL.  A later patch can drop support for err == NULL, avoiding such
problems in the future.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:26 -07:00
Ronnie Sahlberg
971c41c717 for-each-ref: skip and warn about broken ref names
Print a warning message for any bad ref names we find in the repo and
skip them so callers don't have to deal with parsing them.

It might be useful in the future to have a flag where we would not
skip these refs for those callers that want to and are prepared (for
example by using a --format argument with %0 as a delimiter after the
ref name).

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:26 -07:00
Ronnie Sahlberg
d0f810f0bc refs.c: allow listing and deleting badly named refs
We currently do not handle badly named refs well:

  $ cp .git/refs/heads/master .git/refs/heads/master.....@\*@\\.
  $ git branch
    fatal: Reference has invalid format: 'refs/heads/master.....@*@\.'
  $ git branch -D master.....@\*@\\.
    error: branch 'master.....@*@\.' not found.

Users cannot recover from a badly named ref without manually finding
and deleting the loose ref file or appropriate line in packed-refs.
Making that easier will make it easier to tweak the ref naming rules
in the future, for example to forbid shell metacharacters like '`'
and '"', without putting people in a state that is hard to get out of.

So allow "branch --list" to show these refs and allow "branch -d/-D"
and "update-ref -d" to delete them.  Other commands (for example to
rename refs) will continue to not handle these refs but can be changed
in later patches.

Details:

In resolving functions, refuse to resolve refs that don't pass the
git-check-ref-format(1) check unless the new RESOLVE_REF_ALLOW_BAD_NAME
flag is passed.  Even with RESOLVE_REF_ALLOW_BAD_NAME, refuse to
resolve refs that escape the refs/ directory and do not match the
pattern [A-Z_]* (think "HEAD" and "MERGE_HEAD").

In locking functions, refuse to act on badly named refs unless they
are being deleted and either are in the refs/ directory or match [A-Z_]*.

Just like other invalid refs, flag resolved, badly named refs with the
REF_ISBROKEN flag, treat them as resolving to null_sha1, and skip them
in all iteration functions except for for_each_rawref.

Flag badly named refs (but not symrefs pointing to badly named refs)
with a REF_BAD_NAME flag to make it easier for future callers to
notice and handle them specially.  For example, in a later patch
for-each-ref will use this flag to detect refs whose names can confuse
callers parsing for-each-ref output.

In the transaction API, refuse to create or update badly named refs,
but allow deleting them (unless they try to escape refs/ and don't match
[A-Z_]*).

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:26 -07:00
Ronnie Sahlberg
18f29fc61e branch -d: simplify by using RESOLVE_REF_READING
When "git branch -d" reads the branch it is about to delete, it used
to avoid passing the RESOLVE_REF_READING ('treat missing ref as
error') flag because a symref pointing to a nonexistent ref would show
up as missing instead of as something that could be deleted.  To check
if a ref is actually missing, we then check

 - is it a symref?
 - if not, did it resolve to null_sha1?

Now we pass RESOLVE_REF_NO_RECURSE and the correct information is
returned for a symref even when it points to a missing ref.  Simplify
by relying on RESOLVE_REF_READING.

No functional change intended.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:25 -07:00
Jonathan Nieder
62a2d52514 branch -d: avoid repeated symref resolution
If a repository gets in a broken state with too much symref nesting,
it cannot be repaired with "git branch -d":

 $ git symbolic-ref refs/heads/nonsense refs/heads/nonsense
 $ git branch -d nonsense
 error: branch 'nonsense' not found.

Worse, "git update-ref --no-deref -d" doesn't work for such repairs
either:

 $ git update-ref -d refs/heads/nonsense
 error: unable to resolve reference refs/heads/nonsense: Too many levels of symbolic links

Fix both by teaching resolve_ref_unsafe a new RESOLVE_REF_NO_RECURSE
flag and passing it when appropriate.

Callers can still read the value of a symref (for example to print a
message about it) with that flag set --- resolve_ref_unsafe will
resolve one level of symrefs and stop there.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Reviewed-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:25 -07:00
Ronnie Sahlberg
7695d118e5 refs.c: change resolve_ref_unsafe reading argument to be a flags field
resolve_ref_unsafe takes a boolean argument for reading (a nonexistent ref
resolves successfully for writing but not for reading).  Change this to be
a flags field instead, and pass the new constant RESOLVE_REF_READING when
we want this behaviour.

While at it, swap two of the arguments in the function to put output
arguments at the end.  As a nice side effect, this ensures that we can
catch callers that were unaware of the new API so they can be audited.

Give the wrapper functions resolve_refdup and read_ref_full the same
treatment for consistency.

Signed-off-by: Ronnie Sahlberg <sahlberg@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-15 10:47:24 -07:00