2005-04-08 00:13:13 +02:00
#ifndef CACHE_H
#define CACHE_H

2005-12-05 20:54:29 +01:00
#include "git-compat-util.h"
Rewrite convert_to_{git,working_tree} to use strbuf's.
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparently for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
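A minimal runnable sketch of the filter-chaining contract described in the commit message above. Git's real strbuf API is not used here: `struct mini_strbuf`, `mini_strbuf_take()`, `filter_strip_cr()` and `filter_upcase()` are hypothetical stand-ins chosen only to show the contract (return 1 and write to `sb` when a conversion happened, return 0 and leave `sb` alone otherwise, and tolerate `src` pointing into `sb->buf`).

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strbuf: owns a NUL-terminated buffer. */
struct mini_strbuf {
	char *buf;
	size_t len;
};

/*
 * Install a freshly built buffer as sb's content. Callers finish reading
 * their input before calling this, so it is safe even when that input
 * pointed into the old sb->buf (the "in place" case).
 */
static void mini_strbuf_take(struct mini_strbuf *sb, char *out, size_t len)
{
	free(sb->buf);
	sb->buf = out;
	sb->len = len;
}

/* Hypothetical filter 1: drop every '\r'. Returns 1 only if it wrote sb. */
static int filter_strip_cr(const char *src, size_t len, struct mini_strbuf *sb)
{
	size_t i, n = 0;
	char *out;

	if (!len || !memchr(src, '\r', len))
		return 0;		/* nothing to convert: sb untouched */
	out = malloc(len + 1);
	if (!out)
		abort();
	for (i = 0; i < len; i++)
		if (src[i] != '\r')
			out[n++] = src[i];
	out[n] = '\0';
	mini_strbuf_take(sb, out, n);
	return 1;
}

/* Hypothetical filter 2: upper-case ASCII letters, same contract. */
static int filter_upcase(const char *src, size_t len, struct mini_strbuf *sb)
{
	size_t i;
	char *out;

	for (i = 0; i < len; i++)
		if (islower((unsigned char)src[i]))
			break;
	if (i == len)
		return 0;		/* no lower-case letters: no conversion */
	out = malloc(len + 1);
	if (!out)
		abort();
	for (i = 0; i < len; i++)
		out[i] = (char)toupper((unsigned char)src[i]);
	out[len] = '\0';
	mini_strbuf_take(sb, out, len);
	return 1;
}

/* The accumulation pattern from the commit message, with two filters. */
static int meta_filter(const char *src, size_t len, struct mini_strbuf *sb)
{
	int ret = 0;

	ret |= filter_strip_cr(src, len, sb);
	if (ret) {		/* feed the converted text to the next filter */
		src = sb->buf;
		len = sb->len;
	}
	return ret | filter_upcase(src, len, sb);
}
```

Each filter builds its output in a fresh buffer before releasing the old one, which is the simple "content is just replaced" strategy; a real implementation may convert truly in place when the output is no longer than the input.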
2007-09-16 15:51:04 +02:00
#include "strbuf.h"
2013-11-14 20:20:58 +01:00
#include "hashmap.h"
2018-01-24 00:46:51 +01:00
#include "list.h"
2009-09-09 13:38:58 +02:00
#include "advice.h"
2011-02-23 00:41:20 +01:00
#include "gettext.h"
2011-05-20 21:59:01 +02:00
#include "convert.h"
2014-06-11 09:56:49 +02:00
#include "trace.h"
2019-02-22 23:25:01 +01:00
#include "trace2.h"
2014-08-07 13:59:17 +02:00
#include "string-list.h"
pack-revindex: drop hash table
The main entry point to the pack-revindex code is
find_pack_revindex(). This calls revindex_for_pack(), which
lazily computes and caches the revindex for the pack.
We store the cache in a very simple hash table. It's created
by init_pack_revindex(), which inserts an entry for every
packfile we know about, and we never grow or shrink the
hash. If we ever need the revindex for a pack that isn't in
the hash, we die() with an internal error.
This can lead to a race, because we may load more packs
after having called init_pack_revindex(). For example,
imagine we have one process which needs to look at the
revindex for a variety of objects (e.g., cat-file's
"%(objectsize:disk)" format). Simultaneously, git-gc is
running, which is doing a `git repack -ad`. We might hit a
sequence like:
1. We need the revidx for some packed object. We call
find_pack_revindex() and end up in init_pack_revindex()
to create the hash table for all packs we know about.
2. We look up another object and can't find it, because
the repack has removed the pack it's in. We re-scan the
pack directory and find a new pack containing the
object. It gets added to our packed_git list.
3. We call find_pack_revindex() for the new object, which
hits revindex_for_pack() for our new pack. It can't
find the packed_git in the revindex hash, and dies.
You could also replace the `repack` above with a push or
fetch to create a new pack, though these are less likely
(you would have to somehow learn about the new objects to
look them up).
Prior to 1a6d8b9 (do not discard revindex when re-preparing
packfiles, 2014-01-15), this was safe, as we threw away the
revindex whenever we re-scanned the pack directory (and thus
re-created the revindex hash on the fly). However, we don't
want to simply revert that commit, as it was solving a
different race.
So we have a few options:
- We can fix the race in 1a6d8b9 differently, by having
the bitmap code look in the revindex hash instead of
caching the pointer. But this would introduce a lot of
extra hash lookups for common bitmap operations.
- We could teach the revindex to dynamically add new packs
to the hash table. This would perform the same, but
would mean adding extra code to the revindex hash (which
currently cannot be resized at all).
- We can get rid of the hash table entirely. There is
exactly one revindex per pack, so we can just store it
in the packed_git struct. Since it's initialized lazily,
it does not add to the startup cost.
This is the best of both worlds: less code and fewer
hash table lookups. The original code likely avoided
this in the name of encapsulation. But the packed_git
and reverse_index code are fairly intimate already, so
it's not much of a loss.
This patch implements the final option. It's a minimal
conversion that retains the pack_revindex struct. No callers
need to change, and we can do further cleanup in a follow-on
patch.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
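The final option in the pack-revindex commit message (keep the revindex inside each pack, built lazily on first use) can be sketched as below. `struct pack_sketch`, `struct revindex_entry`, `pack_revindex_for()` and `find_revindex_sketch()` are hypothetical names; the sketch assumes the .idx offsets are available as a plain array in object-name order and ignores details such as large-offset tables.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: maps a pack offset back to the object's .idx position. */
struct revindex_entry {
	uint64_t offset;
	uint32_t nr;
};

/* Hypothetical, heavily simplified stand-in for struct packed_git. */
struct pack_sketch {
	const uint64_t *idx_offsets;		/* offsets in .idx (name) order */
	uint32_t num_objects;
	struct revindex_entry *revindex;	/* NULL until first needed */
	int revindex_builds;			/* counts builds, for illustration */
};

static int cmp_offset(const void *a, const void *b)
{
	const struct revindex_entry *x = a, *y = b;

	return (x->offset > y->offset) - (x->offset < y->offset);
}

/* Build the revindex on first use and cache it in the pack itself. */
static const struct revindex_entry *pack_revindex_for(struct pack_sketch *p)
{
	uint32_t i;

	if (p->revindex)
		return p->revindex;
	p->revindex = malloc(sizeof(*p->revindex) *
			     (p->num_objects ? p->num_objects : 1));
	if (!p->revindex)
		abort();
	for (i = 0; i < p->num_objects; i++) {
		p->revindex[i].offset = p->idx_offsets[i];
		p->revindex[i].nr = i;
	}
	qsort(p->revindex, p->num_objects, sizeof(*p->revindex), cmp_offset);
	p->revindex_builds++;
	return p->revindex;
}

/* Binary-search the (lazily built) revindex for an exact offset. */
static const struct revindex_entry *find_revindex_sketch(struct pack_sketch *p,
							 uint64_t offset)
{
	const struct revindex_entry *r = pack_revindex_for(p);
	uint32_t lo = 0, hi = p->num_objects;

	while (lo < hi) {
		uint32_t mid = lo + (hi - lo) / 2;

		if (r[mid].offset == offset)
			return &r[mid];
		if (r[mid].offset < offset)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}
```

Because the cache lives in the pack struct, a pack discovered after startup (step 2 of the race in the commit message) simply gets its revindex built on demand instead of missing a slot in a fixed-size table.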
2015-12-21 07:19:49 +01:00
#include "pack-revindex.h"
2017-03-11 23:28:18 +01:00
#include "hash.h"
2017-06-22 20:43:35 +02:00
#include "path.h"
2020-03-30 16:03:46 +02:00
#include "oid-array.h"
2017-11-12 22:28:53 +01:00
#include "repository.h"
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
spent in malloc() calls. This can be mitigated by allocating a
large block of memory and managing it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this state.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
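The chosen approach in the block-alloc commit message (a per-entry flag recording whether the entry came from a memory pool) is sketched below with a deliberately tiny bump allocator. `struct mini_pool`, `pool_alloc()`, `struct entry_sketch` and the other helpers are hypothetical and are not git's mem-pool.h API; alignment handling is simplified to rounding each request up to a multiple of 8 bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical bump allocator: a chain of blocks, freed all at once. */
struct pool_block {
	struct pool_block *next;
	size_t used, size;
	unsigned char space[];		/* C99 flexible array member */
};

struct mini_pool {
	struct pool_block *head;
	size_t block_size;
};

static void *pool_alloc(struct mini_pool *pool, size_t len)
{
	struct pool_block *b = pool->head;

	len = (len + 7) & ~(size_t)7;	/* round up to a multiple of 8 */
	if (!b || b->size - b->used < len) {
		size_t size = len > pool->block_size ? len : pool->block_size;

		b = malloc(sizeof(*b) + size);
		if (!b)
			abort();
		b->next = pool->head;
		b->used = 0;
		b->size = size;
		pool->head = b;
	}
	b->used += len;
	return b->space + b->used - len;
}

struct entry_sketch {
	unsigned int mem_pool_allocated;	/* mirrors the new cache_entry field */
	unsigned int namelen;
	char name[];
};

/* pool == NULL makes a transient entry from malloc(). */
static struct entry_sketch *new_entry(struct mini_pool *pool, const char *name)
{
	size_t len = strlen(name);
	size_t size = sizeof(struct entry_sketch) + len + 1;
	struct entry_sketch *e = pool ? pool_alloc(pool, size) : malloc(size);

	if (!e)
		abort();
	e->mem_pool_allocated = pool != NULL;
	e->namelen = (unsigned int)len;
	memcpy(e->name, name, len + 1);
	return e;
}

/* Transient entries are freed one by one; pooled ones wait for the pool. */
static void discard_entry_sketch(struct entry_sketch *e)
{
	if (e && !e->mem_pool_allocated)
		free(e);
}

static void pool_discard(struct mini_pool *pool)
{
	while (pool->head) {
		struct pool_block *next = pool->head->next;

		free(pool->head);
		pool->head = next;
	}
}
```

Pooled entries are released only when the whole pool is discarded, which is how an entry's lifetime becomes tied to its index; the flag is what lets one discard function handle both kinds safely.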
2018-07-02 21:49:37 +02:00
#include "mem-pool.h"
2005-04-08 00:13:13 +02:00

2011-06-10 20:52:15 +02:00
typedef struct git_zstream {
	z_stream z;
	unsigned long avail_in;
	unsigned long avail_out;
	unsigned long total_in;
	unsigned long total_out;
	unsigned char *next_in;
	unsigned char *next_out;
} git_zstream;

void git_inflate_init(git_zstream *);
void git_inflate_init_gzip_only(git_zstream *);
void git_inflate_end(git_zstream *);
int git_inflate(git_zstream *, int flush);

void git_deflate_init(git_zstream *, int level);
void git_deflate_init_gzip(git_zstream *, int level);
2013-03-15 23:21:51 +01:00
void git_deflate_init_raw(git_zstream *, int level);
2011-06-10 20:52:15 +02:00
void git_deflate_end(git_zstream *);
2011-10-28 23:48:40 +02:00
int git_deflate_abort(git_zstream *);
2011-06-10 20:52:15 +02:00
int git_deflate_end_gently(git_zstream *);
int git_deflate(git_zstream *, int flush);
unsigned long git_deflate_bound(git_zstream *, unsigned long);
2009-01-08 04:54:47 +01:00

2006-02-26 16:13:46 +01:00
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
2005-04-30 18:51:03 +02:00
#define DTYPE(de) ((de)->d_type)
#else
2006-01-20 22:33:20 +01:00
#undef DT_UNKNOWN
#undef DT_DIR
#undef DT_REG
#undef DT_LNK
2005-04-30 18:51:03 +02:00
#define DT_UNKNOWN 0
#define DT_DIR 1
#define DT_REG 2
2005-05-13 02:16:04 +02:00
#define DT_LNK 3
2005-04-30 18:51:03 +02:00
#define DTYPE(de) DT_UNKNOWN
#endif

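When the fallback branch above is taken, DTYPE() always yields DT_UNKNOWN, so callers have to ask the filesystem themselves. A minimal POSIX sketch of that pattern follows; `resolve_dtype()` is a hypothetical helper, and the fixed 4096-byte path buffer is an assumption made for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Same fallback as the header above. */
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
#define DTYPE(de) ((de)->d_type)
#else
#undef DT_UNKNOWN
#undef DT_DIR
#undef DT_REG
#undef DT_LNK
#define DT_UNKNOWN 0
#define DT_DIR 1
#define DT_REG 2
#define DT_LNK 3
#define DTYPE(de) DT_UNKNOWN
#endif

/*
 * Hypothetical helper: classify a readdir() entry, paying for an lstat()
 * only when the dirent does not carry a usable type.
 */
static int resolve_dtype(const char *dir, const struct dirent *de)
{
	char path[4096];
	struct stat st;
	int dtype = DTYPE(de);
	int n;

	if (dtype != DT_UNKNOWN)
		return dtype;
	n = snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return DT_UNKNOWN;	/* path too long for this sketch */
	if (lstat(path, &st))
		return DT_UNKNOWN;
	if (S_ISREG(st.st_mode))
		return DT_REG;
	if (S_ISDIR(st.st_mode))
		return DT_DIR;
	if (S_ISLNK(st.st_mode))
		return DT_LNK;
	return DT_UNKNOWN;
}
```

lstat() rather than stat() is used so that a symlink is reported as DT_LNK, matching what d_type would say.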
2007-04-22 18:43:56 +02:00
/* unknown mode (impossible combination S_IFIFO|S_IFCHR) */
#define S_IFINVALID 0030000

2007-04-10 06:14:58 +02:00
/*
 * A "directory link" is a link to another git directory.
 *
 * The value 0160000 is not normally a valid mode, and
 * also just happens to be S_IFDIR + S_IFLNK
 */
2007-05-21 22:08:28 +02:00
#define S_IFGITLINK 0160000
#define S_ISGITLINK(m) (((m) & S_IFMT) == S_IFGITLINK)
2007-04-10 06:14:58 +02:00

tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so-called combine-diff, which shows the diff a merge
introduces by itself, omitting changes done by any parent. That works
by first finding paths that are different from all parents, and then
showing a generalized diff, with separate +/- columns for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
only to then intersect that huge diff with the usually small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of those D(A,Pi) is huge, treating 1 n-parent combine-diff as n
1-parent diffs and intersecting the results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES)
to work, that exporting has to happen not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag recording whether the entry from that
parent equals the entry from the minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
            navy.git   linux.git v3.10..v3.11
before      0.611s     1.889s
after       0.619s     1.907s
slowdown    1.3%       0.9%
These timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually introduced a ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~), so for the nparent=1 case
net timings stay approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
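The lock-step scan in the "D(T,P1...Pn) calculation scheme" above can be illustrated on flat, already-sorted path lists. This is a simplified sketch, not git's diff_tree_paths(): `struct tentry`, `struct ttree`, `combined_paths()` and `MAX_PARENTS` are hypothetical, integer ids stand in for sha1s, and there is no recursion into subtrees (so the "skip unchanged subdirectory" optimization is not shown). It computes only the resulting path set, and its comments use the case numbers from the scheme.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flat tree entry: a path and an integer standing in for a sha1. */
struct tentry {
	const char *path;
	int oid;
};

/* Hypothetical flat tree: entries sorted by path, no subdirectories. */
struct ttree {
	const struct tentry *e;
	size_t nr;
};

#define MAX_PARENTS 8

/*
 * Scan T and parents p[0..np) in lock-step and store in out[] every path
 * that differs from *all* parents, i.e. D(T,P1) ^ ... ^ D(T,Pn) by path.
 * out must have room for t->nr plus the sum of all p[i].nr entries.
 */
static size_t combined_paths(const struct ttree *t, const struct ttree *p,
			     size_t np, const char **out)
{
	size_t ti = 0, pi[MAX_PARENTS] = { 0 }, n = 0, i;

	if (np > MAX_PARENTS)
		return 0;		/* limit of this sketch */
	for (;;) {
		const char *tpath = ti < t->nr ? t->e[ti].path : NULL;
		const char *pmin = NULL;
		int cmp, all_differ = 1;

		/* p[imin]: the smallest current path over all parents */
		for (i = 0; i < np; i++)
			if (pi[i] < p[i].nr &&
			    (!pmin || strcmp(p[i].e[pi[i]].path, pmin) < 0))
				pmin = p[i].e[pi[i]].path;
		if (!tpath && !pmin)
			return n;
		cmp = !pmin ? -1 : !tpath ? 1 : strcmp(tpath, pmin);

		if (cmp < 0) {		/* case 1: t is in no parent -> "+t" */
			out[n++] = tpath;
			ti++;
			continue;
		}
		/* case 2 (t > p[imin]) and case 3 (t == p[imin]) */
		for (i = 0; i < np; i++) {
			int here = pi[i] < p[i].nr &&
				   !strcmp(p[i].e[pi[i]].path, pmin);

			if (!here && cmp > 0)
				all_differ = 0;	/* 2.1: absent from T and Pi */
			if (here && cmp == 0 &&
			    p[i].e[pi[i]].oid == t->e[ti].oid)
				all_differ = 0;	/* 3: same content as in Pi */
			if (here)
				pi[i]++;	/* every pi == p[imin] steps down */
		}
		if (all_differ)
			out[n++] = pmin;	/* 2.2 "-p[imin]" or 3 changed path */
		if (cmp == 0)
			ti++;
	}
}
```

With np == 1 the same loop degenerates into an ordinary two-tree diff, which mirrors the commit's observation that D(A,B) is the same as D(A,[B]).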
2014-04-06 23:46:26 +02:00
/*
 * Some mode bits are also used internally for computations.
 *
 * They *must* not overlap with any valid modes, and they *must* not be emitted
 * to the outside world - i.e. appear on disk or network. In other words, they
 * are just temporary fields which we use internally, and they have to stay
 * in-house.
 *
 * ( this approach is valid, as standard S_IF* values fit into 16 bits, and in
 * the Git codebase mode is `unsigned int`, which is assumed to be at least
 * 32 bits )
 */

/* used internally in tree-diff */
#define S_DIFFTREE_IFXMIN_NEQ 0x80000000

2005-07-14 03:46:20 +02:00
/*
 * Intensive research over the course of many years has shown that
 * port 9418 is totally unused by anything else. Or
 *
 * Your search - "port 9418" - did not match any documents.
 *
 * as www.google.com puts it.
2005-09-12 20:23:00 +02:00
 *
 * This port has been properly assigned for git use by IANA:
 * git (Assigned-9418) [I06-050728-0001].
 *
 * git 9418/tcp git pack transfer service
 * git 9418/udp git pack transfer service
 *
 * with Linus Torvalds <torvalds@osdl.org> as the point of
 * contact. September 2005.
 *
 * See http://www.iana.org/assignments/port-numbers
2005-07-14 03:46:20 +02:00
 */
#define DEFAULT_GIT_PORT 9418

2005-04-08 00:13:13 +02:00
/*
 * Basic data structures for the directory cache
 */

#define CACHE_SIGNATURE 0x44495243 /* "DIRC" */
struct cache_header {
2013-08-18 21:41:51 +02:00
	uint32_t hdr_signature;
	uint32_t hdr_version;
	uint32_t hdr_entries;
2005-04-08 00:13:13 +02:00
};

2012-04-04 18:12:43 +02:00
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 4

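The 12-byte header described by struct cache_header can be validated as below. This assumes, as git's index-format documentation states, that the fields are stored in network (big-endian) byte order; `get_be32_sketch()` and `parse_cache_header()` are hypothetical helpers, and the version check simply uses the INDEX_FORMAT_LB..INDEX_FORMAT_UB bounds shown above.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SIGNATURE 0x44495243 /* "DIRC" */
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 4

struct cache_header {
	uint32_t hdr_signature;
	uint32_t hdr_version;
	uint32_t hdr_entries;
};

/* Read a 32-bit big-endian value from an unaligned byte buffer. */
static uint32_t get_be32_sketch(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Returns 0 and fills *hdr for an acceptable header, -1 otherwise. */
static int parse_cache_header(const unsigned char *buf, size_t len,
			      struct cache_header *hdr)
{
	if (len < 12)
		return -1;
	hdr->hdr_signature = get_be32_sketch(buf);
	hdr->hdr_version = get_be32_sketch(buf + 4);
	hdr->hdr_entries = get_be32_sketch(buf + 8);
	if (hdr->hdr_signature != CACHE_SIGNATURE)
		return -1;
	if (hdr->hdr_version < INDEX_FORMAT_LB ||
	    hdr->hdr_version > INDEX_FORMAT_UB)
		return -1;
	return 0;
}
```

Decoding byte by byte avoids both alignment assumptions and a dependency on ntohl().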
2005-04-08 00:13:13 +02:00
/*
 * The "cache_time" is just the low 32 bits of the
 * time. It doesn't matter if it overflows - we only
 * check it for equality in the 32 bits we save.
 */
struct cache_time {
2013-08-18 21:41:51 +02:00
	uint32_t sec;
	uint32_t nsec;
2005-04-08 00:13:13 +02:00
};

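The cache_time comment says only the low 32 bits are kept and only equality is ever checked. A minimal sketch of that truncation, with hypothetical helper names, also makes the accepted trade-off visible: two times exactly 2^32 seconds apart compare equal.

```c
#include <stdint.h>

struct cache_time {
	uint32_t sec;
	uint32_t nsec;
};

/* Hypothetical: keep only the low 32 bits of each field. */
static struct cache_time cache_time_from(int64_t sec, long nsec)
{
	struct cache_time t;

	t.sec = (uint32_t)sec;		/* wraps modulo 2^32 by definition */
	t.nsec = (uint32_t)nsec;
	return t;
}

/* Equality is the only comparison that stays meaningful after wrapping. */
static int cache_time_equal(struct cache_time a, struct cache_time b)
{
	return a.sec == b.sec && a.nsec == b.nsec;
}
```

Ordering comparisons (older/newer) would break across a wrap, which is why the comment restricts use to equality.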
2013-06-20 10:37:50 +02:00
struct stat_data {
	struct cache_time sd_ctime;
	struct cache_time sd_mtime;
	unsigned int sd_dev;
	unsigned int sd_ino;
	unsigned int sd_uid;
	unsigned int sd_gid;
	unsigned int sd_size;
};

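struct stat_data is a snapshot of stat() fields that can later tell whether a working-tree file may have changed. The sketch below fills and compares such a snapshot; the helper names are hypothetical, nanoseconds are deliberately left out, and it compares every field, whereas a real implementation may choose to ignore some of them.

```c
#include <stdint.h>
#include <sys/stat.h>

struct cache_time {
	uint32_t sec;
	uint32_t nsec;
};

struct stat_data {
	struct cache_time sd_ctime;
	struct cache_time sd_mtime;
	unsigned int sd_dev;
	unsigned int sd_ino;
	unsigned int sd_uid;
	unsigned int sd_gid;
	unsigned int sd_size;
};

/* Hypothetical: snapshot a stat() result, truncating to the field widths. */
static void fill_stat_data_sketch(struct stat_data *sd, const struct stat *st)
{
	sd->sd_ctime.sec = (uint32_t)st->st_ctime;
	sd->sd_ctime.nsec = 0;		/* nanoseconds left out of this sketch */
	sd->sd_mtime.sec = (uint32_t)st->st_mtime;
	sd->sd_mtime.nsec = 0;
	sd->sd_dev = (unsigned int)st->st_dev;
	sd->sd_ino = (unsigned int)st->st_ino;
	sd->sd_uid = (unsigned int)st->st_uid;
	sd->sd_gid = (unsigned int)st->st_gid;
	sd->sd_size = (unsigned int)st->st_size;
}

/* Non-zero when a fresh stat() no longer matches the cached snapshot. */
static int stat_data_changed_sketch(const struct stat_data *sd,
				    const struct stat *st)
{
	struct stat_data now;

	fill_stat_data_sketch(&now, st);
	return sd->sd_mtime.sec != now.sd_mtime.sec ||
	       sd->sd_ctime.sec != now.sd_ctime.sec ||
	       sd->sd_ino != now.sd_ino ||
	       sd->sd_dev != now.sd_dev ||
	       sd->sd_uid != now.sd_uid ||
	       sd->sd_gid != now.sd_gid ||
	       sd->sd_size != now.sd_size;
}
```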
2005-04-08 00:13:13 +02:00
struct cache_entry {
2013-11-14 20:21:58 +01:00
	struct hashmap_entry ent;
2013-06-20 10:37:50 +02:00
	struct stat_data ce_stat_data;
2005-04-15 19:44:27 +02:00
	unsigned int ce_mode;
2008-01-15 01:03:17 +01:00
	unsigned int ce_flags;
2018-07-02 21:49:37 +02:00
	unsigned int mem_pool_allocated;
2012-07-11 11:22:37 +02:00
	unsigned int ce_namelen;
2014-06-13 14:19:36 +02:00
	unsigned int index;	/* for link extension */
2016-09-05 22:07:52 +02:00
	struct object_id oid;
2006-01-07 10:33:54 +01:00
	char name[FLEX_ARRAY]; /* more */
2005-04-08 00:13:13 +02:00
};

2005-04-16 07:51:44 +02:00
#define CE_STAGEMASK (0x3000)
2008-08-17 08:02:08 +02:00
#define CE_EXTENDED (0x4000)
2006-02-09 06:15:24 +01:00
#define CE_VALID (0x8000)
2005-04-16 17:33:23 +02:00
#define CE_STAGESHIFT 12
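The merge stage (0 for a normal entry, 1 to 3 for the base, "ours" and "theirs" versions during a conflict) occupies the two bits selected by CE_STAGEMASK. Hypothetical helpers to read and rewrite it, leaving the other flag bits alone, could look like this:

```c
#define CE_STAGEMASK (0x3000)
#define CE_EXTENDED (0x4000)
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12

/* Hypothetical: extract the stage (0..3) from ce_flags. */
static unsigned int flags_stage(unsigned int ce_flags)
{
	return (ce_flags & CE_STAGEMASK) >> CE_STAGESHIFT;
}

/* Hypothetical: return ce_flags with its stage bits replaced by `stage`. */
static unsigned int flags_with_stage(unsigned int ce_flags, unsigned int stage)
{
	return (ce_flags & ~(unsigned int)CE_STAGEMASK) |
	       ((stage << CE_STAGESHIFT) & CE_STAGEMASK);
}
```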
2005-04-16 07:51:44 +02:00

2008-10-01 06:04:01 +02:00
/*
2014-06-13 14:19:25 +02:00
 * Range 0xFFFF0FFF in ce_flags is divided into
2008-10-01 06:04:01 +02:00
 * two parts: in-memory flags and on-disk ones.
 * Flags in CE_EXTENDED_FLAGS will get saved on-disk;
 * if you want to save a new flag, add it in
 * CE_EXTENDED_FLAGS.
 *
 * In-memory only flags
 */
2010-11-27 07:22:16 +01:00
#define CE_UPDATE (1 << 16)
#define CE_REMOVE (1 << 17)
#define CE_UPTODATE (1 << 18)
#define CE_ADDED (1 << 19)
Fix name re-hashing semantics
We handled the case of removing and re-inserting cache entries badly,
which is something that merging commonly needs to do (removing the
different stages, and then re-inserting one of them as the merged
state).
We even had a rather ugly special case for this failure case, where
replace_index_entry() basically turned itself into a no-op if the new
and the old entries were the same, exactly because the hash routines
didn't handle it on their own.
So what this patch does is to not just have the UNHASHED bit, but a
HASHED bit too, and when you insert an entry into the name hash, that
involves:
- clear the UNHASHED bit, because now it's valid again for lookup
(which is really all that UNHASHED meant)
- if we're being lazy, we're done here (but we still want to clear the
UNHASHED bit regardless of lazy mode, since we can become unlazy
later, and so we need the UNHASHED bit to always be set correctly,
even if we never actually insert the entry into the hash list)
- if it was already hashed, we just leave it on the list
- otherwise mark it HASHED and insert it into the list
This all means that unhashing and rehashing a name just works
automatically. Obviously, you cannot change the name of an entry (that
would be a serious bug), but nothing can validly do that anyway (you'd
have to allocate a new struct cache_entry anyway since the name length
could change), so that's not a new limitation.
The code actually gets simpler in many ways, although the lazy hashing
does mean that there are a few odd cases (ie something can be marked
unhashed even though it was never on the hash in the first place, and
isn't actually marked hashed!).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
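The insertion rules listed in the name re-hashing commit message can be modelled in a few lines. This is a hypothetical sketch: `F_HASHED`/`F_UNHASHED`, the 16-bucket table and the djb2 hash are illustrative choices, not git's CE_HASHED machinery or its name-hash function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bits modelled on the description (not git's values). */
#define F_HASHED   (1u << 0)	/* entry is linked into a bucket chain */
#define F_UNHASHED (1u << 1)	/* entry must be skipped by lookups */

struct hentry {
	unsigned int flags;
	const char *name;
	struct hentry *next;	/* bucket chain */
};

struct htable {
	struct hentry *bucket[16];
	int lazy;		/* when set, defer real insertion */
};

/* Any string hash works for the sketch; djb2 is used for brevity. */
static unsigned int bucket_of(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % 16;
}

static void hash_entry(struct htable *ht, struct hentry *e)
{
	e->flags &= ~F_UNHASHED;	/* valid for lookup again, lazy or not */
	if (ht->lazy)
		return;
	if (e->flags & F_HASHED)
		return;			/* already on the list: leave it there */
	e->flags |= F_HASHED;
	e->next = ht->bucket[bucket_of(e->name)];
	ht->bucket[bucket_of(e->name)] = e;
}

static void unhash_entry(struct hentry *e)
{
	e->flags |= F_UNHASHED;		/* stays linked, just invisible */
}

static struct hentry *lookup_entry(const struct htable *ht, const char *name)
{
	struct hentry *e;

	for (e = ht->bucket[bucket_of(name)]; e; e = e->next)
		if (!(e->flags & F_UNHASHED) && !strcmp(e->name, name))
			return e;
	return NULL;
}
```

Re-inserting an entry after unhashing it does not link it a second time (which would have made its chain point at itself), and in lazy mode the UNHASHED bit is still cleared even though nothing is inserted.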
2008-02-23 05:37:40 +01:00

2010-11-27 07:22:16 +01:00
#define CE_HASHED (1 << 20)
2017-09-22 18:35:40 +02:00
#define CE_FSMONITOR_VALID (1 << 21)
2010-11-27 07:22:16 +01:00
#define CE_WT_REMOVE (1 << 22) /* remove in work directory */
#define CE_CONFLICTED (1 << 23)
2008-01-15 01:03:17 +01:00

2010-11-27 07:22:16 +01:00
#define CE_UNPACKED (1 << 24)
unpack-trees: move all skip-worktree checks back to unpack_trees()
Earlier, the will_have_skip_worktree() checks were done in various
places, which makes it hard to traverse the index in a tree-like fashion, as required
by excluded_from_list(). This patch moves all the checks into two
loops in unpack_trees().
Entries in index in this operation can be classified into two
groups: ones already in index before unpack_trees() is called and ones
added to index after traverse_trees() is called.
In both groups, before checking file status on worktree, the future
skip-worktree bit must be checked, so that if an entry will be outside
worktree, worktree should not be checked.
For the first group, the future skip-worktree bit is precomputed and
stored as CE_NEW_SKIP_WORKTREE in the first loop before
traverse_trees() is called so that *way_merge() function does not need
to compute it again.
For the second group, because we don't know what entries will be in
this group until traverse_trees() finishes, operations that need
future skip-worktree checks are delayed until CE_NEW_SKIP_WORKTREE is
computed in the second loop. CE_ADDED is used to mark entries in the
second group.
CE_ADDED and CE_NEW_SKIP_WORKTREE are temporary flags used in
unpack_trees(). CE_ADDED is only used by add_to_index(), which should
not be called while unpack_trees() is running.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-27 07:24:04 +01:00
#define CE_NEW_SKIP_WORKTREE (1 << 25)
unpack-trees.c: prepare for looking ahead in the index
This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.
A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees look like this:
    O       A       B
    t               t
    t-i     t-i     t-i
    t-j     t-j
            t/1
            t/2
The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead. After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".
The side that walked index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed. When this happens, the pointer o->pos still points at
"t-i" that is the first entry. We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and match entries found there, and later come back
to process "t-i".
While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that have already been
processed. o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 23:59:54 +01:00
|
|
|
|
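The look-ahead described above is needed because Git sorts a tree entry that is a directory as if its name ended in '/'. A blob "t" (trees O and B) therefore sorts before "t-i", while a directory "t" (tree A) sorts after "t-j", since '-' (0x2d) is smaller than '/' (0x2f). A minimal sketch of that ordering rule follows; `tree_entry_cmp()` is a hypothetical helper, and Git's own comparator (`base_name_compare()`) should be treated as authoritative.

```c
#include <string.h>

/*
 * Hypothetical comparator illustrating Git's tree-entry order:
 * a directory name compares as if it had a trailing '/'.
 */
static int tree_entry_cmp(const char *a, int a_is_dir,
                          const char *b, int b_is_dir)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = la < lb ? la : lb;
    int cmp = memcmp(a, b, n);

    if (cmp)
        return cmp;
    /* Equal prefix: the next byte decides. A name "ends" with '/'
     * if it is a directory and with NUL otherwise. */
    unsigned char ca = n < la ? (unsigned char)a[n] : (a_is_dir ? '/' : '\0');
    unsigned char cb = n < lb ? (unsigned char)b[n] : (b_is_dir ? '/' : '\0');
    return (ca > cb) - (ca < cb);
}
```

Index entries hold full paths, so a plain byte comparison already puts "t-i" and "t-j" before "t/1"; that is why the index walk must skip ahead to find "t/1" while the tree walk descends into "t/".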
checkout: avoid unnecessary match_pathspec calls
In checkout_paths() we do this
- for all updated items, call match_pathspec
- for all items, call match_pathspec (inside unmerge_cache)
- for all items, call match_pathspec (for showing "path .. is unmerged")
- for updated items, call match_pathspec and update paths
That's a lot of duplicate match_pathspec(s) and the function is not
exactly cheap to be called so many times, especially on large indexes.
This patch makes it call match_pathspec once per updated index entry,
save the result in ce_flags and reuse the results in the following
loops.
The changes in 0a1283b (checkout $tree $path: do not clobber local
changes in $path not in $tree - 2011-09-30) limit the affected paths
to ones we read from $tree. We do not do anything to other modified
entries in this case, so the "for all items" above could be modified
to "for all updated items". But...
The command's behavior is now modified slightly: unmerged entries that
match $path but are not updated by $tree are now NOT touched. This
should be considered a bug fix, though, not a regression. A new test is
added for this change.
And while at it, free ps_matched after use.
The following command is tested on webkit, 215k entries. The pattern
is chosen mainly to make match_pathspec sweat:
git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"
        before      after
real    0m3.493s    0m2.737s
user    0m2.239s    0m1.586s
sys     0m1.252s    0m1.151s
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 06:58:21 +01:00
|
|
|
/* used to temporarily mark paths matched by pathspecs */
|
|
|
|
#define CE_MATCHED (1 << 26)
|
|
|
|
|
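The checkout commit message above describes calling match_pathspec once per updated entry and caching the result in ce_flags so later loops only test a bit. A minimal sketch of that pattern follows, under stated assumptions: `struct toy_entry` and `expensive_match()` are hypothetical, and a simple prefix test stands in for real pathspec matching.

```c
#include <stddef.h>
#include <string.h>

#define CE_MATCHED (1 << 26)

/* Hypothetical stand-in for struct cache_entry. */
struct toy_entry {
    const char *name;
    unsigned int ce_flags;
};

/* Hypothetical matcher; counts calls so the saving is visible. */
static int match_calls;
static int expensive_match(const char *name, const char *prefix)
{
    match_calls++;
    return strncmp(name, prefix, strlen(prefix)) == 0;
}

/* First pass: run the matcher once per entry and remember the result. */
static void mark_matches(struct toy_entry *e, size_t nr, const char *prefix)
{
    for (size_t i = 0; i < nr; i++) {
        if (expensive_match(e[i].name, prefix))
            e[i].ce_flags |= CE_MATCHED;
        else
            e[i].ce_flags &= ~CE_MATCHED;
    }
}

/* Later passes test the cached bit instead of matching again. */
static size_t count_matched(const struct toy_entry *e, size_t nr)
{
    size_t n = 0;
    for (size_t i = 0; i < nr; i++)
        if (e[i].ce_flags & CE_MATCHED)
            n++;
    return n;
}
```

However many later loops consult the bit, the matcher runs exactly once per entry.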
2014-06-13 14:19:39 +02:00
|
|
|
#define CE_UPDATE_IN_BASE (1 << 27)
|
2014-06-13 14:19:43 +02:00
|
|
|
#define CE_STRIP_NAME (1 << 28)
|
2014-06-13 14:19:39 +02:00
|
|
|
|
2008-10-01 06:04:01 +02:00
|
|
|
/*
|
|
|
|
* Extended on-disk flags
|
|
|
|
*/
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_INTENT_TO_ADD (1 << 29)
|
|
|
|
#define CE_SKIP_WORKTREE (1 << 30)
|
2008-10-01 06:04:01 +02:00
|
|
|
/* CE_EXTENDED2 is for future extension */
|
2015-12-29 07:35:46 +01:00
|
|
|
#define CE_EXTENDED2 (1U << 31)
|
2008-10-01 06:04:01 +02:00
|
|
|
|
2009-08-20 15:46:57 +02:00
|
|
|
#define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
|
2008-10-01 06:04:01 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Safeguard to avoid saving wrong flags:
|
|
|
|
* - CE_EXTENDED2 won't get saved until its semantic is known
|
|
|
|
* - Bits in 0x0000FFFF have been saved in ce_flags already
|
|
|
|
* - Bits in 0x003F0000 are currently in-memory flags
|
|
|
|
*/
|
|
|
|
#if CE_EXTENDED_FLAGS & 0x803FFFFF
|
|
|
|
#error "CE_EXTENDED_FLAGS out of range"
|
|
|
|
#endif
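The `#error` guard above keeps the extended flags out of the bits already saved in the 16-bit flags word, the in-memory flag bits, and the reserved CE_EXTENDED2 bit. A hedged sketch of why that matters: the extended bits can be shifted down 16 places into a second 16-bit on-disk word and shifted back on read. This is assumed to mirror how index format version 3 stores its extra flags. `pack_extended()` and `unpack_extended()` are hypothetical names, and the guard is restated with C11 `_Static_assert` rather than the preprocessor.

```c
#include <stdint.h>

/* Values as defined in this header. */
#define CE_INTENT_TO_ADD   (1 << 29)
#define CE_SKIP_WORKTREE   (1 << 30)
#define CE_EXTENDED_FLAGS  (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)

/* Same check as the #if/#error above, expressed as a C11 assertion. */
_Static_assert((CE_EXTENDED_FLAGS & 0x803FFFFF) == 0,
               "CE_EXTENDED_FLAGS out of range");

/* Keep only the extended bits and move them into a 16-bit word. */
static uint16_t pack_extended(uint32_t ce_flags)
{
    return (uint16_t)((ce_flags & CE_EXTENDED_FLAGS) >> 16);
}

/* Restore the extended bits to their in-memory positions. */
static uint32_t unpack_extended(uint16_t flags2)
{
    return ((uint32_t)flags2 << 16) & CE_EXTENDED_FLAGS;
}
```

Because the extended bits all lie at positions 29 and 30, the shifted values (0x2000, 0x4000) fit in 16 bits without colliding with anything else.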
|
|
|
|
|
2021-03-30 15:10:48 +02:00
|
|
|
#define S_ISSPARSEDIR(m) ((m) == S_IFDIR)
|
|
|
|
|
2016-02-16 23:34:44 +01:00
|
|
|
/* Forward structure decls */
|
2013-07-14 10:35:25 +02:00
|
|
|
struct pathspec;
|
2016-02-16 23:34:44 +01:00
|
|
|
struct child_process;
|
2018-07-01 03:25:00 +02:00
|
|
|
struct tree;
|
2013-07-14 10:35:25 +02:00
|
|
|
|
2008-02-23 05:41:17 +01:00
|
|
|
/*
|
|
|
|
* Copy the sha1 and stat state of a cache entry from one to
|
|
|
|
* another. But we never change the name, or the hash state!
|
|
|
|
*/
|
2013-06-02 17:46:51 +02:00
|
|
|
static inline void copy_cache_entry(struct cache_entry *dst,
|
|
|
|
const struct cache_entry *src)
|
2008-02-23 05:41:17 +01:00
|
|
|
{
|
2013-11-14 20:22:27 +01:00
|
|
|
unsigned int state = dst->ce_flags & CE_HASHED;
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a significant portion of the time is
spent in malloc() calls. This can be mitigated by allocating a
large block of memory and managing it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. First,
some terminology:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this state.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
int mem_pool_allocated = dst->mem_pool_allocated;
|
2008-02-23 05:41:17 +01:00
|
|
|
|
|
|
|
/* Don't copy hash chain and name */
|
2013-11-14 20:21:58 +01:00
|
|
|
memcpy(&dst->ce_stat_data, &src->ce_stat_data,
|
|
|
|
offsetof(struct cache_entry, name) -
|
|
|
|
offsetof(struct cache_entry, ce_stat_data));
|
2008-02-23 05:41:17 +01:00
|
|
|
|
|
|
|
/* Restore the hash state */
|
2013-11-14 20:22:27 +01:00
|
|
|
dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
|
block alloc: allocate cache entries from mem_pool
2018-07-02 21:49:37 +02:00
|
|
|
|
|
|
|
/* Restore the mem_pool_allocated flag */
|
|
|
|
dst->mem_pool_allocated = mem_pool_allocated;
|
2008-02-23 05:41:17 +01:00
|
|
|
}
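copy_cache_entry() bulk-copies the middle of the entry with memcpy() and offsetof(). Before the copy, it saves the destination's own state that lies inside the copied range (the CE_HASHED bit and mem_pool_allocated), and it puts that state back afterwards. The sketch below reproduces that save, copy, restore pattern on a hypothetical, simplified `struct toy_entry` whose layout is only illustrative.

```c
#include <stddef.h>
#include <string.h>

#define HASHED (1u << 20)   /* stands in for CE_HASHED */

/* Hypothetical layout: the fields to copy sit between stat_data and name. */
struct toy_entry {
    unsigned int stat_data[4];   /* stands in for ce_stat_data */
    unsigned int ce_mode;
    unsigned int ce_flags;
    int mem_pool_allocated;
    char name[32];
};

static void toy_copy_entry(struct toy_entry *dst, const struct toy_entry *src)
{
    /* Save the destination-owned state the bulk copy would clobber. */
    unsigned int state = dst->ce_flags & HASHED;
    int pool = dst->mem_pool_allocated;

    /* Copy everything from stat_data up to, but not including, name. */
    memcpy(&dst->stat_data, &src->stat_data,
           offsetof(struct toy_entry, name) -
           offsetof(struct toy_entry, stat_data));

    /* Put the destination's own state back. */
    dst->ce_flags = (dst->ce_flags & ~HASHED) | state;
    dst->mem_pool_allocated = pool;
}
```

Using offsetof() keeps the copied range correct even if fields are added between the two boundaries, as long as the boundaries themselves stay put.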
|
|
|
|
|
2012-07-11 11:22:37 +02:00
|
|
|
static inline unsigned create_ce_flags(unsigned stage)
|
2008-01-19 08:42:00 +01:00
|
|
|
{
|
2012-07-11 11:22:37 +02:00
|
|
|
return (stage << CE_STAGESHIFT);
|
2008-01-19 08:42:00 +01:00
|
|
|
}
|
|
|
|
|
2012-07-11 11:22:37 +02:00
|
|
|
#define ce_namelen(ce) ((ce)->ce_namelen)
|
2005-04-16 17:33:23 +02:00
|
|
|
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
|
2008-01-15 01:03:17 +01:00
|
|
|
#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
|
2008-01-19 08:45:24 +01:00
|
|
|
#define ce_uptodate(ce) ((ce)->ce_flags & CE_UPTODATE)
|
2009-08-20 15:46:57 +02:00
|
|
|
#define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
|
2008-01-19 08:45:24 +01:00
|
|
|
#define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
|
2015-08-22 03:08:05 +02:00
|
|
|
#define ce_intent_to_add(ce) ((ce)->ce_flags & CE_INTENT_TO_ADD)
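A conflicted path can have up to three index entries, at stages 1 (common ancestor), 2 (ours) and 3 (theirs), while a merged path sits at stage 0. create_ce_flags() and ce_stage() store and recover that number in the CE_STAGEMASK bits of ce_flags. The sketch assumes CE_STAGEMASK is 0x3000 and CE_STAGESHIFT is 12 (both defined earlier in this header, outside this excerpt) and uses a hypothetical `struct toy_entry`.

```c
/* Assumed values of the constants defined earlier in the header. */
#define CE_STAGEMASK  (0x3000)
#define CE_STAGESHIFT 12

/* Hypothetical stand-in for struct cache_entry. */
struct toy_entry {
    unsigned int ce_flags;
};

/* Same definitions as in the header. */
static inline unsigned create_ce_flags(unsigned stage)
{
    return (stage << CE_STAGESHIFT);
}

#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
```

Other bits in ce_flags do not disturb the stage, because ce_stage() masks before shifting.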
|
2005-04-16 17:33:23 +02:00
|
|
|
|
2005-04-17 07:26:31 +02:00
|
|
|
#define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
|
2005-05-05 14:38:25 +02:00
|
|
|
static inline unsigned int create_ce_mode(unsigned int mode)
|
|
|
|
{
|
|
|
|
if (S_ISLNK(mode))
|
2008-01-15 01:03:17 +01:00
|
|
|
return S_IFLNK;
|
sparse-index: convert from full to sparse
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.
For now, we avoid converting the index to a sparse index if:
1. the index is split.
2. the index is already sparse.
3. sparse-checkout is disabled.
4. sparse-checkout does not use cone mode.
Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.
The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.
The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.
Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."
We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.
The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 15:10:55 +02:00
|
|
|
if (S_ISSPARSEDIR(mode))
|
|
|
|
return S_IFDIR;
|
2007-05-21 22:08:28 +02:00
|
|
|
if (S_ISDIR(mode) || S_ISGITLINK(mode))
|
2008-01-15 01:03:17 +01:00
|
|
|
return S_IFGITLINK;
|
|
|
|
return S_IFREG | ce_permissions(mode);
|
2005-05-05 14:38:25 +02:00
|
|
|
}
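create_ce_mode() collapses whatever mode it is given into the few modes the index records: a symlink, a sparse directory (exactly S_IFDIR), a gitlink, or a regular file with permissions normalized to 0644 or 0755. The following self-contained sketch restates that logic with locally defined octal constants so the mapping can be checked in isolation. The MODE_* names and `toy_create_ce_mode()` are hypothetical; 0160000 is the value Git uses for a gitlink.

```c
/* Octal mode constants as recorded in the index. */
#define MODE_TYPE    0170000u
#define MODE_REG     0100000u
#define MODE_DIR     0040000u
#define MODE_LNK     0120000u
#define MODE_GITLINK 0160000u

/* Mirrors ce_permissions(): only the owner-execute bit matters. */
static unsigned int toy_permissions(unsigned int mode)
{
    return (mode & 0100) ? 0755 : 0644;
}

static unsigned int toy_create_ce_mode(unsigned int mode)
{
    unsigned int type = mode & MODE_TYPE;

    if (type == MODE_LNK)
        return MODE_LNK;
    if (mode == MODE_DIR)            /* sparse directory entry */
        return MODE_DIR;
    if (type == MODE_DIR || type == MODE_GITLINK)
        return MODE_GITLINK;
    return MODE_REG | toy_permissions(mode);
}
```

The sparse-directory test must come before the generic directory test, just as S_ISSPARSEDIR() is checked before S_ISDIR() in the header; otherwise a bare S_IFDIR would be turned into a gitlink.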
|
2013-06-02 17:46:51 +02:00
|
|
|
static inline unsigned int ce_mode_from_stat(const struct cache_entry *ce,
|
|
|
|
unsigned int mode)
|
2007-02-17 07:43:48 +01:00
|
|
|
{
|
2007-03-02 22:11:30 +01:00
|
|
|
extern int trust_executable_bit, has_symlinks;
|
|
|
|
if (!has_symlinks && S_ISREG(mode) &&
|
2008-01-15 01:03:17 +01:00
|
|
|
ce && S_ISLNK(ce->ce_mode))
|
2007-03-02 22:11:30 +01:00
|
|
|
return ce->ce_mode;
|
2007-02-17 07:43:48 +01:00
|
|
|
if (!trust_executable_bit && S_ISREG(mode)) {
|
2008-01-15 01:03:17 +01:00
|
|
|
if (ce && S_ISREG(ce->ce_mode))
|
2007-02-17 07:43:48 +01:00
|
|
|
return ce->ce_mode;
|
|
|
|
return create_ce_mode(0666);
|
|
|
|
}
|
|
|
|
return create_ce_mode(mode);
|
|
|
|
}
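ce_mode_from_stat() decides which mode to record when the filesystem cannot be fully trusted. The sketch below is a simplified model under stated assumptions. The two globals are passed in as parameters; Git sets trust_executable_bit and has_symlinks from core.fileMode and core.symlinks. An `index_mode` of 0 stands for "no index entry", and `toy_canon()` handles only regular files and symlinks. All names here are hypothetical.

```c
#define MODE_TYPE 0170000u
#define MODE_REG  0100000u
#define MODE_LNK  0120000u
#define IS_REG(m) (((m) & MODE_TYPE) == MODE_REG)
#define IS_LNK(m) (((m) & MODE_TYPE) == MODE_LNK)

/* Simplified create_ce_mode(): regular files and symlinks only. */
static unsigned int toy_canon(unsigned int mode)
{
    if (IS_LNK(mode))
        return MODE_LNK;
    return MODE_REG | ((mode & 0100) ? 0755 : 0644);
}

static unsigned int toy_mode_from_stat(unsigned int index_mode,
                                       unsigned int st_mode,
                                       int trust_executable_bit,
                                       int has_symlinks)
{
    /* A symlink checked out as a plain file keeps its symlink mode. */
    if (!has_symlinks && IS_REG(st_mode) && index_mode && IS_LNK(index_mode))
        return index_mode;
    if (!trust_executable_bit && IS_REG(st_mode)) {
        /* Keep the executable bit already recorded in the index. */
        if (index_mode && IS_REG(index_mode))
            return index_mode;
        return toy_canon(0666);      /* new file: default to 0644 */
    }
    return toy_canon(st_mode);
}
```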
|
2008-01-31 10:17:48 +01:00
|
|
|
static inline int ce_to_dtype(const struct cache_entry *ce)
|
|
|
|
{
|
|
|
|
unsigned ce_mode = ntohl(ce->ce_mode);
|
|
|
|
if (S_ISREG(ce_mode))
|
|
|
|
return DT_REG;
|
|
|
|
else if (S_ISDIR(ce_mode) || S_ISGITLINK(ce_mode))
|
|
|
|
return DT_DIR;
|
|
|
|
else if (S_ISLNK(ce_mode))
|
|
|
|
return DT_LNK;
|
|
|
|
else
|
|
|
|
return DT_UNKNOWN;
|
|
|
|
}
|
2010-10-04 12:53:11 +02:00
|
|
|
static inline unsigned int canon_mode(unsigned int mode)
|
|
|
|
{
|
|
|
|
if (S_ISREG(mode))
|
|
|
|
return S_IFREG | ce_permissions(mode);
|
|
|
|
if (S_ISLNK(mode))
|
|
|
|
return S_IFLNK;
|
|
|
|
if (S_ISDIR(mode))
|
|
|
|
return S_IFDIR;
|
|
|
|
return S_IFGITLINK;
|
|
|
|
}
|
2005-04-17 07:26:31 +02:00
|
|
|
|
2011-10-25 20:00:04 +02:00
|
|
|
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
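cache_entry_size() sizes an entry as the fixed header (everything before the flexible `name` member) plus the path and its NUL terminator. A minimal sketch of the same allocation pattern follows, using a hypothetical `struct toy_entry` and `toy_entry_new()`. Git itself routes these allocations through helpers such as make_empty_cache_entry() (declared below), possibly backed by a mem_pool, rather than through a bare calloc() as here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry with a flexible array member for the path. */
struct toy_entry {
    unsigned int ce_mode;
    unsigned int ce_namelen;
    char name[];
};

#define toy_entry_size(len) (offsetof(struct toy_entry, name) + (len) + 1)

static struct toy_entry *toy_entry_new(const char *path)
{
    size_t len = strlen(path);
    struct toy_entry *ce = calloc(1, toy_entry_size(len));

    if (!ce)
        return NULL;
    ce->ce_namelen = (unsigned int)len;
    memcpy(ce->name, path, len);     /* calloc() already zeroed the NUL */
    return ce;
}
```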
|
2005-04-16 06:45:38 +02:00
|
|
|
|
2014-06-13 14:19:27 +02:00
|
|
|
#define SOMETHING_CHANGED (1 << 0) /* unclassified changes go here */
|
|
|
|
#define CE_ENTRY_CHANGED (1 << 1)
|
|
|
|
#define CE_ENTRY_REMOVED (1 << 2)
|
|
|
|
#define CE_ENTRY_ADDED (1 << 3)
|
2014-06-13 14:19:29 +02:00
|
|
|
#define RESOLVE_UNDO_CHANGED (1 << 4)
|
2014-06-13 14:19:31 +02:00
|
|
|
#define CACHE_TREE_CHANGED (1 << 5)
|
2014-06-13 14:19:44 +02:00
|
|
|
#define SPLIT_INDEX_ORDERED (1 << 6)
|
2015-03-08 11:12:39 +01:00
|
|
|
#define UNTRACKED_CHANGED (1 << 7)
|
2017-09-22 18:35:40 +02:00
|
|
|
#define FSMONITOR_CHANGED (1 << 8)
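These bits are OR-ed into the cache_changed field of struct index_state, so the code that writes the index can tell not just that something changed but what kind of change it was. A minimal sketch follows. `struct toy_index`, `toy_add_entry()` and the "only extensions changed" policy are hypothetical illustrations, not Git's write logic.

```c
#define CE_ENTRY_ADDED       (1 << 3)
#define RESOLVE_UNDO_CHANGED (1 << 4)
#define CACHE_TREE_CHANGED   (1 << 5)
#define UNTRACKED_CHANGED    (1 << 7)
#define FSMONITOR_CHANGED    (1 << 8)

/* Hypothetical stand-in for struct index_state. */
struct toy_index {
    unsigned int cache_changed;
};

/* Record why the index became dirty, not just that it did. */
static void toy_add_entry(struct toy_index *istate)
{
    /* ... the entry itself would be inserted here ... */
    istate->cache_changed |= CE_ENTRY_ADDED;
}

/* Any bit set means the index must be written back. */
static int toy_needs_write(const struct toy_index *istate)
{
    return istate->cache_changed != 0;
}

/* Hypothetical policy: true when only extension data changed. */
static int toy_only_extensions_changed(const struct toy_index *istate)
{
    const unsigned int ext = RESOLVE_UNDO_CHANGED | CACHE_TREE_CHANGED |
                             UNTRACKED_CHANGED | FSMONITOR_CHANGED;
    return istate->cache_changed != 0 && (istate->cache_changed & ~ext) == 0;
}
```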
|
2014-06-13 14:19:27 +02:00
|
|
|
|
2014-06-13 14:19:36 +02:00
|
|
|
struct split_index;
|
2015-03-08 11:12:33 +01:00
|
|
|
struct untracked_cache;
|
2019-11-21 23:04:44 +01:00
|
|
|
struct progress;
|
2021-03-30 15:10:53 +02:00
|
|
|
struct pattern_list;
|
2015-03-08 11:12:33 +01:00
|
|
|
|
2022-05-23 15:48:40 +02:00
|
|
|
enum sparse_index_mode {
|
|
|
|
/*
|
|
|
|
* There are no sparse directories in the index at all.
|
|
|
|
*
|
|
|
|
* Repositories that don't use cone-mode sparse-checkout will
|
|
|
|
* always have their indexes in this mode.
|
|
|
|
*/
|
|
|
|
INDEX_EXPANDED = 0,
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The index has already been collapsed to sparse directories
|
|
|
|
* wherever possible.
|
|
|
|
*/
|
|
|
|
INDEX_COLLAPSED,
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The sparse directories that exist are outside the
|
|
|
|
* sparse-checkout boundary, but it is possible that some file
|
|
|
|
* entries could collapse to sparse directory entries.
|
|
|
|
*/
|
|
|
|
INDEX_PARTIALLY_SPARSE,
|
|
|
|
};
|
|
|
|
|
2007-04-02 03:14:06 +02:00
|
|
|
struct index_state {
|
|
|
|
struct cache_entry **cache;
|
2012-04-04 18:12:43 +02:00
|
|
|
unsigned int version;
|
2007-04-02 03:14:06 +02:00
|
|
|
unsigned int cache_nr, cache_alloc, cache_changed;
|
2009-12-25 09:30:51 +01:00
|
|
|
struct string_list *resolve_undo;
|
2007-04-02 03:14:06 +02:00
|
|
|
struct cache_tree *cache_tree;
|
2014-06-13 14:19:36 +02:00
|
|
|
struct split_index *split_index;
|
make USE_NSEC work as expected
Since the filesystem ext4 is now defined as stable in Linux v2.6.28,
and ext4 supports nanosecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.
This will make racy git situations less likely to happen. For 'git
checkout' this means it will be less likely that we have to open the
file, read its contents into RAM, and check whether it is really
modified or not. The result should be slightly less CPU time used, fewer
page faults and a slightly faster program, at least for 'git checkout'.
Since the number of possible racy git situations would increase as
disks get faster, this patch will become more and more helpful as time
goes by. For a fast Solid State Disk, this patch should be helpful.
Note that, when file operations start to take less than 1 nanosecond,
one would again start to get more racy git situations.
For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19 21:08:29 +01:00
|
|
|
struct cache_time timestamp;
|
unpack_trees(): protect the handcrafted in-core index from read_cache()
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.
The resulting in-core index is OK for most uses, but read_cache() does not
recognize it as such. The function is meant to be a no-op if you have
already loaded the index, until you call discard_cache().
This changes the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.
A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion. But there are higher level API that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.
An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick. In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.
The original version of this patch marked the index by pointing
o->result.alloc at an otherwise wasted block of malloc'ed memory, but this version
uses Linus's idea to use a new "initialized" bit, which is conceptually
much cleaner.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 21:57:30 +02:00
|
|
|
unsigned name_hash_initialized : 1,
|
2018-01-07 23:30:14 +01:00
|
|
|
initialized : 1,
|
2019-02-15 18:59:21 +01:00
|
|
|
drop_cache_tree : 1,
|
|
|
|
updated_workdir : 1,
|
2019-05-19 09:45:33 +02:00
|
|
|
updated_skipworktree : 1,
|
2022-05-23 15:48:40 +02:00
|
|
|
fsmonitor_has_run_once : 1;
|
|
|
|
enum sparse_index_mode sparse_index;
|
2013-11-14 20:21:58 +01:00
|
|
|
struct hashmap name_hash;
|
2013-11-14 20:20:58 +01:00
|
|
|
struct hashmap dir_hash;
|
2018-05-02 02:25:44 +02:00
|
|
|
struct object_id oid;
|
2015-03-08 11:12:33 +01:00
|
|
|
struct untracked_cache *untracked;
|
2020-01-07 20:04:28 +01:00
|
|
|
char *fsmonitor_last_update;
|
2017-10-28 01:26:37 +02:00
|
|
|
struct ewah_bitmap *fsmonitor_dirty;
|
block alloc: allocate cache entries from mem_pool
2018-07-02 21:49:37 +02:00
|
|
|
struct mem_pool *ce_mem_pool;
|
2019-11-21 23:04:44 +01:00
|
|
|
struct progress *progress;
|
2021-01-23 20:58:15 +01:00
|
|
|
struct repository *repo;
|
2021-03-30 15:10:53 +02:00
|
|
|
struct pattern_list *sparse_checkout_patterns;
|
2007-04-02 03:14:06 +02:00
|
|
|
};
|
|
|
|
|
2008-03-21 21:16:24 +01:00
|
|
|
/* Name hashing */
|
2019-04-29 10:28:14 +02:00
|
|
|
int test_lazy_init_name_hash(struct index_state *istate, int try_threaded);
|
|
|
|
void add_name_hash(struct index_state *istate, struct cache_entry *ce);
|
|
|
|
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
|
|
|
|
void free_name_hash(struct index_state *istate);
|
2008-03-21 21:16:24 +01:00
|
|
|
|
2018-07-02 21:49:31 +02:00
|
|
|
/* Cache entry creation and cleanup */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create cache_entry intended for use in the specified index. Caller
|
|
|
|
* is responsible for discarding the cache_entry with
|
|
|
|
* `discard_cache_entry`.
|
|
|
|
*/
|
|
|
|
struct cache_entry *make_cache_entry(struct index_state *istate,
|
|
|
|
unsigned int mode,
|
|
|
|
const struct object_id *oid,
|
|
|
|
const char *path,
|
|
|
|
int stage,
|
|
|
|
unsigned int refresh_options);
|
|
|
|
|
|
|
|
struct cache_entry *make_empty_cache_entry(struct index_state *istate,
|
|
|
|
size_t name_len);
|
|
|
|
|
|
|
|
/*
|
2021-05-04 18:27:28 +02:00
|
|
|
* Create a cache_entry that is not intended to be added to an index. If
|
|
|
|
* `ce_mem_pool` is not NULL, the entry is allocated within the given memory
|
|
|
|
* pool. Caller is responsible for discarding "loose" entries with
|
|
|
|
* `discard_cache_entry()` and the memory pool with
|
|
|
|
* `mem_pool_discard(ce_mem_pool, should_validate_cache_entries())`.
|
2018-07-02 21:49:31 +02:00
|
|
|
*/
|
|
|
|
struct cache_entry *make_transient_cache_entry(unsigned int mode,
|
|
|
|
const struct object_id *oid,
|
|
|
|
const char *path,
|
2021-05-04 18:27:28 +02:00
|
|
|
int stage,
|
|
|
|
struct mem_pool *ce_mem_pool);
|
2018-07-02 21:49:31 +02:00
|
|
|
|
2021-05-04 18:27:28 +02:00
|
|
|
struct cache_entry *make_empty_transient_cache_entry(size_t len,
|
|
|
|
struct mem_pool *ce_mem_pool);
|
2018-07-02 21:49:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Discard cache entry.
|
|
|
|
*/
|
|
|
|
void discard_cache_entry(struct cache_entry *ce);
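The "Chosen approach" in the block alloc commit message above adds a mem_pool_allocated field. Discarding an entry then frees only heap-allocated memory, while pool-backed entries are released together with their pool. A minimal model of that rule follows. `struct toy_pool`, `entry_from_pool()`, `entry_from_heap()` and `toy_discard_entry()` are hypothetical, and Git's real mem_pool API differs.

```c
#include <stdlib.h>
#include <string.h>

struct toy_entry {
    int mem_pool_allocated;   /* 1 if the memory belongs to a pool */
    char name[16];
};

/* Hypothetical fixed-size pool standing in for Git's mem_pool. */
struct toy_pool {
    struct toy_entry slots[8];
    size_t used;
};

static struct toy_entry *entry_from_pool(struct toy_pool *pool)
{
    if (pool->used == sizeof(pool->slots) / sizeof(pool->slots[0]))
        return NULL;
    struct toy_entry *ce = &pool->slots[pool->used++];
    memset(ce, 0, sizeof(*ce));
    ce->mem_pool_allocated = 1;
    return ce;
}

static struct toy_entry *entry_from_heap(void)
{
    return calloc(1, sizeof(struct toy_entry));   /* flag stays 0 */
}

/* Free only heap memory; returns 1 if the entry was freed. */
static int toy_discard_entry(struct toy_entry *ce)
{
    if (ce && !ce->mem_pool_allocated) {
        free(ce);
        return 1;
    }
    return 0;
}
```

Pool-backed entries are left alone here; in this model, their storage goes away when the owning pool does.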
|
|
|
|
|
2018-07-02 21:49:39 +02:00
|
|
|
/*
|
|
|
|
* Check configuration if we should perform extra validation on cache
|
|
|
|
* entries.
|
|
|
|
*/
|
|
|
|
int should_validate_cache_entries(void);
|
|
|
|
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this state.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
/*
 * Duplicate a cache_entry. Allocate memory for the new entry from a
 * memory_pool. Takes into account cache_entry fields that are meant
 * for managing the underlying memory allocation of the cache_entry.
 */
struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate);

/*
 * Validate the cache entries in the index. This is an internal
 * consistency check that the cache_entry structs are allocated from
 * the expected memory pool.
 */
void validate_cache_entries(const struct index_state *istate);

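The allocation scheme in the "block alloc" commit message can be illustrated with a minimal, self-contained sketch. It assumes a single fixed-size block in place of git's `mem_pool` (which grows as needed) and a cut-down entry type; `toy_pool`, `toy_entry`, `pool_alloc()`, `make_entry()` and `discard_entry()` are hypothetical names, not git's API. What it shows is the chosen design: a per-entry flag records where the memory came from, so the discard path frees only transient heap entries.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-block bump allocator standing in for git's mem_pool. */
struct toy_pool {
	char *block;       /* assumed to come from malloc(), so it is well aligned */
	size_t used, size;
};

/* Hypothetical, heavily simplified cache entry. */
struct toy_entry {
	unsigned int mem_pool_allocated; /* 1 if carved out of a pool */
	size_t namelen;
	char name[];                     /* flexible array member */
};

static void *pool_alloc(struct toy_pool *pool, size_t len)
{
	/* Round up so every returned pointer stays suitably aligned. */
	size_t align = sizeof(void *);
	len = (len + align - 1) & ~(align - 1);
	if (pool->used + len > pool->size)
		return NULL; /* a real pool would add another block here */
	void *p = pool->block + pool->used;
	pool->used += len;
	return p;
}

/* Allocate from the pool when one is given, otherwise from the heap. */
static struct toy_entry *make_entry(struct toy_pool *pool, const char *name)
{
	size_t len = strlen(name);
	size_t total = sizeof(struct toy_entry) + len + 1;
	struct toy_entry *ce = pool ? pool_alloc(pool, total) : malloc(total);
	if (!ce)
		return NULL;
	ce->mem_pool_allocated = pool != NULL;
	ce->namelen = len;
	memcpy(ce->name, name, len + 1);
	return ce;
}

/* Transient entries are freed; pool entries are released with their pool. */
static void discard_entry(struct toy_entry *ce)
{
	if (ce && !ce->mem_pool_allocated)
		free(ce);
}
```

Because pool-allocated entries are never freed one by one, their lifetime ends only when the pool's block is released, mirroring the commit's rule that a cache_entry lives as long as the index_state it was allocated for.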
2021-07-23 20:52:22 +02:00
/*
 * Bulk prefetch all missing cache entries that are not GITLINKs and that match
 * the given predicate. This function should only be called if
 * has_promisor_remote() returns true.
 */
typedef int (*must_prefetch_predicate)(const struct cache_entry *);
void prefetch_cache_entries(const struct index_state *istate,
			    must_prefetch_predicate must_prefetch);

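One way to picture the contract of `prefetch_cache_entries()` is as a filter that builds a single batch before the promisor remote is contacted. This is an assumption-laden illustration, not git's implementation: `toy_ce`, its `present_locally` flag (standing in for an object-store lookup), and `collect_prefetch()` are hypothetical, and treating a NULL predicate as "accept all" is a convenience of this sketch only.

```c
#include <stddef.h>

#define TOY_S_IFMT      0170000
#define TOY_S_IFGITLINK 0160000

/* Hypothetical, simplified stand-in for struct cache_entry. */
struct toy_ce {
	unsigned int ce_mode;
	int present_locally; /* assumption: 1 if the blob is already local */
	const char *name;
};

typedef int (*toy_predicate)(const struct toy_ce *);

/*
 * Collect every missing, non-gitlink entry accepted by the predicate into
 * `batch`, so that one request can fetch them all. Returns the number of
 * entries collected (at most `batch_max`).
 */
static size_t collect_prefetch(const struct toy_ce *entries, size_t nr,
			       toy_predicate must_prefetch,
			       const struct toy_ce **batch, size_t batch_max)
{
	size_t count = 0;
	for (size_t i = 0; i < nr && count < batch_max; i++) {
		const struct toy_ce *ce = &entries[i];
		if ((ce->ce_mode & TOY_S_IFMT) == TOY_S_IFGITLINK)
			continue; /* submodule commits live in another repository */
		if (ce->present_locally)
			continue; /* nothing to fetch */
		if (must_prefetch && !must_prefetch(ce))
			continue;
		batch[count++] = ce;
	}
	return count;
}
```

Collecting first and fetching once is presumably the motivation for a bulk interface: faulting in each missing blob separately would cost one round trip per object.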
2019-01-24 09:29:12 +01:00
#ifdef USE_THE_INDEX_COMPATIBILITY_MACROS
extern struct index_state the_index;

2007-04-02 03:14:06 +02:00
#define active_cache (the_index.cache)
#define active_nr (the_index.cache_nr)
#define active_alloc (the_index.cache_alloc)
#define active_cache_changed (the_index.cache_changed)
#define active_cache_tree (the_index.cache_tree)
2005-04-08 00:13:13 +02:00

2019-01-12 03:13:26 +01:00
#define read_cache() repo_read_index(the_repository)
read-cache: fix reading the shared index for other repos
read_index_from() takes a path argument for the location of the index
file. For reading the shared index in split index mode however it just
ignores that path argument, and reads it from the gitdir of the current
repository.
This works as long as an index in the_repository is read. Once that
changes, such as when we read the index of a submodule, or of a
different working tree than the current one, the gitdir of
the_repository will no longer contain the appropriate shared index,
and git will fail to read it.
For example t3007-ls-files-recurse-submodules.sh was broken with
GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository
object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also
broken in a similar manner, probably by introducing struct repository
there, although I didn't track down the exact commit for that.
be489d02d2 ("revision.c: --indexed-objects add objects from all
worktrees", 2017-08-23) breaks with split index mode in a similar
manner, not erroring out when it can't read the index, but instead
carrying on with pruning, without taking the index of the worktree into
account.
Fix this by passing an additional gitdir parameter to read_index_from,
to indicate where it should look for and read the shared index from.
read_cache_from() defaults to using the gitdir of the_repository. As it
is mostly a convenience macro, having to pass get_git_dir() for every
call seems overkill, and if necessary users can have more control by
using read_index_from().
Helped-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-07 23:30:13 +01:00
#define read_cache_from(path) read_index_from(&the_index, (path), (get_git_dir()))
2019-01-12 03:13:26 +01:00
#define read_cache_preload(pathspec) repo_read_index_preload(the_repository, (pathspec), 0)
checkout: Fix "initial checkout" detection
Earlier commit 5521883 (checkout: do not lose staged removal, 2008-09-07)
tightened the rule to prevent switching branches from losing local
changes, so that staged removal of paths can be protected, while
attempting to keep a loophole to still allow a special case of switching
out of an un-checked-out state.
However, the loophole was made a bit too tight, and did not allow
switching from one branch (in an un-checked-out state) to check out
another branch.
The change to builtin-checkout.c in this commit loosens it to allow this,
by not insisting that the original commit and the new commit be the same.
It also introduces a new function, is_index_unborn (and an associated
macro, is_cache_unborn), to check if the repository is truly in an
un-checked-out state more reliably, by making sure that $GIT_INDEX_FILE
did not exist when populating the in-core index structure. A few places
the earlier commit 5521883 added the check for the initial checkout
condition are updated to use this function.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-11-12 20:52:35 +01:00
#define is_cache_unborn() is_index_unborn(&the_index)
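The commit message describes `is_index_unborn()` as checking that `$GIT_INDEX_FILE` did not exist when the in-core index was populated. A minimal sketch of such a check, assuming the index records the file's modification time only when it actually reads a file: an index with no entries read from an existing (empty) file is then distinguishable from one that was never read. The struct and function names are simplified stand-ins, not git's definitions.

```c
#include <time.h>

/* Simplified stand-in for struct index_state: only what the check needs. */
struct toy_index_state {
	unsigned int cache_nr;            /* number of entries read */
	struct { time_t sec; } timestamp; /* index file mtime, 0 if none was read */
};

/* Unborn: no entries, and no index file was read to populate the index. */
static int toy_is_index_unborn(const struct toy_index_state *istate)
{
	return !istate->cache_nr && !istate->timestamp.sec;
}
```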
2019-01-12 03:13:26 +01:00
#define read_cache_unmerged() repo_read_index_unmerged(the_repository)
2007-04-02 08:26:07 +02:00
#define discard_cache() discard_index(&the_index)
2008-02-07 17:40:13 +01:00
#define unmerged_cache() unmerged_index(&the_index)
2007-04-02 08:26:07 +02:00
#define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
#define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), (option))
2008-07-21 02:25:56 +02:00
#define rename_cache_entry_at(pos, new_name) rename_index_entry_at(&the_index, (pos), (new_name))
2007-04-02 08:26:07 +02:00
#define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos))
#define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
2016-09-14 23:07:47 +02:00
#define add_to_cache(path, st, flags) add_to_index(&the_index, (path), (st), (flags))
#define add_file_to_cache(path, flags) add_file_to_index(&the_index, (path), (flags))
2016-09-14 23:07:46 +02:00
#define chmod_cache_entry(ce, flip) chmod_index_entry(&the_index, (ce), (flip))
2009-08-21 10:57:59 +02:00
#define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL)
2019-09-11 20:20:25 +02:00
#define refresh_and_write_cache(refresh_flags, write_flags, gentle) repo_refresh_and_write_index(the_repository, (refresh_flags), (write_flags), (gentle), NULL, NULL, NULL)
2007-11-10 09:15:03 +01:00
#define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
#define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
2013-09-17 09:06:14 +02:00
#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen))
#define cache_file_exists(name, namelen, igncase) index_file_exists(&the_index, (name), (namelen), (igncase))
2008-10-16 17:07:26 +02:00
#define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen))
2009-12-25 09:30:51 +01:00
#define resolve_undo_clear() resolve_undo_clear_index(&the_index)
2009-12-25 22:40:02 +01:00
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
2009-12-25 20:57:11 +01:00
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
2013-04-13 15:28:31 +02:00
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
2019-01-12 03:13:24 +01:00
#define hold_locked_index(lock_file, flags) repo_hold_locked_index(the_repository, (lock_file), (flags))
2007-04-02 08:26:07 +02:00
#endif
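The compatibility macros in the `USE_THE_INDEX_COMPATIBILITY_MACROS` block all follow one pattern: an older spelling that relies on global state forwards to a function taking an explicit `struct index_state *` (or `struct repository *`). A miniature, hypothetical version of that pattern is sketched here; `mini_index`, `mini_index_name_pos()` and `mini_the_index` are invented for illustration, and the negative "-(insertion point) - 1" return for a missing name is modeled on how `index_name_pos()` is commonly understood to report absent paths, so it should be checked against git's sources.

```c
#include <string.h>

/* Hypothetical miniature index: a sorted array of path names. */
struct mini_index {
	const char **names;
	int nr;
};

/* Binary search: the position if found, otherwise -(insertion point) - 1. */
static int mini_index_name_pos(const struct mini_index *istate, const char *name)
{
	int lo = 0, hi = istate->nr;
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(name, istate->names[mid]);
		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;
}

/* A process-wide default index, playing the role of the_index. */
static const char *default_names[] = { "Makefile", "README", "cache.h" };
static struct mini_index mini_the_index = { default_names, 3 };

/* Compatibility-style wrapper: same operation, implicit global index. */
#define mini_cache_name_pos(name) mini_index_name_pos(&mini_the_index, (name))
```

Keeping the logic in the explicit-argument function means code that works on a submodule's or another worktree's index can call it directly, while legacy callers keep compiling as long as they define `USE_THE_INDEX_COMPATIBILITY_MACROS`.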
2005-04-08 00:13:13 +02:00

2018-04-14 17:35:01 +02:00
#define TYPE_BITS 3

2018-05-11 08:55:23 +02:00
/*
 * Values in this enum (except those outside the 3 bit range) are part
 * of pack file format. See Documentation/technical/pack-format.txt
 * for more information.
 */
2007-02-28 20:45:56 +01:00
enum object_type {
	OBJ_BAD = -1,
	OBJ_NONE = 0,
	OBJ_COMMIT = 1,
	OBJ_TREE = 2,
	OBJ_BLOB = 3,
	OBJ_TAG = 4,
	/* 5 for future expansion */
	OBJ_OFS_DELTA = 6,
	OBJ_REF_DELTA = 7,
2008-02-25 22:46:04 +01:00
	OBJ_ANY,
2010-05-14 11:31:35 +02:00
	OBJ_MAX
2007-02-28 20:45:56 +01:00
};

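Since only 3 bits (`TYPE_BITS`) hold the type in a pack entry header, the values 1 to 7 in the enum are fixed by the format. The sketch shows the commonly described layout of that header: the first byte carries a continuation bit, the 3-bit type and the low 4 size bits, and each following byte adds 7 more size bits. It follows our reading of the variable-length encoding in pack-format.txt; the helper names and the `PK_` constants are hypothetical, and the layout should be checked against that document before relying on it.

```c
#include <stddef.h>

enum pk_object_type { PK_COMMIT = 1, PK_TREE = 2, PK_BLOB = 3, PK_TAG = 4 };

/*
 * Encode a pack object header into `hdr` (room for 10 bytes suffices for a
 * 64-bit size). Returns the number of bytes written.
 */
static size_t encode_pack_header(unsigned char *hdr, int type,
				 unsigned long long size)
{
	size_t n = 0;
	unsigned char c = (unsigned char)((type << 4) | (size & 15));
	size >>= 4;
	while (size) {
		hdr[n++] = c | 0x80; /* continuation bit: more bytes follow */
		c = size & 0x7f;
		size >>= 7;
	}
	hdr[n++] = c;
	return n;
}

/* Inverse of encode_pack_header(); returns the number of bytes consumed. */
static size_t decode_pack_header(const unsigned char *hdr, int *type,
				 unsigned long long *size)
{
	size_t n = 0;
	unsigned char c = hdr[n++];
	int shift = 4;
	*type = (c >> 4) & 7; /* the 3-bit type field */
	*size = c & 15;
	while (c & 0x80) {
		c = hdr[n++];
		*size += (unsigned long long)(c & 0x7f) << shift;
		shift += 7;
	}
	return n;
}
```

With this layout a 100-byte blob encodes to the two bytes `0xB4 0x06`, and a 15-byte tree fits in the single byte `0x2F`.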
2007-12-01 07:22:38 +01:00
static inline enum object_type object_type(unsigned int mode)
{
	return S_ISDIR(mode) ? OBJ_TREE :
		S_ISGITLINK(mode) ? OBJ_COMMIT :
		OBJ_BLOB;
}

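`object_type()` maps a tree-entry mode to the kind of object the entry points at. A standalone restatement with the mode bits spelled out may make the `S_ISGITLINK` case clearer: a gitlink (submodule entry) uses the combination of the directory and symlink type bits, `0160000`, and points at a commit. The `SK_` names are hypothetical and defined locally so the sketch does not depend on platform `<sys/stat.h>` values.

```c
/* Values mirror the object_type enum. */
enum sk_object_type { SK_OBJ_COMMIT = 1, SK_OBJ_TREE = 2, SK_OBJ_BLOB = 3 };

/* Traditional octal file-type bits, as they appear in git tree entries. */
#define SK_IFMT      0170000
#define SK_IFDIR     0040000
#define SK_IFGITLINK 0160000 /* S_IFDIR | S_IFLNK: a submodule commit */

static enum sk_object_type sk_object_type(unsigned int mode)
{
	unsigned int type = mode & SK_IFMT;
	return type == SK_IFDIR ? SK_OBJ_TREE :
	       type == SK_IFGITLINK ? SK_OBJ_COMMIT :
	       SK_OBJ_BLOB; /* regular files and symlinks are both blobs */
}
```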
2013-03-08 10:29:08 +01:00
/* Double-check local_repo_env below if you add to this list. */
2005-05-10 07:57:58 +02:00
#define GIT_DIR_ENVIRONMENT "GIT_DIR"
$GIT_COMMON_DIR: a new environment variable
This variable is intended to support multiple working directories
attached to a repository. Such a repository may have a main working
directory, created by either "git init" or "git clone" and one or more
linked working directories. These working directories and the main
repository share the same repository directory.
In linked working directories, $GIT_COMMON_DIR must be defined to point
to the real repository directory and $GIT_DIR points to an unused
subdirectory inside $GIT_COMMON_DIR. File locations inside the
repository are reorganized from the linked worktree view point:
- worktree-specific such as HEAD, logs/HEAD, index, other top-level
refs and unrecognized files are from $GIT_DIR.
- the rest like objects, refs, info, hooks, packed-refs, shallow...
are from $GIT_COMMON_DIR (except info/sparse-checkout, but that's
a separate patch)
Scripts are supposed to retrieve paths in $GIT_DIR with "git rev-parse
--git-path", which will take care of "$GIT_DIR vs $GIT_COMMON_DIR"
business.
The redirection is done by git_path(), git_pathdup() and
strbuf_git_path(). The selected list of paths goes to $GIT_COMMON_DIR,
not the other way around in case a developer adds a new
worktree-specific file and it's accidentally promoted to be shared
across repositories (this includes unknown files added by third party
commands)
The list of known files that belong to $GIT_DIR are:
ADD_EDIT.patch BISECT_ANCESTORS_OK BISECT_EXPECTED_REV BISECT_LOG
BISECT_NAMES CHERRY_PICK_HEAD COMMIT_MSG FETCH_HEAD HEAD MERGE_HEAD
MERGE_MODE MERGE_RR NOTES_EDITMSG NOTES_MERGE_WORKTREE ORIG_HEAD
REVERT_HEAD SQUASH_MSG TAG_EDITMSG fast_import_crash_* logs/HEAD
next-index-* rebase-apply rebase-merge rsync-refs-* sequencer/*
shallow_*
Path mapping is NOT done for git_path_submodule(). Multi-checkouts are
not supported as submodules.
Helped-by: Jens Lehmann <Jens.Lehmann@web.de>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-30 09:24:36 +01:00
#define GIT_COMMON_DIR_ENVIRONMENT "GIT_COMMON_DIR"
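The redirection rule in the `$GIT_COMMON_DIR` commit message (a selected list of shared paths goes to `$GIT_COMMON_DIR`, everything else stays in the per-worktree `$GIT_DIR`) can be sketched as a prefix lookup. The table is an abbreviated assumption built only from the paths the message names; git's real list is longer and has exceptions (the message itself mentions `info/sparse-checkout`), and `resolve_git_path()` is a hypothetical helper, not git's `git_path()`. Defaulting to `$GIT_DIR` reflects the message's point that a newly added file should never be shared by accident.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, abbreviated list of paths shared across worktrees. */
static const char *const shared_prefixes[] = {
	"objects", "refs", "info", "hooks", "packed-refs", "shallow", NULL
};

/* Does `path` equal `prefix` or live underneath it? */
static int under(const char *path, const char *prefix)
{
	size_t len = strlen(prefix);
	return !strncmp(path, prefix, len) && (path[len] == '\0' || path[len] == '/');
}

/*
 * Resolve a repository-relative path for a linked worktree: listed paths
 * go to the common dir; everything else (HEAD, index, logs/HEAD, unknown
 * files) stays in the per-worktree GIT_DIR.
 */
static void resolve_git_path(char *out, size_t outlen, const char *git_dir,
			     const char *common_dir, const char *path)
{
	const char *base = git_dir;
	for (const char *const *p = shared_prefixes; *p; p++) {
		if (under(path, *p)) {
			base = common_dir;
			break;
		}
	}
	snprintf(out, outlen, "%s/%s", base, path);
}
```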
ref namespaces: infrastructure
Add support for dividing the refs of a single repository into multiple
namespaces, each of which can have its own branches, tags, and HEAD.
Git can expose each namespace as an independent repository to pull from
and push to, while sharing the object store, and exposing all the refs
to operations such as git-gc.
Storing multiple repositories as namespaces of a single repository
avoids storing duplicate copies of the same objects, such as when
storing multiple branches of the same source. The alternates mechanism
provides similar support for avoiding duplicates, but alternates do not
prevent duplication between new objects added to the repositories
without ongoing maintenance, while namespaces do.
To specify a namespace, set the GIT_NAMESPACE environment variable to
the namespace. For each ref namespace, git stores the corresponding
refs in a directory under refs/namespaces/. For example,
GIT_NAMESPACE=foo will store refs under refs/namespaces/foo/. You can
also specify namespaces via the --namespace option to git.
Note that namespaces which include a / will expand to a hierarchy of
namespaces; for example, GIT_NAMESPACE=foo/bar will store refs under
refs/namespaces/foo/refs/namespaces/bar/. This makes paths in
GIT_NAMESPACE behave hierarchically, so that cloning with
GIT_NAMESPACE=foo/bar produces the same result as cloning with
GIT_NAMESPACE=foo and cloning from that repo with GIT_NAMESPACE=bar. It
also avoids ambiguity with strange namespace paths such as
foo/refs/heads/, which could otherwise generate directory/file conflicts
within the refs directory.
Add the infrastructure for ref namespaces: handle the GIT_NAMESPACE
environment variable and --namespace option, and support iterating over
refs in a namespace.
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Jamey Sharp <jamey@minilop.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-05 19:54:44 +02:00
#define GIT_NAMESPACE_ENVIRONMENT "GIT_NAMESPACE"
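The hierarchical expansion rule for `GIT_NAMESPACE` (each `/`-separated component becomes its own `refs/namespaces/<name>/` level) is straightforward to state as code. This is a sketch assuming that empty components (from leading, trailing or doubled slashes) are simply skipped; `expand_namespace_prefix()` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Expand a GIT_NAMESPACE value into a ref prefix: each '/'-separated
 * component becomes "refs/namespaces/<component>/". Empty components are
 * skipped. `out` must hold at least one byte. Returns 0 on success, or -1
 * if `out` is too small (its contents are then truncated).
 */
static int expand_namespace_prefix(char *out, size_t outlen, const char *ns)
{
	size_t used = 0;
	out[0] = '\0';
	while (*ns) {
		size_t len = strcspn(ns, "/");
		if (len) {
			int n = snprintf(out + used, outlen - used,
					 "refs/namespaces/%.*s/", (int)len, ns);
			if (n < 0 || (size_t)n >= outlen - used)
				return -1;
			used += (size_t)n;
		}
		ns += len;
		if (*ns == '/')
			ns++;
	}
	return 0;
}
```

With this rule `foo/bar` expands to `refs/namespaces/foo/refs/namespaces/bar/`, which is why cloning with `GIT_NAMESPACE=foo/bar` matches cloning through `foo` and then through `bar`.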
2007-06-06 09:10:42 +02:00
#define GIT_WORK_TREE_ENVIRONMENT "GIT_WORK_TREE"
2013-03-08 10:30:25 +01:00
#define GIT_PREFIX_ENVIRONMENT "GIT_PREFIX"
2016-10-07 20:18:48 +02:00
#define GIT_SUPER_PREFIX_ENVIRONMENT "GIT_INTERNAL_SUPER_PREFIX"
2005-05-10 07:57:58 +02:00
#define DEFAULT_GIT_DIR_ENVIRONMENT ".git"
2005-05-10 02:57:56 +02:00
#define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
2005-04-21 19:55:18 +02:00
#define INDEX_ENVIRONMENT "GIT_INDEX_FILE"
2005-07-30 09:58:28 +02:00
#define GRAFT_ENVIRONMENT "GIT_GRAFT_FILE"
2013-12-05 14:02:45 +01:00
#define GIT_SHALLOW_FILE_ENVIRONMENT "GIT_SHALLOW_FILE"
2006-12-19 10:28:15 +01:00
#define TEMPLATE_DIR_ENVIRONMENT "GIT_TEMPLATE_DIR"
#define CONFIG_ENVIRONMENT "GIT_CONFIG"
2010-08-23 21:16:00 +02:00
#define CONFIG_DATA_ENVIRONMENT "GIT_CONFIG_PARAMETERS"
2021-01-12 13:27:14 +01:00
#define CONFIG_COUNT_ENVIRONMENT "GIT_CONFIG_COUNT"
2006-12-19 10:28:15 +01:00
#define EXEC_PATH_ENVIRONMENT "GIT_EXEC_PATH"
2008-05-20 08:49:26 +02:00
#define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
2009-11-18 07:50:58 +01:00
#define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
2015-06-11 23:34:59 +02:00
#define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
Add basic infrastructure to assign attributes to paths
This adds the basic infrastructure to assign attributes to
paths, in a way similar to what the exclusion mechanism does
based on $GIT_DIR/info/exclude and .gitignore files.
An attribute is just a simple string that does not contain any
whitespace. They can be specified in $GIT_DIR/info/attributes
file, and .gitattributes file in each directory.
Each line in these files defines a pattern matching rule.
Similar to the exclusion mechanism, a later match overrides an
earlier match in the same file, and entries from .gitattributes
file in the same directory take precedence over the ones from
parent directories. Lines in $GIT_DIR/info/attributes file are
used as the lowest precedence default rules.
A line is either a comment (an empty line, or a line that begins
with a '#'), or a rule, which is a whitespace separated list of
tokens. The first token on the line is a shell glob pattern.
The rest are names of attributes, each of which can optionally
be prefixed with '!'. Such a line means "if a path matches this
glob, this attribute is set (or unset -- if the attribute name
is prefixed with '!'). For glob matching, the same "if the
pattern does not have a slash in it, the basename of the path is
matched with fnmatch(3) against the pattern, otherwise, the path
is matched with the pattern with FNM_PATHNAME" rule as the
exclusion mechanism is used.
This does not define what an attribute means. Tying an
attribute to various effects it has on git operation for paths
that have it will be specified separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-12 10:07:32 +02:00
#define GITATTRIBUTES_FILE ".gitattributes"
#define INFOATTRIBUTES_FILE "info/attributes"
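The glob rule that attributes share with the exclusion mechanism (basename matching for slash-free patterns, `FNM_PATHNAME` matching otherwise) can be sketched with POSIX `fnmatch(3)`. This is a simplification: it assumes `path` is already relative to the directory holding the `.gitattributes` file and ignores precedence and `!` handling; `attr_pattern_matches()` is a hypothetical helper.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * A pattern without a slash is matched against the basename of the path;
 * a pattern with a slash is matched against the whole path with
 * FNM_PATHNAME, so '*' does not cross directory boundaries.
 */
static int attr_pattern_matches(const char *pattern, const char *path)
{
	if (!strchr(pattern, '/')) {
		const char *slash = strrchr(path, '/');
		const char *base = slash ? slash + 1 : path;
		return fnmatch(pattern, base, 0) == 0;
	}
	return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```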
attribute macro support
This adds "attribute macros" (for lack of better name). So far,
we have low-level attributes such as crlf and diff, which are
defined in operational terms --- setting or unsetting them on a
particular path directly affects what is done to the path. For
example, in order to decline diffs or crlf conversions on a
binary blob, no diffs on PostScript files, and treat all other
files normally, you would have something like these:
* diff crlf
*.ps !diff
proprietary.o !diff !crlf
That is fine as the operation goes, but gets unwieldy rather
rapidly, when we start adding more low-level attributes that are
defined in operational terms. A near-term example of such an
attribute would be 'merge-3way' which would control if git
should attempt the usual 3-way file-level merge internally, or
leave merging to a specialized external program of user's
choice. When it is added, we do _not_ want to force the users
to update the above to:
* diff crlf merge-3way
*.ps !diff
proprietary.o !diff !crlf !merge-3way
The way this patch solves this issue is to realize that the
attributes the user is assigning to paths are not defined in
terms of operations but in terms of what they are.
All of the three low-level attributes usually make sense for
most of the files that sane SCM users have git operate on (these
files are typically called "text"). Only a few cases, such as
binary blob, need exception to decline the "usual treatment
given to text files" -- and people mark them as "binary".
So this allows the $GIT_DIR/info/attributes and .gitattributes
at the toplevel of the project to also specify attributes that
assign other attributes. The syntax is '[attr]' followed by an
attribute name followed by a list of attribute names:
[attr] binary !diff !crlf !merge-3way
When "binary" attribute is set to a path, if the path has not
got diff/crlf/merge-3way attribute set or unset by other rules,
this rule unsets the three low-level attributes.
It is expected that the user level .gitattributes will be
expressed mostly in terms of attributes based on what the files
are, and the above sample would become like this:
(built-in attribute configuration)
[attr] binary !diff !crlf !merge-3way
* diff crlf merge-3way
(project specific .gitattributes)
proprietary.o binary
(user preference $GIT_DIR/info/attributes)
*.ps !diff
There are a few caveats.
* As described above, you can define these macros only in
$GIT_DIR/info/attributes and toplevel .gitattributes.
* There is no attempt to detect circular definition of macro
attributes, and definitions are evaluated from bottom to top
as usual to fill in other attributes that have not yet got
values. The following would work as expected:
[attr] text diff crlf
[attr] ps text !diff
*.ps ps
while this would most likely not (I haven't tried):
[attr] ps text !diff
[attr] text diff crlf
*.ps ps
* When a macro says "[attr] A B !C", saying that a path does
not have attribute A does not let you tell anything about
attributes B or C. That is, given this:
[attr] text diff crlf
[attr] ps text !diff
*.txt !ps
path hello.txt, which would match "*.txt" pattern, would have
"ps" attribute set to zero, but that does not make text
attribute of hello.txt set to false (nor diff attribute set to
true).
Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-14 17:54:37 +02:00
#define ATTRIBUTE_MACRO_PREFIX "[attr]"
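Lines beginning with `ATTRIBUTE_MACRO_PREFIX` define macros whose expansion only fills in attributes that no other rule has decided yet. A sketch of that behavior with a three-state value per attribute follows. The attribute universe is fixed to the three names used in the commit message, unknown names are silently ignored, and `apply_macro()` is a hypothetical helper; it shows why `binary` leaves `diff` alone when another rule has already set it.

```c
#include <string.h>

enum attr_state { ATTR_UNSPECIFIED = 0, ATTR_SET, ATTR_UNSET };

/* Hypothetical fixed attribute universe for the sketch. */
enum { A_DIFF, A_CRLF, A_MERGE3, A_NR };
static const char *const attr_names[A_NR] = { "diff", "crlf", "merge-3way" };

static int attr_index(const char *name)
{
	for (int i = 0; i < A_NR; i++)
		if (!strcmp(attr_names[i], name))
			return i;
	return -1;
}

/*
 * Apply the right-hand side of a macro such as
 *   [attr] binary !diff !crlf !merge-3way
 * touching only attributes that are still unspecified.
 */
static void apply_macro(enum attr_state state[A_NR],
			const char *const *tokens, int ntokens)
{
	for (int i = 0; i < ntokens; i++) {
		const char *tok = tokens[i];
		enum attr_state val = ATTR_SET;
		if (*tok == '!') {
			val = ATTR_UNSET;
			tok++;
		}
		int idx = attr_index(tok);
		if (idx >= 0 && state[idx] == ATTR_UNSPECIFIED)
			state[idx] = val;
	}
}
```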
2017-08-02 21:49:16 +02:00
#define GITMODULES_FILE ".gitmodules"
2018-10-05 15:05:59 +02:00
#define GITMODULES_INDEX ":.gitmodules"
#define GITMODULES_HEAD "HEAD:.gitmodules"
2009-10-09 12:21:57 +02:00
#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
2010-03-12 18:04:26 +01:00
#define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
2010-03-12 18:04:32 +01:00
#define GIT_NOTES_REWRITE_REF_ENVIRONMENT "GIT_NOTES_REWRITE_REF"
#define GIT_NOTES_REWRITE_MODE_ENVIRONMENT "GIT_NOTES_REWRITE_MODE"
add global --literal-pathspecs option
Git takes pathspec arguments in many places to limit the
scope of an operation. These pathspecs are treated not as
literal paths, but as glob patterns that can be fed to
fnmatch. When a user is giving a specific pattern, this is a
nice feature.
However, when programmatically providing pathspecs, it can be
a nuisance. For example, to find the latest revision which
modified "$foo", one can use "git rev-list -- $foo". But if
"$foo" contains glob characters (e.g., "f*"), it will
erroneously match more entries than desired. The caller
needs to quote the characters in $foo, and even then, the
results may not be exactly the same as with a literal
pathspec. For instance, the depth checks in
match_pathspec_depth do not kick in if we match via fnmatch.
This patch introduces a global command-line option (i.e.,
one for "git" itself, not for specific commands) to turn
this behavior off. It also has a matching environment
variable, which can make it easier if you are a script or
porcelain interface that is going to issue many such
commands.
This option cannot turn off globbing for particular
pathspecs. That could eventually be done with a ":(noglob)"
magic pathspec prefix. However, that level of granularity is
more cumbersome to use for many cases, and doing ":(noglob)"
right would mean converting the whole codebase to use
"struct pathspec", as the usual "const char **pathspec"
cannot represent extra per-item flags.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-19 23:37:30 +01:00
#define GIT_LITERAL_PATHSPECS_ENVIRONMENT "GIT_LITERAL_PATHSPECS"
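The difference between glob and literal pathspec handling described in the commit message comes down to which comparison is applied. A minimal sketch, assuming a single pathspec item matched against a full path (real pathspec matching also handles leading-directory matches and pathspec magic); `pathspec_item_matches()` is hypothetical, and its `literal` flag stands in for what `GIT_LITERAL_PATHSPECS` would turn on.

```c
#include <fnmatch.h>
#include <string.h>

/*
 * With `literal` set, glob characters are compared verbatim; otherwise
 * the item is treated as an fnmatch(3) pattern.
 */
static int pathspec_item_matches(const char *item, const char *path, int literal)
{
	if (literal)
		return strcmp(item, path) == 0;
	return fnmatch(item, path, 0) == 0;
}
```

This is the `"f*"` case from the message: as a glob it matches `foo`, while in literal mode it matches only a file actually named `f*`.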
2013-07-14 10:36:08 +02:00
#define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
#define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
2013-07-14 10:36:09 +02:00
#define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
2016-10-03 22:49:18 +02:00
#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
git: add --no-optional-locks option
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.
But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.
There are a couple of possible solutions:
1. Have some kind of "pseudo-lock" that allows other
commands to tell status that they want the lock.
This is likely to be complicated and error-prone to
implement (and maybe even impossible with just
dotlocks to work from, as it requires some
inter-process communication).
2. Avoid background runs of commands like "git status"
that want to do opportunistic updates, preferring
instead plumbing like diff-files, etc.
This is awkward for a couple of reasons. One is that
"status --porcelain" reports a lot more about the
repository state than is available from individual
plumbing commands. And two is that we actually _do_
want to see the refreshed index. We just don't want to
take a lock or write out the result. Whereas commands
like diff-files expect us to refresh the index
separately and write it to disk so that they can depend
on the result. But that write is exactly what we're
trying to avoid.
3. Ask "status" not to lock or write the index.
This is easy to implement. The big downside is that any
work done in refreshing the index for such a call is
lost when the process exits. So a background process
may end up re-hashing a changed file multiple times
until the user runs a command that does an index
refresh themselves.
This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.
The reason there is two-fold:
1. An environment variable is carried through to
sub-processes. And whether an invocation is a
background process or not should apply to the whole
process tree. So you could do "git --no-optional-locks
foo", and if "foo" is a script or alias that calls
"status", you'll still get the effect.
2. There may be other programs that want the same
treatment.
I've punted here on finding more callers to convert,
since "status" is the obvious one to call as a repeated
background job. But "git diff"'s opportunistic refresh
of the index may be a good candidate.
The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:
Note that the regression test added in this commit does
not *really* verify that no index.lock file was written;
that test is not possible in a portable way. Instead, we
verify that .git/index is rewritten *only* when `git
status` is run without `--no-optional-locks`.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 08:54:30 +02:00
#define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
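Option 3 from the commit message (skip the lock and the write, but keep the in-core refresh) can be sketched as an opportunistic lock that either succeeds immediately or backs off. The sketch assumes a plain `O_CREAT | O_EXCL` dot-lock file and a simplified reading of `GIT_OPTIONAL_LOCKS` in which only `0` and `false` disable locking; git's own lockfile API and boolean parsing are richer, and all function names here are hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Simplified: unset means locks are allowed; "0" or "false" disables them. */
static int optional_locks_allowed(void)
{
	const char *v = getenv("GIT_OPTIONAL_LOCKS");
	return !v || (strcmp(v, "0") && strcmp(v, "false"));
}

/*
 * Returns 1 with *fd_out set if the lock was taken (the caller may then
 * write the refreshed index), or 0 with *fd_out == -1 if writing should be
 * skipped, because optional locks are disabled or someone else holds the
 * lock. The in-core refresh is usable either way.
 */
static int try_optional_lock(const char *lock_path, int *fd_out)
{
	*fd_out = -1;
	if (!optional_locks_allowed())
		return 0;
	int fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return 0; /* typically EEXIST: back off quietly */
	*fd_out = fd;
	return 1;
}

/* Give up the lock without writing anything, as a rollback would. */
static void rollback_optional_lock(int fd, const char *lock_path)
{
	if (fd >= 0) {
		close(fd);
		unlink(lock_path);
	}
}
```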
2018-04-10 17:05:44 +02:00
#define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
2005-04-21 19:55:18 +02:00

2017-10-16 19:55:24 +02:00
/*
 * Environment variable used in handshaking the wire protocol.
 * Contains a colon ':' separated list of keys with optional values
 * 'key[=value]'. Presence of unknown keys and values must be
 * ignored.
 */
#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
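The `GIT_PROTOCOL` format described in the comment (colon-separated `key` or `key=value` items, with unknown keys ignored) can be handled by a small lookup function. This is a sketch, not git's parser: `protocol_lookup()` is a hypothetical name and returns only the first occurrence of a key. The comment on `GIT_PROTOCOL_HEADER` suggests the same handshake data is also carried over HTTP in the `Git-Protocol` header.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in a ':'-separated list of "key" or "key=value" items.
 * On success copies the value (empty for a bare key) into `val` and
 * returns 1; returns 0 if the key is absent. Other keys are skipped.
 */
static int protocol_lookup(const char *env, const char *key,
			   char *val, size_t vallen)
{
	size_t keylen = strlen(key);
	while (env && *env) {
		size_t itemlen = strcspn(env, ":");
		size_t namelen = strcspn(env, "=:");
		if (namelen == keylen && !strncmp(env, key, keylen)) {
			const char *v = env + namelen;
			size_t len = 0;
			if (*v == '=') {
				v++;
				len = itemlen - namelen - 1;
			}
			snprintf(val, vallen, "%.*s", (int)len, v);
			return 1;
		}
		env += itemlen;
		if (*env == ':')
			env++;
	}
	return 0;
}
```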
2017-10-16 19:55:29 +02:00
/* HTTP header used to handshake the wire protocol */
#define GIT_PROTOCOL_HEADER "Git-Protocol"
2017-10-16 19:55:24 +02:00

2010-02-25 00:34:14 +01:00
/*
setup: suppress implicit "." work-tree for bare repos
If an explicit GIT_DIR is given without a working tree, we
implicitly assume that the current working directory should
be used as the working tree. E.g.,:
GIT_DIR=/some/repo.git git status
would compare against the cwd.
Unfortunately, we fool this rule for sub-invocations of git
by setting GIT_DIR internally ourselves. For example:
git init foo
cd foo/.git
git status ;# fails, as we expect
git config alias.st status
git status ;# does not fail, but should
What happens is that we run setup_git_directory when doing
alias lookup (since we need to see the config), set GIT_DIR
as a result, and then leave GIT_WORK_TREE blank (because we
do not have one). Then when we actually run the status
command, we do setup_git_directory again, which sees our
explicit GIT_DIR and uses the cwd as an implicit worktree.
It's tempting to argue that we should be suppressing that
second invocation of setup_git_directory, as it could use
the values we already found in memory. However, the problem
still exists for sub-processes (e.g., if "git status" were
an external command).
You can see another example with the "--bare" option, which
sets GIT_DIR explicitly. For example:
git init foo
cd foo/.git
git status ;# fails
git --bare status ;# does NOT fail
We need some way of telling sub-processes "even though
GIT_DIR is set, do not use cwd as an implicit working tree".
We could do it by putting a special token into
GIT_WORK_TREE, but the obvious choice (an empty string) has
some portability problems.
Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE,
which suppresses the use of cwd as a working tree when
GIT_DIR is set. We trigger the new variable when we know we
are in a bare setting.
The variable is left intentionally undocumented, as this is
an internal detail (for now, anyway). If somebody comes up
with a good alternate use for it, and once we are confident
we have shaken any bugs out of it, we can consider promoting
it further.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 10:32:22 +01:00
 * This environment variable is expected to contain a boolean indicating
 * whether we should or should not treat:
 *
 *   GIT_DIR=foo.git git ...
 *
 * as if GIT_WORK_TREE=. was given. It's not expected that users will make use
 * of this, but we use it internally to communicate to sub-processes that we
 * are in a bare repo. If not set, defaults to true.
 */
#define GIT_IMPLICIT_WORK_TREE_ENVIRONMENT "GIT_IMPLICIT_WORK_TREE"

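The "defaults to true" rule for `GIT_IMPLICIT_WORK_TREE` amounts to reading a boolean with a default. A sketch follows, assuming a deliberately small set of spellings (lowercase `false`, `no`, `off`, `0`, or an empty value mean false; any other value means true). Git's own boolean parser is stricter and accepts more forms, and both function names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified boolean parse: NULL (unset) yields `def`. */
static int parse_bool_with_default(const char *v, int def)
{
	if (!v)
		return def;
	if (!*v || !strcmp(v, "0") || !strcmp(v, "false") ||
	    !strcmp(v, "no") || !strcmp(v, "off"))
		return 0;
	return 1;
}

/*
 * Should "GIT_DIR=foo.git git ..." use the current directory as the work
 * tree? Yes by default, unless a parent process marked the repository as
 * bare by exporting the variable as false.
 */
static int implicit_work_tree_allowed(void)
{
	return parse_bool_with_default(getenv("GIT_IMPLICIT_WORK_TREE"), 1);
}
```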
2010-02-25 00:34:14 +01:00
/*
2013-03-08 10:29:08 +01:00
 * Repository-local GIT_* environment variables; these will be cleared
 * when git spawns a sub-process that runs inside another repository.
 * The array is NULL-terminated, which makes it easy to pass in the "env"
 * parameter of a run-command invocation, or to do a simple walk.
2010-02-25 00:34:14 +01:00
 */
2013-03-08 10:29:08 +01:00
extern const char * const local_repo_env[];
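Because `local_repo_env` is NULL-terminated, callers can walk it without a separate length. The sketch shows one such walk: filtering a `KEY=value` environment block before a child process is started in another repository. The four-entry array and the helper names (`sketch_local_repo_env`, `is_listed()`, `filter_child_env()`) are hypothetical; per the header comment, git itself can instead hand the list to a run-command invocation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt; git's real local_repo_env lists more variables. */
static const char *const sketch_local_repo_env[] = {
	"GIT_DIR",
	"GIT_WORK_TREE",
	"GIT_INDEX_FILE",
	"GIT_OBJECT_DIRECTORY",
	NULL
};

/* Is `entry` ("NAME=value") one of the NULL-terminated `names`? */
static int is_listed(const char *entry, const char *const *names)
{
	size_t len = strcspn(entry, "=");
	for (; *names; names++)
		if (strlen(*names) == len && !strncmp(entry, *names, len))
			return 1;
	return 0;
}

/*
 * Copy the NULL-terminated `envp` into `out`, dropping repository-local
 * variables. `out` needs room for every entry plus the terminator.
 * Returns the number of entries kept.
 */
static size_t filter_child_env(const char *const *envp, const char **out)
{
	size_t n = 0;
	for (; *envp; envp++)
		if (!is_listed(*envp, sketch_local_repo_env))
			out[n++] = *envp;
	out[n] = NULL;
	return n;
}
```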
2010-02-25 00:34:14 +01:00

2019-04-29 10:28:14 +02:00
void setup_git_env(const char *git_dir);
2017-06-20 21:19:32 +02:00

config: only read .git/config from configured repos
When git_config() runs, it looks in the system, user-wide,
and repo-level config files. It gets the latter by calling
git_pathdup(), which in turn calls get_git_dir(). If we
haven't set up the git repository yet, this may simply
return ".git", and we will look at ".git/config". This
seems like it would be helpful (presumably we haven't set up
the repository yet, so it tries to find it), but it turns
out to be a bad idea for a few reasons:
- it's not sufficient, and therefore hides bugs in a
confusing way. Config will be respected if commands are
run from the top-level of the working tree, but not from
a subdirectory.
- it's not always true that we haven't set up the
repository _yet_; we may not want to do it at all. For
instance, if you run "git init /some/path" from inside
another repository, it should not load config from the
existing repository.
- there might be a path ".git/config", but it is not the
actual repository we would find via setup_git_directory().
This may happen, e.g., if you are storing a git
repository inside another git repository, but have
munged one of the files in such a way that the
inner repository is not valid (e.g., by removing HEAD).
We have at least two bugs of the second type in git-init,
introduced by ae5f677 (lazily load core.sharedrepository,
2016-03-11). It causes init to use git_configset(), which
loads all of the config, including values from the current
repo (if any). This shows up in two ways:
1. If we happen to be in an existing repository directory,
we'll read and respect core.sharedrepository from it,
even though it should have no bearing on the new
repository. A new test in t1301 covers this.
2. Similarly, if we're in an existing repo that sets
core.logallrefupdates, that will cause init to fail to
set it in a newly created repository (because it thinks
that the user's templates already did so). A new test
in t0001 covers this.
We also need to adjust an existing test in t1302, which
gives another example of why this patch is an improvement.
That test creates an embedded repository with a bogus
core.repositoryformatversion of "99". It wants to make sure
that we actually stop at the bogus repo rather than
continuing upward to find the outer repo. So it checks that
"git config core.repositoryformatversion" returns 99. But
that only works because we blindly read ".git/config", even
though we _know_ we're in a repository whose vintage we do
not understand.
After this patch, we avoid reading config from the unknown
vintage repository at all, which is a safer choice. But we
need to tweak the test, since core.repositoryformatversion
will not return 99; it will claim that it could not find the
variable at all.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 05:24:15 +02:00
/*
 * Returns true iff we have a configured git repository (either via
 * setup_git_directory, or in the environment via $GIT_DIR).
 */
int have_git_dir(void);

2007-01-07 11:00:28 +01:00
extern int is_bare_repository_cfg;
2019-04-29 10:28:14 +02:00
int is_bare_repository(void);
int is_inside_git_dir(void);
Clean up work-tree handling
The old version of work-tree support was an unholy mess, barely readable,
and not to the point.
For example, why do you have to provide a worktree, when it is not used?
As in "git status". Now it works.
Another riddle was: if you can have work trees inside the git dir, why
are some programs complaining that they need a work tree?
IOW it is allowed to call
$ git --git-dir=../ --work-tree=. bla
when you really want to. In this case, you are both in the git directory
and in the working tree. So, programs have to actually test for the right
thing, namely if they are inside a working tree, and not if they are
inside a git directory.
Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was
specified, unless there is a repository in the current working directory.
It does now.
The logic to determine if a repository is bare, or has a work tree
(tertium non datur), is this:
--work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true,
which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR
ends in /.git, which overrides the directory in which .git/ was found.
In related news, a long standing bug was fixed: when in .git/bla/x.git/,
which is a bare repository, git formerly assumed ../.. to be the
appropriate git dir. This problem was reported by Shawn Pearce to have
caused much pain, where a colleague mistakenly ran "git init" in "/" a
long time ago, and bare repositories just would not work.
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
extern char *git_work_tree_cfg;
2019-04-29 10:28:14 +02:00
int is_inside_work_tree(void);
const char *get_git_dir(void);
const char *get_git_common_dir(void);
2022-04-25 20:27:14 +02:00
const char *get_object_directory(void);
2019-04-29 10:28:14 +02:00
char *get_index_file(void);
char *get_graft_file(struct repository *r);
2020-03-06 20:03:13 +01:00
void set_git_dir(const char *path, int make_realpath);
2019-04-29 10:28:14 +02:00
int get_common_dir_noenv(struct strbuf *sb, const char *gitdir);
int get_common_dir(struct strbuf *sb, const char *gitdir);
const char *get_git_namespace(void);
const char *strip_namespace(const char *namespaced_ref);
const char *get_super_prefix(void);
const char *get_git_work_tree(void);
2015-06-09 20:24:35 +02:00

2016-01-22 23:27:33 +01:00
/*
 * Return true if the given path is a git directory; note that this _just_
 * looks at the directory itself. If you want to know whether "foo/.git"
 * is a repository, you must feed that path, not just "foo".
 */
2019-04-29 10:28:14 +02:00
int is_git_directory(const char *path);
2016-01-22 23:27:33 +01:00

/*
 * Return 1 if the given path is the root of a git repository or
 * submodule, else 0. Will not return 1 for bare repositories with the
 * exception of creating a bare repository in "foo/.git" and calling
 * is_nonbare_repository_dir() on "foo".
 *
 * If we run into read errors, we err on the side of saying "yes, it is",
 * as we usually consider sub-repos precious, and would prefer to err on the
 * side of not disrupting or deleting them.
 */
2019-04-29 10:28:14 +02:00
int is_nonbare_repository_dir(struct strbuf *path);
2016-01-22 23:27:33 +01:00

2015-06-09 20:24:35 +02:00
#define READ_GITFILE_ERR_STAT_FAILED 1
#define READ_GITFILE_ERR_NOT_A_FILE 2
#define READ_GITFILE_ERR_OPEN_FAILED 3
#define READ_GITFILE_ERR_READ_FAILED 4
#define READ_GITFILE_ERR_INVALID_FORMAT 5
#define READ_GITFILE_ERR_NO_PATH 6
#define READ_GITFILE_ERR_NOT_A_REPO 7
2015-06-15 21:39:52 +02:00
#define READ_GITFILE_ERR_TOO_LARGE 8
2019-04-29 10:28:14 +02:00
void read_gitfile_error_die(int error_code, const char *path, const char *dir);
const char *read_gitfile_gently(const char *path, int *return_error_code);
2015-06-09 20:24:35 +02:00
#define read_gitfile(path) read_gitfile_gently((path), NULL)
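A `.git` *file*, as opposed to a directory, points at the real git directory with a line of the form `gitdir: <path>`. A sketch of just the format check, mapping failures onto two of the error codes above (same values as the header), might look like this. It leaves out the stat, open, read, size-limit and repository-validation steps, and `parse_gitfile_buf` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Same values as the READ_GITFILE_ERR_* codes in the header. */
#define READ_GITFILE_ERR_INVALID_FORMAT 5
#define READ_GITFILE_ERR_NO_PATH 6

/*
 * Hypothetical: parse the contents of a ".git" file in place.
 * On success, return 0 and point *dir at the path inside buf;
 * otherwise return one of the READ_GITFILE_ERR_* codes above.
 */
static int parse_gitfile_buf(char *buf, const char **dir)
{
	static const char prefix[] = "gitdir: ";
	size_t plen = strlen(prefix);
	size_t len;

	if (strncmp(buf, prefix, plen))
		return READ_GITFILE_ERR_INVALID_FORMAT;

	/* Strip trailing newline and carriage-return characters. */
	len = strlen(buf);
	while (len && (buf[len - 1] == '\n' || buf[len - 1] == '\r'))
		buf[--len] = '\0';

	if (len == plen)
		return READ_GITFILE_ERR_NO_PATH;

	*dir = buf + plen;
	return 0;
}
```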
2019-04-29 10:28:14 +02:00
const char *resolve_gitdir_gently(const char *suspect, int *return_error_code);
2017-01-25 00:56:49 +01:00
#define resolve_gitdir(path) resolve_gitdir_gently((path), NULL)

2019-04-29 10:28:14 +02:00
void set_git_work_tree(const char *tree);
2005-05-10 07:57:58 +02:00

#define ALTERNATE_DB_ENVIRONMENT "GIT_ALTERNATE_OBJECT_DIRECTORIES"
2005-04-21 19:55:18 +02:00

2019-04-29 10:28:14 +02:00
void setup_work_tree(void);
2017-03-13 21:10:45 +01:00
/*
2017-06-14 20:07:37 +02:00
 * Find the commondir and gitdir of the repository that contains the current
 * working directory, without changing the working directory or other global
 * state. The result is appended to commondir and gitdir. If the discovered
 * gitdir does not correspond to a worktree, then 'commondir' and 'gitdir' will
 * both have the same result appended to the buffer. The return value is
 * 0 upon success, or non-zero if no repository was found.
2017-03-13 21:10:45 +01:00
 */
2019-04-29 10:28:14 +02:00
int discover_git_directory(struct strbuf *commondir,
2019-04-29 10:28:23 +02:00
			   struct strbuf *gitdir);
2019-04-29 10:28:14 +02:00
const char *setup_git_directory_gently(int *);
const char *setup_git_directory(void);
char *prefix_path(const char *prefix, int len, const char *path);
char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
2017-03-21 02:21:27 +01:00

/*
 * Concatenate "prefix" (if it is non-NULL and non-empty) and "path", with
 * no connecting characters (so "prefix" should end with a "/").
 * Unlike prefix_path, this should be used if the named file does
 * not have to interact with an index entry, i.e. the name of an
 * arbitrary file on the filesystem.
 *
2017-03-21 02:28:49 +01:00
 * The return value is always a newly allocated string (even if the
 * prefix was empty).
2017-03-21 02:21:27 +01:00
 */
2019-04-29 10:28:14 +02:00
char *prefix_filename(const char *prefix, const char *path);
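The documented contract of `prefix_filename()` can be modeled in a few lines. This is a hypothetical sketch of the comment above only (plain concatenation, no separator, always a fresh allocation); the real function may handle cases the comment does not mention, and `prefix_filename_sketch` is not a git API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: concatenate prefix (when non-NULL and non-empty)
 * and path with no separator, always returning a newly allocated string
 * that the caller must free().
 */
static char *prefix_filename_sketch(const char *prefix, const char *path)
{
	size_t plen = prefix ? strlen(prefix) : 0;
	size_t len = strlen(path);
	char *out = malloc(plen + len + 1);

	if (!out)
		abort();
	if (plen)
		memcpy(out, prefix, plen);
	memcpy(out + plen, path, len + 1); /* copies the terminating NUL too */
	return out;
}
```

Because the result is always freshly allocated, a caller can free it unconditionally, even when the prefix was empty.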
2017-03-21 02:21:27 +01:00

2019-04-29 10:28:14 +02:00
int check_filename(const char *prefix, const char *name);
void verify_filename(const char *prefix,
2019-04-29 10:28:23 +02:00
		     const char *name,
		     int diagnose_misspelt_rev);
2019-04-29 10:28:14 +02:00
void verify_non_filename(const char *prefix, const char *name);
int path_inside_repo(const char *prefix, const char *path);
2005-08-17 03:06:34 +02:00

2008-04-27 19:39:27 +02:00
#define INIT_DB_QUIET 0x0001
2016-09-25 05:14:37 +02:00
#define INIT_DB_EXIST_OK 0x0002
2008-04-27 19:39:27 +02:00

2019-04-29 10:28:14 +02:00
int init_db(const char *git_dir, const char *real_git_dir,
2020-02-22 21:17:38 +01:00
	    const char *template_dir, int hash_algo,
2020-06-24 16:46:32 +02:00
	    const char *initial_branch, unsigned int flags);
builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 00:35:41 +02:00
void initialize_repository_version(int hash_algo, int reinit);
2008-04-27 19:39:27 +02:00

2019-04-29 10:28:14 +02:00
void sanitize_stdfds(void);
int daemonize(void);
2013-07-16 11:27:36 +02:00

2005-04-08 00:13:13 +02:00
#define alloc_nr(x) (((x)+16)*3/2)

2019-11-17 22:04:51 +01:00
/**
 * Dynamically growing an array using realloc() is error prone and boring.
 *
 * Define your array with:
 *
 * - a pointer (`item`) that points at the array, initialized to `NULL`
 *   (although please name the variable based on its contents, not on its
 *   type);
 *
 * - an integer variable (`alloc`) that keeps track of how big the current
 *   allocation is, initialized to `0`;
 *
 * - another integer variable (`nr`) to keep track of how many elements the
 *   array currently has, initialized to `0`.
 *
 * Then, before adding the `n`th element to the array, call
 * `ALLOC_GROW(item, n, alloc)`. This ensures that the array can hold at
 * least `n` elements by calling `realloc(3)` and adjusting the `alloc`
 * variable.
 *
 * ------------
 * sometype *item = NULL;
 * size_t nr = 0;
 * size_t alloc = 0;
 *
 * for (i = 0; i < nr; i++)
 * 	if (we like item[i] already)
 * 		return;
 *
 * // we did not like any existing one, so add one
 * ALLOC_GROW(item, nr + 1, alloc);
 * item[nr++] = value you like;
 * ------------
 *
 * You are responsible for updating the `nr` variable.
 *
 * If you need to specify the number of elements to allocate explicitly
 * then use the macro `REALLOC_ARRAY(item, alloc)` instead of `ALLOC_GROW`.
2007-06-11 15:39:44 +02:00
 *
2019-06-28 00:54:13 +02:00
 * Consider using ALLOC_GROW_BY instead of ALLOC_GROW as it has some
 * added niceties.
 *
2010-10-08 18:46:59 +02:00
 * DO NOT USE any expression with side-effect for 'x', 'nr', or 'alloc'.
2007-06-11 15:39:44 +02:00
 */
#define ALLOC_GROW(x, nr, alloc) \
	do { \
2007-06-17 00:37:39 +02:00
		if ((nr) > alloc) { \
Extend --pretty=oneline to cover the first paragraph,
so that an ugly commit message like this can be
handled sanely.
Currently, --pretty=oneline and --pretty=email (hence
format-patch) take and use only the first line of the commit log
message. This changes them to:
- Take the first paragraph, where the definition of the first
paragraph is "skip all blank lines from the beginning, and
then grab everything up to the next empty line".
- Replace all line breaks with a whitespace.
This change would not affect a well-behaved commit message that
adheres to the convention of "single line summary, a blank line,
and then body of message", as its first paragraph always
consists of a single line. Commit messages from different
culture, such as the ones imported from CVS/SVN, can however get
chomped with the existing behaviour at the first linebreak in
the middle of sentence right now, which would become much easier
to see with this change.
The Subject: and --pretty=oneline output would become very long
and unsightly for non-conforming commits, but their messages are
already ugly anyway, and this change at least avoids the loss of
information.
The Subject: line from a multi-line paragraph is folded using
RFC2822 line folding rules at the places where line breaks were
in the original.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-12 07:10:55 +02:00
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
2014-09-16 20:56:57 +02:00
			REALLOC_ARRAY(x, alloc); \
2007-06-11 15:39:44 +02:00
		} \
2010-08-13 00:11:15 +02:00
	} while (0)
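The usage pattern from the ALLOC_GROW comment can be compiled on its own. This sketch copies `alloc_nr()` and `ALLOC_GROW()` from the header and supplies a simplified stand-in for `REALLOC_ARRAY` (an assumption: plain `realloc()` that aborts on failure, without git's overflow checks). `unique_values` is a hypothetical example function that keeps the distinct values of its input in order.

```c
#include <assert.h>
#include <stdlib.h>

/* Copied from the header above. */
#define alloc_nr(x) (((x)+16)*3/2)

/* Simplified stand-in for git's REALLOC_ARRAY (assumption, no overflow checks). */
#define REALLOC_ARRAY(x, alloc) do { \
	void *p_ = realloc((x), sizeof(*(x)) * (alloc)); \
	if (!p_) \
		abort(); \
	(x) = p_; \
} while (0)

/* Copied from the header above. */
#define ALLOC_GROW(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			REALLOC_ARRAY(x, alloc); \
		} \
	} while (0)

/* Hypothetical example: collect the distinct values of src, in first-seen order. */
static int *unique_values(const int *src, size_t n, size_t *nr_out,
			  size_t *alloc_out)
{
	int *item = NULL;
	size_t nr = 0, alloc = 0;
	size_t i, j;

	for (i = 0; i < n; i++) {
		for (j = 0; j < nr; j++)
			if (item[j] == src[i])
				break;
		if (j < nr)
			continue; /* we already have this one */
		ALLOC_GROW(item, nr + 1, alloc); /* make room for element nr + 1 */
		item[nr++] = src[i];             /* the caller updates nr */
	}
	*nr_out = nr;
	*alloc_out = alloc;
	return item;
}
```

With this growth rule, a run of single appends reallocates at capacities 24, 60, 114, 195, and so on (each step is `(x + 16) * 3 / 2`), so the number of `realloc()` calls grows roughly logarithmically with the element count.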
2007-06-11 15:39:44 +02:00

2019-06-28 00:54:13 +02:00
/*
 * Similar to ALLOC_GROW but handles updating of the nr value and
 * zeroing the bytes of the newly-grown array elements.
 *
 * DO NOT USE any expression with side-effect for any of the
 * arguments.
 */
#define ALLOC_GROW_BY(x, nr, increase, alloc) \
	do { \
		if (increase) { \
			size_t new_nr = nr + (increase); \
			if (new_nr < nr) \
				BUG("negative growth in ALLOC_GROW_BY"); \
			ALLOC_GROW(x, new_nr, alloc); \
			memset((x) + nr, 0, sizeof(*(x)) * (increase)); \
			nr = new_nr; \
		} \
	} while (0)
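To show what ALLOC_GROW_BY adds over ALLOC_GROW (it bumps `nr` itself and zero-fills the new slots), this sketch copies both macros so it compiles on its own. `REALLOC_ARRAY` and `BUG` are simplified stand-ins (assumptions), and `struct counter` and `append_slots` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define alloc_nr(x) (((x)+16)*3/2)

/* Simplified stand-ins (assumptions): realloc() that aborts, and a fatal BUG(). */
#define REALLOC_ARRAY(x, alloc) do { \
	void *p_ = realloc((x), sizeof(*(x)) * (alloc)); \
	if (!p_) \
		abort(); \
	(x) = p_; \
} while (0)
#define BUG(msg) do { fprintf(stderr, "BUG: %s\n", (msg)); abort(); } while (0)

/* Copied from the header above. */
#define ALLOC_GROW(x, nr, alloc) \
	do { \
		if ((nr) > alloc) { \
			if (alloc_nr(alloc) < (nr)) \
				alloc = (nr); \
			else \
				alloc = alloc_nr(alloc); \
			REALLOC_ARRAY(x, alloc); \
		} \
	} while (0)

#define ALLOC_GROW_BY(x, nr, increase, alloc) \
	do { \
		if (increase) { \
			size_t new_nr = nr + (increase); \
			if (new_nr < nr) \
				BUG("negative growth in ALLOC_GROW_BY"); \
			ALLOC_GROW(x, new_nr, alloc); \
			memset((x) + nr, 0, sizeof(*(x)) * (increase)); \
			nr = new_nr; \
		} \
	} while (0)

struct counter {
	const char *name;
	int hits;
};

/* Hypothetical: append n zeroed slots and return a pointer to the first one. */
static struct counter *append_slots(struct counter **arr, size_t *nr,
				    size_t *alloc, size_t n)
{
	size_t first = *nr;

	ALLOC_GROW_BY(*arr, *nr, n, *alloc); /* also advances *nr by n */
	return *arr + first;
}
```

Existing elements survive the reallocation, and only the newly added range is cleared.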

2005-04-09 18:48:20 +02:00
/* Initialize and use the cache information */
2014-06-13 14:19:23 +02:00
struct lock_file;
2019-04-29 10:28:14 +02:00
void preload_index(struct index_state *index,
2019-04-29 10:28:23 +02:00
		   const struct pathspec *pathspec,
		   unsigned int refresh_flags);
2019-04-29 10:28:14 +02:00
int do_read_index(struct index_state *istate, const char *path,
2019-04-29 10:28:23 +02:00
		  int must_exist); /* for testing only! */
2019-04-29 10:28:14 +02:00
int read_index_from(struct index_state *, const char *path,
2019-04-29 10:28:23 +02:00
		    const char *gitdir);
2019-04-29 10:28:14 +02:00
int is_index_unborn(struct index_state *);
2017-10-05 22:32:11 +02:00

2021-03-30 15:10:48 +02:00
void ensure_full_index(struct index_state *istate);

2017-10-05 22:32:11 +02:00
/* For use with `write_locked_index()`. */
2014-06-13 14:19:23 +02:00
#define COMMIT_LOCK (1 << 0)
2018-03-01 21:40:20 +01:00
#define SKIP_IF_UNCHANGED (1 << 1)
2017-10-05 22:32:11 +02:00

/*
read-cache: drop explicit `CLOSE_LOCK`-flag
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.
We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).
Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.
This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:12:12 +02:00
 * Write the index while holding an already-taken lock. Close the lock,
 * and if `COMMIT_LOCK` is given, commit it.
2017-10-05 22:32:11 +02:00
 *
 * Unless a split index is in use, write the index into the lockfile.
 *
 * With a split index, write the shared index to a temporary file,
 * adjust its permissions and rename it into place, then write the
 * split index to the lockfile. If the temporary file for the shared
 * index cannot be created, fall back to the behavior described in
 * the previous paragraph.
read-cache: leave lock in right state in `write_locked_index()`
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occurred at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).
In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.
Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.
Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides is A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.
Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).
Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:12:13 +02:00
 *
 * With `COMMIT_LOCK`, the lock is always committed or rolled back.
 * Without it, the lock is closed, but neither committed nor rolled
 * back.
2018-03-01 21:40:20 +01:00
 *
 * If `SKIP_IF_UNCHANGED` is given and the index is unchanged, nothing
 * is written (and the lock is rolled back if `COMMIT_LOCK` is given).
2017-10-05 22:32:11 +02:00
 */
2019-04-29 10:28:14 +02:00
int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
2017-10-05 22:32:11 +02:00

2019-04-29 10:28:14 +02:00
int discard_index(struct index_state *);
void move_index_extensions(struct index_state *dst, struct index_state *src);
int unmerged_index(const struct index_state *);
2017-12-21 20:19:06 +01:00

/**
2018-07-01 03:25:00 +02:00
 * Returns 1 if istate differs from tree, 0 otherwise. If tree is NULL,
 * compares istate to HEAD. If tree is NULL and on an unborn branch,
 * returns 1 if there are entries in istate, 0 otherwise. If a strbuf is
 * provided, the space-separated list of files that differ will be appended
 * to it.
2017-12-21 20:19:06 +01:00
 */
2019-04-29 10:28:14 +02:00
int repo_index_has_changes(struct repository *repo,
2019-04-29 10:28:23 +02:00
			   struct tree *tree,
			   struct strbuf *sb);
2017-12-21 20:19:06 +01:00

2019-04-29 10:28:14 +02:00
int verify_path(const char *path, unsigned mode);
int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
int index_dir_exists(struct index_state *istate, const char *name, int namelen);
void adjust_dirname_case(struct index_state *istate, char *name);
struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
2017-01-19 04:18:51 +01:00

/*
 * Searches for an entry defined by name and namelen in the given index.
 * If the return value is non-negative, it is the position of an exact
 * match. If the return value is negative, the negated value minus 1
 * is the position where the entry would be inserted.
 * Example: The current index consists of these files and their stages:
 *
 *   b#0, d#0, f#1, f#3
 *
 * index_name_pos(&index, "a", 1) -> -1
 * index_name_pos(&index, "b", 1) ->  0
 * index_name_pos(&index, "c", 1) -> -2
 * index_name_pos(&index, "d", 1) ->  1
 * index_name_pos(&index, "e", 1) -> -3
 * index_name_pos(&index, "f", 1) -> -3
 * index_name_pos(&index, "g", 1) -> -5
 */
2021-04-01 03:49:39 +02:00
int index_name_pos(struct index_state *, const char *name, int namelen);
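The example table is consistent with a binary search over entries sorted by name and then by stage, where the queried name is treated as stage 0. That is why `"f"` reports `-3`: `f#0` would sort just before `f#1`. This self-contained sketch reproduces the table over a plain array. `struct entry`, `cmp_name_stage` and `name_pos` are hypothetical simplifications, not git's `struct cache_entry` or its comparison functions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified index entry. */
struct entry {
	const char *name;
	int stage;
};

/* Order by name bytes, then by name length, then by stage. */
static int cmp_name_stage(const char *n1, size_t l1, int s1,
			  const char *n2, size_t l2, int s2)
{
	size_t len = l1 < l2 ? l1 : l2;
	int cmp = memcmp(n1, n2, len);

	if (cmp)
		return cmp;
	if (l1 != l2)
		return l1 < l2 ? -1 : 1;
	if (s1 != s2)
		return s1 < s2 ? -1 : 1;
	return 0;
}

/*
 * Hypothetical: return the position of (name, stage 0) if present,
 * otherwise -(insert position) - 1.
 */
static int name_pos(const struct entry *e, int nr, const char *name, int namelen)
{
	int lo = 0, hi = nr;

	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = cmp_name_stage(name, (size_t)namelen, 0,
					 e[mid].name, strlen(e[mid].name),
					 e[mid].stage);
		if (!cmp)
			return mid;
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1; /* encode the insert position as a negative value */
}
```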
2017-01-19 04:18:51 +01:00

2021-11-29 16:52:41 +01:00
/*
 * Determines whether an entry with the given name exists within the
 * given index. The return value is 1 if an exact match is found, otherwise
 * it is 0. Note that, unlike index_name_pos, this function does not expand
 * the index if it is sparse. If an item exists within the full index but it
 * is contained within a sparse directory (and not in the sparse index), 0 is
 * returned.
 */
int index_entry_exists(struct index_state *, const char *name, int namelen);

msvc: avoid using minus operator on unsigned types
MSVC complains about this with `-Wall`, which can be taken as a sign
that this is indeed a real bug. The symptom is:
C4146: unary minus operator applied to unsigned type, result
still unsigned
Let's avoid this warning in the minimal way, e.g. writing `-1 -
<unsigned value>` instead of `-<unsigned value> - 1`.
Note that the change in the `estimate_cache_size()` function is
needed because MSVC considers the "return type" of the `sizeof()`
operator to be `size_t`, i.e. unsigned, and therefore it cannot be
negated using the unary minus operator.
Even worse, that arithmetic is doing extra work, in vain. We want to
calculate the entry extra cache size as the difference between the
size of the `cache_entry` structure minus the size of the
`ondisk_cache_entry` structure, padded to the appropriate alignment
boundary.
To that end, we start by assigning that difference to the `per_entry`
variable, and then abuse the `len` parameter of the
`align_padding_size()` macro to take the negative size of the ondisk
entry size. Essentially, we try to avoid passing the already calculated
difference to that macro by passing the operands of that difference
instead, when the macro expects operands of an addition:
#define align_padding_size(size, len) \
((size + (len) + 8) & ~7) - (size + len)
Currently, we pass A and -B to that macro instead of passing A - B and
0, where A - B is already stored in the `per_entry` variable, ready to
be used.
This is neither necessary, nor intuitive. Let's fix this, and have code
that is both easier to read and that also does not trigger MSVC's
warning.
While at it, we take care of reporting overflows (which are unlikely,
but hey, defensive programming is good!).
We _also_ take pains of casting the unsigned value to signed: otherwise,
the signed operand (i.e. the `-1`) would be cast to unsigned before
doing the arithmetic.
Helped-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 17:09:26 +02:00
/*
 * Some functions return the negative complement of an insert position when a
 * precise match was not found but a position was found where the entry would
 * need to be inserted. This helper protects that logic from any integer
 * underflow.
 */
static inline int index_pos_to_insert_pos(uintmax_t pos)
{
	if (pos > INT_MAX)
		die("overflow: -1 - %"PRIuMAX, pos);
	return -1 - (int)pos;
}
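The encoding is its own inverse: `-1 - pos` maps an insert position to a negative return value, and applying `-1 - value` again recovers the position. In this sketch, `die()` is replaced by a stand-in that prints and exits (an assumption), and `insert_pos_from_index_pos` is a hypothetical name for the decoding direction.

```c
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/* Same logic as the helper above; die() is replaced by a print-and-exit stand-in. */
static int index_pos_to_insert_pos(uintmax_t pos)
{
	if (pos > INT_MAX) {
		fprintf(stderr, "overflow: -1 - %" PRIuMAX "\n", pos);
		exit(128);
	}
	return -1 - (int)pos; /* cast only after the range check */
}

/* Hypothetical inverse: recover the insert position from a negative result. */
static int insert_pos_from_index_pos(int index_pos)
{
	return -1 - index_pos;
}
```

The cast to `int` happens only after the range check. Negating the unsigned value directly would wrap modulo 2^N instead of producing a negative number.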

2005-05-08 06:55:21 +02:00
#define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */
#define ADD_CACHE_OK_TO_REPLACE 2 /* Ok to replace file/directory */
2005-06-25 11:25:29 +02:00
#define ADD_CACHE_SKIP_DFCHECK 4 /* Ok to skip DF conflict checks */
2021-03-20 23:37:46 +01:00
#define ADD_CACHE_JUST_APPEND 8 /* Append only */
2008-08-21 10:44:53 +02:00
#define ADD_CACHE_NEW_ONLY 16 /* Do not replace existing ones */
2014-06-13 14:19:42 +02:00
#define ADD_CACHE_KEEP_CACHE_TREE 32 /* Do not invalidate cache-tree */
2019-01-17 17:27:11 +01:00
#define ADD_CACHE_RENORMALIZE 64 /* Pass along HASH_RENORMALIZE */
2019-04-29 10:28:14 +02:00
int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
2017-01-19 04:18:52 +01:00

/* Remove entry, return true if there are more entries to go. */
2019-04-29 10:28:14 +02:00
int remove_index_entry_at(struct index_state *, int pos);
2017-01-19 04:18:52 +01:00

2019-04-29 10:28:14 +02:00
void remove_marked_cache_entries(struct index_state *istate, int invalidate);
int remove_file_from_index(struct index_state *, const char *path);
2008-05-21 21:04:34 +02:00
#define ADD_CACHE_VERBOSE 1
#define ADD_CACHE_PRETEND 2
2008-05-25 23:03:50 +02:00
#define ADD_CACHE_IGNORE_ERRORS 4
2008-07-21 10:24:17 +02:00
#define ADD_CACHE_IGNORE_REMOVAL 8
2008-08-21 10:44:53 +02:00
#define ADD_CACHE_INTENT 16
2017-01-19 04:18:53 +01:00
/*
 * These two are used to add the contents of the file at path
 * to the index, marking the working tree up-to-date by storing
 * the cached stat info in the resulting cache entry. A caller
 * that has already run lstat(2) on the path can call
 * add_to_index(), and all others can call add_file_to_index();
 * the latter will do necessary lstat(2) internally before
 * calling the former.
 */
2019-04-29 10:28:14 +02:00
int add_to_index(struct index_state *, const char *path, struct stat *, int flags);
int add_file_to_index(struct index_state *, const char *path, int flags);
2017-01-19 04:18:53 +01:00

2019-04-29 10:28:14 +02:00
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
2021-04-01 03:49:39 +02:00
int index_name_is_other(struct index_state *, const char *, int);
void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
2007-11-10 09:15:03 +01:00

/* do stat comparison even if CE_VALID is true */
#define CE_MATCH_IGNORE_VALID 01
/* do not check the contents but report dirty on racily-clean entries */
2009-12-14 12:43:58 +01:00
#define CE_MATCH_RACY_IS_DIRTY 02
/* do stat comparison even if CE_SKIP_WORKTREE is true */
#define CE_MATCH_IGNORE_SKIP_WORKTREE 04
2014-01-27 15:45:07 +01:00
/* ignore non-existent files during stat update */
#define CE_MATCH_IGNORE_MISSING 0x08
2014-01-27 15:45:08 +01:00
/* enable stat refresh */
#define CE_MATCH_REFRESH 0x10
2017-09-22 18:35:40 +02:00
/* don't refresh_fsmonitor state or do stat comparison even if CE_FSMONITOR_VALID is true */
#define CE_MATCH_IGNORE_FSMONITOR 0x20
2019-04-29 10:28:14 +02:00
int is_racy_timestamp(const struct index_state *istate,
2019-04-29 10:28:23 +02:00
		      const struct cache_entry *ce);
2022-01-07 12:17:31 +01:00
int has_racy_timestamp(struct index_state *istate);
2019-04-29 10:28:14 +02:00
int ie_match_stat(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
2007-11-10 09:15:03 +01:00

2011-05-08 10:47:33 +02:00
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
2017-11-16 17:38:28 +01:00
#define HASH_RENORMALIZE 4
2021-10-12 16:30:49 +02:00
#define HASH_SILENT 8
2019-04-29 10:28:14 +02:00
int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags);
2013-06-20 10:37:50 +02:00

/*
 * Record to sd the data from st that we use to check whether a file
 * might have changed.
 */
2019-04-29 10:28:14 +02:00
void fill_stat_data(struct stat_data *sd, struct stat *st);
2013-06-20 10:37:50 +02:00

/*
 * Return 0 if st is consistent with a file not having been changed
 * since sd was filled. If there are differences, return a
 * combination of MTIME_CHANGED, CTIME_CHANGED, OWNER_CHANGED,
 * INODE_CHANGED, and DATA_CHANGED.
 */
2019-04-29 10:28:14 +02:00
int match_stat_data(const struct stat_data *sd, struct stat *st);
|
|
|
|
int match_stat_data_racy(const struct index_state *istate,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct stat_data *sd, struct stat *st);
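According to the comment above, match_stat_data() returns 0 when a file is unchanged and otherwise a combination of change bits. Below is a minimal sketch of that bitmask convention. The `sketch_*` names, the reduced snapshot struct, and the flag values are illustrative assumptions, not Git's definitions. The sketch also ignores the trust_ctime and check_stat settings declared later in this header, which the real comparison may honor.

```c
#include <assert.h>
#include <sys/types.h>

/* Illustrative change bits; Git's real MTIME_CHANGED etc. may use other values. */
enum {
	SKETCH_MTIME_CHANGED = 0x0001,
	SKETCH_CTIME_CHANGED = 0x0002,
	SKETCH_OWNER_CHANGED = 0x0004,
	SKETCH_INODE_CHANGED = 0x0010,
	SKETCH_DATA_CHANGED  = 0x0020
};

/* Hypothetical, reduced snapshot of the stat fields being compared. */
struct sketch_stat_data {
	long mtime_sec;
	long ctime_sec;
	uid_t uid;
	gid_t gid;
	ino_t ino;
	off_t size;
};

/* Return 0 if nothing differs, otherwise OR one bit per kind of difference. */
static unsigned sketch_match_stat_data(const struct sketch_stat_data *sd,
				       const struct sketch_stat_data *now)
{
	unsigned changed = 0;

	if (sd->mtime_sec != now->mtime_sec)
		changed |= SKETCH_MTIME_CHANGED;
	if (sd->ctime_sec != now->ctime_sec)
		changed |= SKETCH_CTIME_CHANGED;
	if (sd->uid != now->uid || sd->gid != now->gid)
		changed |= SKETCH_OWNER_CHANGED;
	if (sd->ino != now->ino)
		changed |= SKETCH_INODE_CHANGED;
	if (sd->size != now->size)
		changed |= SKETCH_DATA_CHANGED; /* a size change implies new content */
	return changed;
}
```

A caller can test single bits, for example `changed & SKETCH_DATA_CHANGED`, to decide whether it must rehash the file or only needs to refresh the cached stat data.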
2013-06-20 10:37:50 +02:00

2019-05-24 14:23:47 +02:00
void fill_stat_cache_info(struct index_state *istate, struct cache_entry *ce, struct stat *st);
2005-05-15 23:23:12 +02:00

2021-04-08 22:41:26 +02:00
#define REFRESH_REALLY (1 << 0) /* ignore_valid */
#define REFRESH_UNMERGED (1 << 1) /* allow unmerged */
#define REFRESH_QUIET (1 << 2) /* be quiet about it */
#define REFRESH_IGNORE_MISSING (1 << 3) /* ignore non-existent */
#define REFRESH_IGNORE_SUBMODULES (1 << 4) /* ignore submodules */
#define REFRESH_IN_PORCELAIN (1 << 5) /* user friendly output, not "needs update" */
#define REFRESH_PROGRESS (1 << 6) /* show progress bar if stderr is tty */
#define REFRESH_IGNORE_SKIP_WORKTREE (1 << 7) /* ignore skip_worktree entries */
2019-04-29 10:28:14 +02:00
int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg);
2019-09-11 20:20:25 +02:00
/*
* Refresh the index and write it to disk.
*
* 'refresh_flags' is passed directly to 'refresh_index()', while
* 'COMMIT_LOCK | write_flags' is passed to 'write_locked_index()', so
* the lockfile is always either committed or rolled back.
*
* If 'gentle' is non-zero, errors locking the index are ignored.
*
* Return 1 if refreshing the index returns an error, -1 if writing
* the index to disk fails, 0 on success.
*
* Note that if refreshing the index returns an error, we still write
* out the index (unless locking fails).
*/
int repo_refresh_and_write_index(struct repository*, unsigned int refresh_flags, unsigned int write_flags, int gentle, const struct pathspec *, char *seen, const char *header_msg);
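The comment above defines three return values: 0 on success, 1 when the refresh reported errors, and -1 when writing the index failed. A caller can turn these into a simple classification. The sketch below is hypothetical caller code. The enum and both functions are invented for illustration and do not call into Git.

```c
#include <assert.h>

/* Hypothetical caller-side view of the documented return values. */
enum refresh_outcome {
	REFRESH_OUTCOME_OK,            /* 0: refreshed and written */
	REFRESH_OUTCOME_WRITTEN_DIRTY, /* 1: refresh reported errors; index still written unless locking failed */
	REFRESH_OUTCOME_WRITE_FAILED,  /* -1: writing the index to disk failed */
	REFRESH_OUTCOME_UNKNOWN        /* anything else: not part of the contract */
};

static enum refresh_outcome classify_refresh_result(int ret)
{
	switch (ret) {
	case 0:
		return REFRESH_OUTCOME_OK;
	case 1:
		return REFRESH_OUTCOME_WRITTEN_DIRTY;
	case -1:
		return REFRESH_OUTCOME_WRITE_FAILED;
	default:
		return REFRESH_OUTCOME_UNKNOWN;
	}
}

/* Example policy: refresh errors are reported but tolerated; write failures are fatal. */
static int refresh_result_is_fatal(int ret)
{
	enum refresh_outcome o = classify_refresh_result(ret);

	return o == REFRESH_OUTCOME_WRITE_FAILED || o == REFRESH_OUTCOME_UNKNOWN;
}
```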

2019-04-29 10:28:14 +02:00
struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int);
2006-05-19 18:56:35 +02:00

2019-04-29 10:28:14 +02:00
void set_alternate_index_output(const char *);
2014-10-01 12:28:42 +02:00

2017-04-14 22:32:21 +02:00
extern int verify_index_checksum;
2017-10-18 16:27:25 +02:00
extern int verify_ce_order;
2017-04-14 22:32:21 +02:00

2006-02-27 23:47:45 +01:00
/* Environment bits from configuration mechanism */
2005-10-11 01:31:08 +02:00
extern int trust_executable_bit;
2008-07-28 08:31:28 +02:00
extern int trust_ctime;
2013-01-22 08:49:22 +01:00
extern int check_stat;
2007-06-25 00:11:24 +02:00
extern int quote_path_fully;
2007-03-02 22:11:30 +01:00
extern int has_symlinks;
2010-10-28 20:28:04 +02:00
extern int minimum_abbrev, default_abbrev;
2008-03-22 00:52:46 +01:00
extern int ignore_case;
2006-02-09 06:15:24 +01:00
extern int assume_unchanged;
2006-05-02 09:40:24 +02:00
extern int prefer_symlink_refs;
2006-03-21 03:45:47 +01:00
extern int warn_ambiguous_refs;
cat-file: disable object/refname ambiguity check for batch mode
A common use of "cat-file --batch-check" is to feed a list
of objects from "rev-list --objects" or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().
Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35f
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca91993
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.
With no patches, this is the time it takes to run:
$ git rev-list --objects --all >objects
$ time git cat-file --batch-check='%(objectname)' <objects
on the linux.git repository:
real 1m13.494s
user 0m25.924s
sys 0m47.532s
If we revert ca91993, the packed-refs up-to-date check, it
gets a little better:
real 0m54.697s
user 0m21.692s
sys 0m32.916s
but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues). If we revert 798c35f, disabling
the warning entirely, we get a much more reasonable time:
real 0m7.452s
user 0m6.836s
sys 0m0.608s
This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 08:20:05 +02:00
extern int warn_on_object_refname_ambiguity;
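The cat-file commit message above explains where the cost lies. Recognizing a full 40-hex object name is cheap. The expensive part is the extra refname lookup that runs only to print an ambiguity warning. Below is a minimal sketch of how a global flag like warn_on_object_refname_ambiguity can gate that lookup. Every `sketch_*` name is hypothetical, and the ref lookup is a counting stub rather than a real ref-store query.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical stand-in for warn_on_object_refname_ambiguity (on by default). */
static int sketch_warn_on_ambiguity = 1;
/* Counts how many times the expensive ref lookup actually ran. */
static int sketch_ref_lookups;

/* True if 's' is exactly 40 hexadecimal digits. */
static int sketch_is_full_hex_oid(const char *s)
{
	size_t i;

	if (strlen(s) != 40)
		return 0;
	for (i = 0; i < 40; i++)
		if (!isxdigit((unsigned char)s[i]))
			return 0;
	return 1;
}

/* Stub for a refname lookup; a real one would consult loose and packed refs. */
static int sketch_ref_exists(const char *name)
{
	(void)name;
	sketch_ref_lookups++;
	return 0;
}

/*
 * Returns 1 if 'name' is taken directly as an object name, 0 if it needs
 * full revision parsing. '*warned' reports whether the ambiguity warning
 * would have been printed.
 */
static int sketch_resolve(const char *name, int *warned)
{
	*warned = 0;
	if (!sketch_is_full_hex_oid(name))
		return 0;
	/* Pay for the ref lookup only when the caller wants the warning. */
	if (sketch_warn_on_ambiguity && sketch_ref_exists(name))
		*warned = 1;
	return 1;
}
```

A batch-oriented caller would clear the flag once before reading its input. Each full object name then skips the ref lookup entirely.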
config: drop git_config_get_string_const()
As evidenced by the leak fixes in the previous commit, the "const" in
git_config_get_string_const() clearly misleads people into thinking that
it does not allocate a copy of the string. We can fix this by renaming
it, but it's easier still to just drop it. Of the four remaining
callers:
- The one in git_config_parse_expiry() still needs to allocate, since
that's what its callers expect. We can just use the non-const
version and cast our pointer. Slightly ugly, but the damage is
contained in one spot.
- The two in apply are writing to global "const char *" variables, and
need to continue allocating. We often mark these as const because we
assign default string literals to them. But in this case we don't do
that, so we can just declare them as real "char *" pointers and use
the non-const version.
- The call in checkout doesn't actually need a copy; it can just use
the non-allocating "tmp" version of the function.
The function is also mentioned in the MyFirstContribution document. We
can swap that call out for the non-allocating "tmp" variant, which fits
well in the example given.
We'll drop the "configset" and "repo" variants, as well (which are
unused).
Note that this frees up the "const" name, so we could rename the "tmp"
variant back to that. But let's give some time for topics in flight to
adapt to the new code before doing so (if we do it too soon, the
function semantics will change but the compiler won't alert us).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 23:33:11 +02:00
extern char *apply_default_whitespace;
extern char *apply_default_ignorewhitespace;
2011-10-06 20:22:24 +02:00
extern const char *git_attributes_file;
2016-05-05 00:58:12 +02:00
extern const char *git_hooks_path;
2006-07-03 22:11:47 +02:00
extern int zlib_compression_level;
2016-11-16 02:42:40 +01:00
extern int pack_compression_level;
2006-12-23 08:34:28 +01:00
extern size_t packed_git_window_size;
2006-12-23 08:33:35 +01:00
extern size_t packed_git_limit;
2007-03-19 06:14:37 +01:00
extern size_t delta_base_cache_limit;
2011-04-05 19:44:11 +02:00
extern unsigned long big_file_threshold;
2011-10-28 23:48:40 +02:00
extern unsigned long pack_size_limit_cfg;
2014-02-18 12:24:55 +01:00

2016-09-13 05:24:23 +02:00
/*
* Accessors for the core.sharedrepository config which lazy-load the value
* from the config (if not already set). The "reset" function can be
* used to unset the "set" or cached value, meaning that the value will be loaded
* fresh from the config file on the next call to get_shared_repository().
*/
2016-03-11 23:36:49 +01:00
void set_shared_repository(int value);
int get_shared_repository(void);
2016-09-13 05:24:23 +02:00
void reset_shared_repository(void);
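The accessor comment above describes a value that is loaded lazily, can be overridden explicitly, and can be reset. Below is a minimal sketch of that pattern. It assumes a stub, `sketch_read_shared_repository_config()`, in place of real config parsing. The names and the integer values are illustrative only.

```c
#include <assert.h>

/* Stub "config file" contents and a counter of how often it is read. */
static int sketch_config_value;
static int sketch_config_reads;

/* Hypothetical stand-in for parsing core.sharedRepository. */
static int sketch_read_shared_repository_config(void)
{
	sketch_config_reads++;
	return sketch_config_value;
}

static int sketch_shared_value;
static int sketch_shared_is_set; /* 0 until set explicitly or loaded */

static void sketch_set_shared_repository(int value)
{
	sketch_shared_value = value;
	sketch_shared_is_set = 1;
}

static int sketch_get_shared_repository(void)
{
	if (!sketch_shared_is_set) {
		/* Lazy load: consult the config only on first use. */
		sketch_shared_value = sketch_read_shared_repository_config();
		sketch_shared_is_set = 1;
	}
	return sketch_shared_value;
}

static void sketch_reset_shared_repository(void)
{
	/* Drop both explicit and cached values; the next get reloads. */
	sketch_shared_is_set = 0;
}
```

A single "is set" flag covers both the explicit override and the cached value. That is why one reset call is enough to force a fresh read.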
2016-03-11 23:36:49 +01:00

2014-02-18 12:24:55 +01:00
/*
* Do replace refs need to be checked this run? This variable is
* initialized to true unless --no-replace-objects is used or
* $GIT_NO_REPLACE_OBJECTS is set, but is set to false by some
2018-07-18 22:44:49 +02:00
* commands that do not want replace references to be active.
2014-02-18 12:24:55 +01:00
*/
2018-07-18 22:45:20 +02:00
extern int read_replace_refs;
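The comment above names two ways read_replace_refs is switched off at startup. Below is a sketch of that initialization. It assumes the `--no-replace-objects` option has already been parsed into a flag. `sketch_init_replace_refs()` is hypothetical, and, following the comment's wording, it only checks whether GIT_NO_REPLACE_OBJECTS is present.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for read_replace_refs: true unless switched off. */
static int sketch_read_replace_refs = 1;

/*
 * Apply the two opt-outs the comment describes. 'no_replace_objects_opt'
 * is assumed to hold the already-parsed command-line option; the
 * environment variable counts as set whenever it is present.
 */
static void sketch_init_replace_refs(int no_replace_objects_opt)
{
	if (no_replace_objects_opt || getenv("GIT_NO_REPLACE_OBJECTS"))
		sketch_read_replace_refs = 0;
}
```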
2014-02-18 12:24:55 +01:00

2022-03-10 23:43:21 +01:00
/*
* These values are used to help identify parts of a repository to fsync.
* FSYNC_COMPONENT_NONE identifies data that will not be a persistent part of the
* repository and so shouldn't be fsynced.
*/
enum fsync_component {
FSYNC_COMPONENT_NONE,
FSYNC_COMPONENT_LOOSE_OBJECT = 1 << 0,
FSYNC_COMPONENT_PACK = 1 << 1,
FSYNC_COMPONENT_PACK_METADATA = 1 << 2,
FSYNC_COMPONENT_COMMIT_GRAPH = 1 << 3,
2022-03-10 23:43:23 +01:00
FSYNC_COMPONENT_INDEX = 1 << 4,
2022-03-11 10:58:59 +01:00
FSYNC_COMPONENT_REFERENCE = 1 << 5,
2022-03-10 23:43:21 +01:00
};

2022-03-15 20:12:45 +01:00
#define FSYNC_COMPONENTS_OBJECTS (FSYNC_COMPONENT_LOOSE_OBJECT | \
FSYNC_COMPONENT_PACK)

#define FSYNC_COMPONENTS_DERIVED_METADATA (FSYNC_COMPONENT_PACK_METADATA | \
FSYNC_COMPONENT_COMMIT_GRAPH)

2022-03-29 23:41:52 +02:00
#define FSYNC_COMPONENTS_DEFAULT ((FSYNC_COMPONENTS_OBJECTS | \
FSYNC_COMPONENTS_DERIVED_METADATA) & \
2022-03-15 20:12:45 +01:00
~FSYNC_COMPONENT_LOOSE_OBJECT)

2022-03-11 10:58:59 +01:00
#define FSYNC_COMPONENTS_COMMITTED (FSYNC_COMPONENTS_OBJECTS | \
FSYNC_COMPONENT_REFERENCE)
2022-03-15 20:12:45 +01:00

#define FSYNC_COMPONENTS_ADDED (FSYNC_COMPONENTS_COMMITTED | \
FSYNC_COMPONENT_INDEX)

#define FSYNC_COMPONENTS_ALL (FSYNC_COMPONENT_LOOSE_OBJECT | \
FSYNC_COMPONENT_PACK | \
FSYNC_COMPONENT_PACK_METADATA | \
FSYNC_COMPONENT_COMMIT_GRAPH | \
2022-03-11 10:58:59 +01:00
FSYNC_COMPONENT_INDEX | \
FSYNC_COMPONENT_REFERENCE)
2022-03-10 23:43:21 +01:00

2022-04-05 07:20:14 +02:00
#ifndef FSYNC_COMPONENTS_PLATFORM_DEFAULT
#define FSYNC_COMPONENTS_PLATFORM_DEFAULT FSYNC_COMPONENTS_DEFAULT
#endif

2022-03-10 23:43:21 +01:00
/*
* A bitmask indicating which components of the repo should be fsynced.
*/
extern enum fsync_component fsync_components;
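Each fsync_component is a distinct bit, so the FSYNC_COMPONENTS_* macros are plain bitwise unions. FSYNC_COMPONENTS_DEFAULT masks out loose objects: (1|2|4|8) & ~1 = 14, which leaves pack, pack metadata, and commit-graph, and includes neither the index nor references. The sketch below re-declares the bits under hypothetical `SK_*` names so it compiles on its own. It also adds a hypothetical `sketch_should_fsync()` that shows how a bitmask like fsync_components can be tested.

```c
#include <assert.h>

/* The component bits from the enum above, re-declared under hypothetical names. */
enum sketch_fsync_component {
	SK_NONE = 0,
	SK_LOOSE_OBJECT = 1 << 0,
	SK_PACK = 1 << 1,
	SK_PACK_METADATA = 1 << 2,
	SK_COMMIT_GRAPH = 1 << 3,
	SK_INDEX = 1 << 4,
	SK_REFERENCE = 1 << 5
};

#define SK_OBJECTS (SK_LOOSE_OBJECT | SK_PACK)
#define SK_DERIVED_METADATA (SK_PACK_METADATA | SK_COMMIT_GRAPH)
/* Same shape as FSYNC_COMPONENTS_DEFAULT: objects plus derived metadata, minus loose objects. */
#define SK_DEFAULT ((SK_OBJECTS | SK_DERIVED_METADATA) & ~SK_LOOSE_OBJECT)

/* Hypothetical helper: does 'mask' ask for 'component' to be fsynced? */
static int sketch_should_fsync(unsigned mask, enum sketch_fsync_component component)
{
	return (mask & (unsigned)component) != 0;
}
```

Because of the `#ifndef` guard above, a platform header can define FSYNC_COMPONENTS_PLATFORM_DEFAULT before this point. That definition then takes the place of FSYNC_COMPONENTS_DEFAULT as the platform default.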
2008-06-19 00:18:44 +02:00
extern int fsync_object_files;
2021-10-29 02:15:52 +02:00
extern int use_fsync;
2022-03-10 23:43:20 +01:00

enum fsync_method {
FSYNC_METHOD_FSYNC,
core.fsyncmethod: batched disk flushes for loose-objects
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.
One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.
When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.
At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
another fsync internal to that operation. This is not the default
today, but the user now has the option of syncing the index and there
is a separate patch series to implement syncing of refs.
On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc.), such as NTFS, HFS+, or XFS,
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.
Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.
In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where data loss doesn't occur.
In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a slightly higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.
_Performance numbers_:
Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
Adding 500 files to the repo with 'git add'. Times are reported in seconds.
object file syncing | Linux | Mac | Windows
--------------------|-------|-------|--------
disabled | 0.06 | 0.35 | 0.61
fsync | 1.88 | 11.18 | 2.47
batch | 0.15 | 0.41 | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-05 07:20:09 +02:00
FSYNC_METHOD_WRITEOUT_ONLY,
FSYNC_METHOD_BATCH,
2022-03-10 23:43:20 +01:00
};

extern enum fsync_method fsync_method;
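The core.fsyncMethod commit message above describes the batch sequence: stage objects in a temporary directory, flush once, then rename them into place. The sketch below follows that ordering with POSIX calls. It is not Git's bulk-checkin code. The `sketch_*` functions and the dummy file name are hypothetical. Step 2a, the writeout-only request, is omitted because it is platform-specific. The single fsync() covers the earlier writes only under the journaling assumption that the commit message states.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Step 1a: write one object into the temporary directory (no fsync yet). */
static int sketch_stage_file(const char *path, const char *data, size_t len)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);

	if (fd < 0)
		return -1;
	if (write(fd, data, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	/* Step 2a (a writeout-only request such as Linux sync_file_range) is omitted. */
	return close(fd);
}

/*
 * Steps 1b and 2b: a single fsync() on a dummy file, then rename each
 * staged file into 'destdir'. Durability of the earlier writes relies on
 * the journaling assumption from the commit message, not on POSIX.
 */
static int sketch_batch_commit(const char *tmpdir, const char *destdir,
			       const char *const *names, size_t n)
{
	char src[4096], dst[4096];
	size_t i;
	int fd, rc;

	snprintf(src, sizeof(src), "%s/sketch_dummy", tmpdir);
	fd = open(src, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	unlink(src);
	if (rc < 0)
		return -1;

	for (i = 0; i < n; i++) {
		snprintf(src, sizeof(src), "%s/%s", tmpdir, names[i]);
		snprintf(dst, sizeof(dst), "%s/%s", destdir, names[i]);
		if (rename(src, dst) < 0)
			return -1;
	}
	return 0;
}
```

The sketch shows only the ordering of writes, flush, and renames. According to the commit message, Git builds this on its odb-transaction and tmp-objdir machinery.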
2008-11-14 01:36:30 +01:00
extern int core_preload_index;
git on Mac OS and precomposed unicode
Mac OS X mangles file names containing unicode on file systems HFS+,
VFAT or SAMBA. When a file using unicode code points outside ASCII
is created on a HFS+ drive, the file name is converted into
decomposed unicode and written to disk. No conversion is done if
the file name is already decomposed unicode.
Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same
result as open("\x41\xcc\x88",...) with a decomposed "Ä".
As a consequence, readdir() returns the file names in decomposed
unicode, even if the user expects precomposed unicode. Unlike on
HFS+, Mac OS X stores files on a VFAT drive (e.g. a USB drive) in
precomposed unicode, but readdir() still returns file names in
decomposed unicode. When a git repository is stored on a network
share using SAMBA, file names are sent over the wire and written to
disk on the remote system in precomposed unicode, but Mac OS X
readdir() returns decomposed unicode to be compatible with its
behaviour on HFS+ and VFAT.
The unicode decomposition causes many problems:
- The names "git add" and other commands get from the end user may
often be precomposed form (the decomposed form is not easily input
from the keyboard), but when the commands read from the filesystem
to see what it is going to update the index with already is on the
filesystem, readdir() will give decomposed form, which is different.
- Similarly "git log", "git mv" and all other commands that need to
compare pathnames found on the command line (often but not always
precomposed form; a command line input resulting from globbing may
be in decomposed) with pathnames found in the tree objects (should
be precomposed form to be compatible with other systems and for
consistency in general).
- The same for names stored in the index, which should be
precomposed, that may need to be compared with the names read from
readdir().
NFS mounted from Linux is fully transparent and does not suffer from
the above.
As Mac OS X treats precomposed and decomposed file names as equal,
we can
- wrap readdir() on Mac OS X to return the precomposed form, and
- normalize decomposed form given from the command line also to the
precomposed form,
to ensure that all pathnames used in Git are always in the
precomposed form. This behaviour can be requested by setting
"core.precomposedunicode" configuration variable to true.
The code in compat/precomposed_utf8.c implements basically 4 new
functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(),
precomposed_utf8_closedir() and precompose_argv(). The first three
are to wrap opendir(3), readdir(3), and closedir(3) functions.
The argv[] conversion allows using the TAB filename completion done
by the shell on command line. It tolerates other tools which use
readdir() to feed decomposed file names into git.
When creating a new git repository with "git init" or "git clone",
"core.precomposedunicode" will be set to "false".
The user needs to activate this feature manually. She typically
sets core.precomposedunicode to "true" on HFS and VFAT, or file
systems mounted via SAMBA.
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 15:50:25 +02:00
extern int precomposed_unicode;
2014-12-16 00:15:20 +01:00
extern int protect_hfs;
2014-12-16 23:46:59 +01:00
extern int protect_ntfs;
2005-10-11 01:31:08 +02:00

2019-12-31 14:17:48 +01:00
extern int core_apply_sparse_checkout;
extern int core_sparse_checkout_cone;
repo_read_index: add config to expect files outside sparse patterns
Typically with sparse checkouts, we expect files outside the sparsity
patterns to be marked as SKIP_WORKTREE and be missing from the working
tree. Sometimes, however, this expectation is violated, including
in cases such as:
* users grabbing files from elsewhere and writing them to the worktree
(perhaps by editing a cached copy in an editor, copying/renaming, or
even untarring)
* various git commands having incomplete or no support for the
SKIP_WORKTREE bit[1,2]
* users attempting to "abort" a sparse-checkout operation with a
not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
working tree is not atomic)[3].
When the SKIP_WORKTREE bit in the index did not reflect the presence of
the file in the working tree, it traditionally caused confusion and was
difficult to detect and recover from. So, in a sparse checkout, since
af6a51875a (repo_read_index: clear SKIP_WORKTREE bit from files present
in worktree, 2022-01-14), Git automatically clears the SKIP_WORKTREE
bit at index read time for entries corresponding to files that are
present in the working tree.
There is another workflow, however, where it is expected that paths
outside the sparsity patterns appear to exist in the working tree and
that they do not lose the SKIP_WORKTREE bit, at least until they get
modified. A Git-aware virtual file system[4] takes advantage of its
position as a file system driver to expose all files in the working
tree, fetch them on demand using partial clone on access, and tell Git
to pay attention to them on demand by updating the sparse checkout
pattern on writes. This means that commands like "git status" only have
to examine files that have potentially been modified, whereas commands
like "ls" are able to show the entire codebase without requiring manual
updates to the sparse checkout pattern.
Thus since af6a51875a, Git with such Git-aware virtual file systems
unsets the SKIP_WORKTREE bit for all files and commands like "git
status" have to fetch and examine them all.
Introduce a configuration setting sparse.expectFilesOutsideOfPatterns to
allow limiting the tracked set of files to a small set once again. A
Git-aware virtual file system or other application that wants to
maintain files outside of the sparse checkout can set this in a
repository to instruct Git not to check for the presence of
SKIP_WORKTREE files. The setting defaults to false, so most users of
sparse checkout will still get the benefit of an automatically updating
index to recover from the variety of difficult issues detailed in
af6a51875a for paths with SKIP_WORKTREE set despite the path being
present.
[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] The three long paragraphs in the middle of
https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[4] such as the vfsd described in
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-26 07:12:22 +01:00
extern int sparse_expect_files_outside_of_patterns;
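The sparse.expectFilesOutsideOfPatterns commit message above describes a rule applied when the index is read. SKIP_WORKTREE is cleared for entries whose files are present in the working tree, unless the setting says such files are expected. Below is a minimal sketch of that rule. The entry struct, the flag bit, and the `exists` callback, which stands in for a filesystem check, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_SKIP_WORKTREE 0x1u /* hypothetical flag bit */

/* Hypothetical, reduced index entry. */
struct sketch_entry {
	const char *path;
	unsigned flags;
};

/*
 * Clear SKIP_WORKTREE on entries whose file is present, unless the
 * repository expects files outside the sparsity patterns. Returns the
 * number of entries changed. 'exists' stands in for a filesystem check.
 */
static size_t sketch_clear_skip_worktree(struct sketch_entry *entries, size_t n,
					 int expect_files_outside_of_patterns,
					 int (*exists)(const char *path))
{
	size_t i, cleared = 0;

	if (expect_files_outside_of_patterns)
		return 0; /* e.g. a virtual file system exposes every path */
	for (i = 0; i < n; i++) {
		if ((entries[i].flags & SKETCH_SKIP_WORKTREE) && exists(entries[i].path)) {
			entries[i].flags &= ~SKETCH_SKIP_WORKTREE;
			cleared++;
		}
	}
	return cleared;
}

/* Toy predicate for demonstration: only "present.txt" exists. */
static int sketch_demo_exists(const char *path)
{
	return strcmp(path, "present.txt") == 0;
}
```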
2019-11-21 23:04:40 +01:00

git: add --no-optional-locks option
Some tools like IDEs or fancy editors may periodically run
commands like "git status" in the background to keep track
of the state of the repository. Some of these commands may
refresh the index and write out the result in an
opportunistic way: if they can get the index lock, then they
update the on-disk index with any updates they find. And if
not, then their in-core refresh is lost and just has to be
recomputed by the next caller.
But taking the index lock may conflict with other operations
in the repository. Especially ones that the user is doing
themselves, which _aren't_ opportunistic. In other words,
"git status" knows how to back off when somebody else is
holding the lock, but other commands don't know that status
would be happy to drop the lock if somebody else wanted it.
There are a couple of possible solutions:
1. Have some kind of "pseudo-lock" that allows other
commands to tell status that they want the lock.
This is likely to be complicated and error-prone to
implement (and maybe even impossible with just
dotlocks to work from, as it requires some
inter-process communication).
2. Avoid background runs of commands like "git status"
that want to do opportunistic updates, preferring
instead plumbing like diff-files, etc.
This is awkward for a couple of reasons. One is that
"status --porcelain" reports a lot more about the
repository state than is available from individual
plumbing commands. And two is that we actually _do_
want to see the refreshed index. We just don't want to
take a lock or write out the result. Whereas commands
like diff-files expect us to refresh the index
separately and write it to disk so that they can depend
on the result. But that write is exactly what we're
trying to avoid.
3. Ask "status" not to lock or write the index.
This is easy to implement. The big downside is that any
work done in refreshing the index for such a call is
lost when the process exits. So a background process
may end up re-hashing a changed file multiple times
until the user runs a command that does an index
refresh themselves.
This patch implements the option 3. The idea (and the test)
is largely stolen from a Git for Windows patch by Johannes
Schindelin, 67e5ce7f63 (status: offer *not* to lock the
index and update it, 2016-08-12). The twist here is that
instead of making this an option to "git status", it becomes
a "git" option and matching environment variable.
The reason there is two-fold:
1. An environment variable is carried through to
sub-processes. And whether an invocation is a
background process or not should apply to the whole
process tree. So you could do "git --no-optional-locks
foo", and if "foo" is a script or alias that calls
"status", you'll still get the effect.
2. There may be other programs that want the same
treatment.
I've punted here on finding more callers to convert,
since "status" is the obvious one to call as a repeated
background job. But "git diff"'s opportunistic refresh
of the index may be a good candidate.
The test is taken from 67e5ce7f63, and it's worth repeating
Johannes's explanation:
Note that the regression test added in this commit does
not *really* verify that no index.lock file was written;
that test is not possible in a portable way. Instead, we
verify that .git/index is rewritten *only* when `git
status` is run without `--no-optional-locks`.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 08:54:30 +02:00
/*
* Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
*/
int use_optional_locks(void);
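use_optional_locks() reads a boolean from $GIT_OPTIONAL_LOCKS and falls back to a default. Below is a sketch of that lookup. It assumes a default of 1, meaning locks are allowed, which matches the commit message's description of "git status" taking the lock unless told otherwise. It also uses a deliberately simplified boolean parser. `sketch_env_bool()` is hypothetical and treats any value other than the listed false spellings as true, whereas Git's own parser may reject unrecognized values.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Simplified boolean lookup: unset -> 'def'; "false", "no", "off", "0"
 * or an empty value -> 0; anything else -> 1.
 */
static int sketch_env_bool(const char *name, int def)
{
	const char *v = getenv(name);

	if (!v)
		return def;
	if (!*v || !strcmp(v, "0") || !strcasecmp(v, "false") ||
	    !strcasecmp(v, "no") || !strcasecmp(v, "off"))
		return 0;
	return 1;
}

/* Hypothetical mirror of use_optional_locks(), assuming a default of 1. */
static int sketch_use_optional_locks(void)
{
	return sketch_env_bool("GIT_OPTIONAL_LOCKS", 1);
}
```

The value lives in the environment, so child processes inherit it. The commit message relies on this property when a script or alias ends up running "status".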

2013-01-16 20:18:48 +01:00
/*
* The character that begins a commented line in a user-editable file
* that is subject to stripspace.
*/
extern char comment_line_char;
2014-05-17 03:52:23 +02:00
extern int auto_comment_line_char;
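comment_line_char names the character that marks a comment line in files cleaned up by stripspace. Below is a sketch of that check. It assumes '#' as the default and assumes that only the first character of a line is examined. `sketch_count_kept_lines()` is a hypothetical helper, not Git's stripspace implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for comment_line_char; '#' is assumed as the default. */
static char sketch_comment_line_char = '#';

/*
 * Count the lines that survive comment stripping, treating a line as a
 * comment when its first character is the comment character.
 */
static size_t sketch_count_kept_lines(const char *const *lines, size_t n)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++)
		if (lines[i][0] != sketch_comment_line_char)
			kept++;
	return kept;
}
```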
2013-01-16 20:18:48 +01:00

2017-01-27 11:09:47 +01:00
enum log_refs_config {
LOG_REFS_UNSET = -1,
LOG_REFS_NONE = 0,
LOG_REFS_NORMAL,
LOG_REFS_ALWAYS
};
extern enum log_refs_config log_all_ref_updates;
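log_all_ref_updates can start as LOG_REFS_UNSET and must eventually resolve to a concrete mode. The sketch below is based on the documented behavior of core.logAllRefUpdates. By default reflogs are enabled only in non-bare repositories. "true" covers branch heads, remote-tracking refs, notes, and HEAD, and "always" extends this to other refs. The `sketch_*` names are hypothetical, and the bare-repository check is passed in as a flag.

```c
#include <assert.h>
#include <string.h>

/* The enum above, re-declared under hypothetical names. */
enum sketch_log_refs {
	SK_LOG_REFS_UNSET = -1,
	SK_LOG_REFS_NONE = 0,
	SK_LOG_REFS_NORMAL,
	SK_LOG_REFS_ALWAYS
};

/* Resolve an unset value: reflogs on for non-bare repositories only. */
static enum sketch_log_refs sketch_resolve_log_refs(enum sketch_log_refs v, int is_bare)
{
	if (v == SK_LOG_REFS_UNSET)
		return is_bare ? SK_LOG_REFS_NONE : SK_LOG_REFS_NORMAL;
	return v;
}

static int sketch_starts_with(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Should a missing reflog be created automatically for 'refname'? */
static int sketch_should_autocreate_reflog(enum sketch_log_refs v, const char *refname)
{
	switch (v) {
	case SK_LOG_REFS_ALWAYS:
		return 1;
	case SK_LOG_REFS_NORMAL:
		return sketch_starts_with(refname, "refs/heads/") ||
		       sketch_starts_with(refname, "refs/remotes/") ||
		       sketch_starts_with(refname, "refs/notes/") ||
		       !strcmp(refname, "HEAD");
	default:
		return 0;
	}
}
```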

2008-05-11 00:36:29 +02:00
enum rebase_setup_type {
AUTOREBASE_NEVER = 0,
AUTOREBASE_LOCAL,
AUTOREBASE_REMOTE,
2010-05-14 11:31:35 +02:00
AUTOREBASE_ALWAYS
2008-05-11 00:36:29 +02:00
};

2009-03-16 16:42:51 +01:00
enum push_default_type {
PUSH_DEFAULT_NOTHING = 0,
PUSH_DEFAULT_MATCHING,
2012-04-24 09:50:03 +02:00
PUSH_DEFAULT_SIMPLE,
2011-02-16 01:54:24 +01:00
PUSH_DEFAULT_UPSTREAM,
push: Provide situational hints for non-fast-forward errors
Pushing a non-fast-forward update to a remote repository will result in
an error, but the hint text doesn't provide the correct resolution in
every case. Give better resolution advice in three push scenarios:
1) If you push your current branch and it triggers a non-fast-forward
error, you should merge remote changes with 'git pull' before pushing
again.
2) If you push to a shared repository others push to, and your local
tracking branches are not kept up to date, the 'matching refs' default
will generate non-fast-forward errors on outdated branches. If this is
your workflow, the 'matching refs' default is not for you. Consider
setting the 'push.default' configuration variable to 'current' or
'upstream' to ensure only your current branch is pushed.
3) If you explicitly specify a ref that is not your current branch or
push matching branches with ':', you will generate a non-fast-forward
error if any pushed branch tip is out of date. You should check out the
offending branch and merge remote changes before pushing again.
Teach transport.c to recognize these scenarios and configure push.c
to hint for them. If 'git push's default behavior changes or we
discover more scenarios, extension is easy. Standardize on the
advice API and add three new advice variables, 'pushNonFFCurrent',
'pushNonFFDefault', and 'pushNonFFMatching'. Setting any of these
to 'false' will disable their affiliated advice. Setting
'pushNonFastForward' to false will disable all three, thus preserving the
config option for users who already set it, but guaranteeing new
users won't disable push advice accidentally.
Based-on-patch-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Christopher Tiwald <christiwald@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-20 05:31:33 +01:00
PUSH_DEFAULT_CURRENT,
PUSH_DEFAULT_UNSPECIFIED
2009-03-16 16:42:51 +01:00
};

2008-05-11 00:36:29 +02:00
extern enum rebase_setup_type autorebase;
2009-03-16 16:42:51 +01:00
extern enum push_default_type push_default;
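push_default_type mirrors the values of the push.default setting. Below is a sketch that maps configuration strings onto such an enum, using hypothetical `SK_PUSH_*` names. The accepted strings follow the documented push.default values, with "tracking" as a deprecated synonym for "upstream". The sketch treats PUSH_DEFAULT_UNSPECIFIED as the "not configured" state that a caller starts from.

```c
#include <assert.h>
#include <string.h>

/* The enum above, re-declared under hypothetical names. */
enum sketch_push_default {
	SK_PUSH_NOTHING = 0,
	SK_PUSH_MATCHING,
	SK_PUSH_SIMPLE,
	SK_PUSH_UPSTREAM,
	SK_PUSH_CURRENT,
	SK_PUSH_UNSPECIFIED
};

/* Map a push.default value onto the enum; returns -1 and leaves *out alone if unknown. */
static int sketch_parse_push_default(const char *value, enum sketch_push_default *out)
{
	static const struct {
		const char *name;
		enum sketch_push_default mode;
	} map[] = {
		{ "nothing", SK_PUSH_NOTHING },
		{ "matching", SK_PUSH_MATCHING },
		{ "simple", SK_PUSH_SIMPLE },
		{ "upstream", SK_PUSH_UPSTREAM },
		{ "tracking", SK_PUSH_UPSTREAM }, /* deprecated synonym for "upstream" */
		{ "current", SK_PUSH_CURRENT },
	};
	size_t i;

	for (i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
		if (!strcmp(value, map[i].name)) {
			*out = map[i].mode;
			return 0;
		}
	}
	return -1;
}
```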
2008-02-19 17:24:37 +01:00

2009-04-28 00:32:25 +02:00
enum object_creation_mode {
OBJECT_CREATION_USES_HARDLINKS = 0,
2010-05-14 11:31:35 +02:00
OBJECT_CREATION_USES_RENAMES = 1
2009-04-28 00:32:25 +02:00
};

extern enum object_creation_mode object_creation_mode;
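object_creation_mode chooses between hardlinking and renaming a finished temporary file into its final place. Below is a sketch of that choice. It is modeled loosely on how loose objects are finalized but is not taken from Git's source, and `sketch_finalize_file()` is hypothetical. In hardlink mode, link() fails with EEXIST when the target already exists, so the existing file is kept. Other link() failures fall back to rename().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* The enum above, re-declared under hypothetical names. */
enum sketch_creation_mode {
	SK_CREATION_USES_HARDLINKS = 0,
	SK_CREATION_USES_RENAMES = 1
};

/*
 * Move a finished temporary file to its final name. An existing target is
 * kept in hardlink mode (EEXIST); other link() errors fall back to rename().
 * The temporary name is removed on every path that does not rename it away.
 */
static int sketch_finalize_file(enum sketch_creation_mode mode,
				const char *tmp_path, const char *final_path)
{
	int err = 0;

	if (mode == SK_CREATION_USES_HARDLINKS && link(tmp_path, final_path) < 0)
		err = errno;
	if (mode == SK_CREATION_USES_RENAMES || (err && err != EEXIST)) {
		if (rename(tmp_path, final_path) == 0)
			return 0;
		err = errno;
	}
	unlink(tmp_path);
	if (err && err != EEXIST) {
		errno = err;
		return -1;
	}
	return 0;
}
```

The hardlink route never replaces a file that is already at the final name. The rename route, by contrast, can overwrite an existing target.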
2009-04-25 11:57:14 +02:00

2009-10-09 12:21:57 +02:00
extern char *notes_ref_name;

2009-07-23 17:33:49 +02:00
extern int grafts_replace_parents;

introduce "extensions" form of core.repositoryformatversion
Normally we try to avoid bumps of the whole-repository
core.repositoryformatversion field. However, it is
unavoidable if we want to safely change certain aspects of
git in a backwards-incompatible way (e.g., modifying the set
of ref tips that we must traverse to generate a list of
unreachable, safe-to-prune objects).
If we were to bump the repository version for every such
change, then any implementation understanding version `X`
would also have to understand `X-1`, `X-2`, and so forth,
even though the incompatibilities may be in orthogonal parts
of the system, and there is otherwise no reason we cannot
implement one without the other (or more importantly, that
the user cannot choose to use one feature without the other,
weighing the tradeoff in compatibility only for that
particular feature).
This patch documents the existing repositoryformatversion
strategy and introduces a new format, "1", which lets a
repository specify that it must run with an arbitrary set of
extensions. This can be used, for example:
- to inform git that the objects should not be pruned based
only on the reachability of the ref tips (e.g., because it
has "clone --shared" children)
- that the refs are stored in a format besides the usual
"refs" and "packed-refs" directories
Because we bump to format "1", and because format "1"
requires that a running git knows about any extensions
mentioned, we know that older versions of the code will not
do something dangerous when confronted with these new
formats.
For example, if the user chooses to use database storage for
refs, they may set the "extensions.refbackend" config to
"db". Older versions of git will not understand format "1"
and bail. Versions of git which understand "1" but do not
know about "refbackend", or which know about "refbackend"
but not about the "db" backend, will refuse to run. This is
annoying, of course, but much better than the alternative of
claiming that there are no refs in the repository, or
writing to a location that other implementations will not
read.
Note that we are only defining the rules for format 1 here.
We do not ever write format 1 ourselves; it is a tool that
is meant to be used by users and future extensions to
provide safety with older implementations.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
/*
* GIT_REPO_VERSION is the version we write by default. The
* _READ variant is the highest number we know how to
* handle.
*/
2005-11-26 00:59:09 +01:00
#define GIT_REPO_VERSION 0
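The extensions commit message above sets the rule for format 1: a Git that sees an extension it does not understand must refuse to run. Below is a sketch of that check. `SKETCH_REPO_VERSION_READ` (assumed to be 1), the list of known extension names, and `sketch_check_repo_format()` are illustrative assumptions. For format 0 the sketch simply skips the extension check and does not try to model Git's exact handling there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed highest repository format version this sketch understands. */
#define SKETCH_REPO_VERSION_READ 1

/* Example extension names this sketch treats as understood. */
static const char *const sketch_known_extensions[] = { "noop", "preciousobjects" };

static int sketch_extension_is_known(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_known_extensions) / sizeof(sketch_known_extensions[0]); i++)
		if (!strcmp(name, sketch_known_extensions[i]))
			return 1;
	return 0;
}

/*
 * Return 0 if it is safe to operate on a repository with format
 * 'version' and the listed extensions, -1 if we must refuse.
 */
static int sketch_check_repo_format(int version, const char *const *exts, size_t n)
{
	size_t i;

	if (version > SKETCH_REPO_VERSION_READ)
		return -1; /* format newer than we understand */
	if (version < 1)
		return 0;  /* this sketch applies the extension rule only to format 1 */
	for (i = 0; i < n; i++)
		if (!sketch_extension_is_known(exts[i]))
			return -1; /* unknown extension: refuse rather than guess */
	return 0;
}
```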
2015-06-23 12:53:58 +02:00
#define GIT_REPO_VERSION_READ 1
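The acceptance rule described in the "extensions" commit message can be sketched as follows. This is a minimal illustration, not git's `verify_repository_format()`: the helper `format_is_usable()` and its array-based interface are hypothetical, and treating format 0 as ignoring every `extensions.*` key is a simplifying assumption that skips later refinements such as `v1_only_extensions`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the value defined above. */
#define GIT_REPO_VERSION_READ 1

/*
 * Hypothetical helper (not part of git): may a repository with format
 * `version` and the listed "extensions.*" keys be used by a build that
 * understands only the extensions in `known`?
 */
static int format_is_usable(int version,
			    const char **exts, size_t nr_exts,
			    const char **known, size_t nr_known)
{
	size_t i, j;

	/* Missing/unreadable, or newer than anything we can handle. */
	if (version < 0 || version > GIT_REPO_VERSION_READ)
		return 0;

	/* Assumption: format 0 predates extensions, so keys carry no meaning. */
	if (version == 0)
		return 1;

	/* Format 1: every mentioned extension must be understood. */
	for (i = 0; i < nr_exts; i++) {
		int found = 0;

		for (j = 0; j < nr_known && !found; j++)
			found = !strcmp(exts[i], known[j]);
		if (!found)
			return 0;
	}
	return 1;
}
```

With `known = {"preciousobjects", "worktreeconfig"}`, a format-1 repository that sets `extensions.refbackend` is refused, which is the "refuse to run rather than misread the refs" behavior the commit message argues for.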
2015-06-23 12:54:11 +02:00
extern int repository_format_precious_objects;
2018-10-21 16:02:28 +02:00
extern int repository_format_worktree_config;
2016-03-11 23:36:45 +01:00
setup: fix memory leaks with `struct repository_format`
After we set up a `struct repository_format`, it owns various pieces of
allocated memory. We then either use those members, because we decide we
want to use the "candidate" repository format, or we discard the
candidate / scratch space. In the first case, we transfer ownership of
the memory to a few global variables. In the latter case, we just
silently drop the struct and end up leaking memory.
Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a
function `clear_repository_format()`, to be used on each side of
`read_repository_format()`. To have a clear and simple memory ownership,
let all users of `struct repository_format` duplicate the strings that
they take from it, rather than stealing the pointers.
Call `clear_...()` at the start of `read_...()` instead of just zeroing
the struct, since we sometimes enter the function multiple times. Thus,
it is important to initialize the struct before calling `read_...()`, so
document that. It's also important because we might not even call
`read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c.
Teach `read_...()` to clear the struct on error, so that it is reset to
a safe state, and document this. (In `setup_git_directory_gently()`, we
look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we
weren't actually supposed to do per the API. After this commit, that's
ok.)
We inherit the existing code's conflation of "error" and "no version found".
Both are signalled through `version == -1`, and now both cause us to
clear any partial configuration we have picked up. For "extensions.*",
that's fine, since they require a positive version number. For
"core.bare" and "core.worktree", we're already verifying that we have a
non-negative version number before using them.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
/*
 * You _have_ to initialize a `struct repository_format` using
 * `= REPOSITORY_FORMAT_INIT` before calling `read_repository_format()`.
 */
2016-03-11 23:37:07 +01:00
struct repository_format {
	int version;
	int precious_objects;
2017-12-05 17:58:43 +01:00
	char *partial_clone; /* value of extensions.partialclone */
2018-10-21 16:02:28 +02:00
	int worktree_config;
2016-03-11 23:37:07 +01:00
	int is_bare;
2017-11-12 22:28:53 +01:00
	int hash_algo;
2021-03-30 15:10:59 +02:00
	int sparse_index;
2016-03-11 23:37:07 +01:00
	char *work_tree;
	struct string_list unknown_extensions;
2020-07-16 14:25:13 +02:00
	struct string_list v1_only_extensions;
2016-03-11 23:37:07 +01:00
};
2019-02-28 21:36:28 +01:00
/*
 * Always use this to initialize a `struct repository_format`
 * to a well-defined, default state before calling
 * `read_repository_format()`.
 */
#define REPOSITORY_FORMAT_INIT \
{ \
	.version = -1, \
	.is_bare = -1, \
	.hash_algo = GIT_HASH_SHA1, \
	.unknown_extensions = STRING_LIST_INIT_DUP, \
2020-07-16 14:25:13 +02:00
	.v1_only_extensions = STRING_LIST_INIT_DUP, \
2019-02-28 21:36:28 +01:00
}
2016-03-11 23:37:07 +01:00
/*
 * Read the repository format characteristics from the config file "path" into
2019-02-28 21:36:28 +01:00
 * "format" struct. Returns the numeric version. On error, or if no version is
 * found in the configuration, -1 is returned, format->version is set to -1,
 * and all other fields in the struct are set to the default configuration
 * (REPOSITORY_FORMAT_INIT). Always initialize the struct using
 * REPOSITORY_FORMAT_INIT before calling this function.
2016-03-11 23:37:07 +01:00
 */
int read_repository_format(struct repository_format *format, const char *path);
2019-02-28 21:36:28 +01:00
/*
 * Free the memory held onto by `format`, but not the struct itself.
 * (No need to use this after `read_repository_format()` fails.)
 */
void clear_repository_format(struct repository_format *format);
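The init/read/clear contract documented above (initialize with the macro, let the reader clear before filling and on error, copy strings out instead of stealing them) can be illustrated with a cut-down stand-in. This is a minimal sketch: `struct mini_format`, `MINI_FORMAT_INIT`, `mini_format_read()`, `mini_format_clear()` and `dup_string()` are all hypothetical, and the reader takes already-parsed values instead of reading a config file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct repository_format. */
struct mini_format {
	int version;
	int is_bare;
	char *work_tree;
};

#define MINI_FORMAT_INIT { .version = -1, .is_bare = -1, .work_tree = NULL }

/* Portable replacement for strdup(), which is POSIX rather than ISO C. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Free owned memory (but not the struct) and reset to MINI_FORMAT_INIT. */
static void mini_format_clear(struct mini_format *f)
{
	struct mini_format blank = MINI_FORMAT_INIT;

	free(f->work_tree);
	*f = blank;
}

/*
 * Hypothetical reader: clears the struct first (it may be called more
 * than once on the same struct) and leaves it in the default state on
 * error, as documented for read_repository_format().
 */
static int mini_format_read(struct mini_format *f, int version,
			    const char *work_tree)
{
	mini_format_clear(f);
	if (version < 0)
		return -1; /* "error" and "no version found" look the same */
	if (work_tree && !(f->work_tree = dup_string(work_tree))) {
		mini_format_clear(f);
		return -1;
	}
	f->version = version;
	return version;
}
```

A caller that needs `work_tree` after clearing the struct would copy it with `dup_string()`, which keeps ownership unambiguous.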
2016-03-11 23:37:07 +01:00
/*
 * Verify that the repository described by repository_format is something we
 * can read. If it is, return 0. Otherwise, return -1, and "err" will describe
 * any errors encountered.
 */
int verify_repository_format(const struct repository_format *format,
			     struct strbuf *err);
2016-03-11 23:36:45 +01:00
/*
 * Check the repository format version in the path found in get_git_dir(),
 * and die if it is a version we don't understand. Generally one would
 * set_git_dir() before calling this, and use it only for "are we in a valid
 * repo?".
2020-02-22 21:17:37 +01:00
 *
 * If successful and fmt is not NULL, fill fmt with data.
2016-03-11 23:36:45 +01:00
 */
2020-02-22 21:17:37 +01:00
void check_repository_format(struct repository_format *fmt);
2005-11-26 00:59:09 +01:00
2005-04-09 18:48:20 +02:00
#define MTIME_CHANGED 0x0001
#define CTIME_CHANGED 0x0002
#define OWNER_CHANGED 0x0004
#define MODE_CHANGED 0x0008
#define INODE_CHANGED 0x0010
#define DATA_CHANGED 0x0020
2005-05-05 14:38:25 +02:00
#define TYPE_CHANGED 0x0040
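The `*_CHANGED` values are distinct powers of two, so several can be combined into one bitmask and tested independently. A minimal sketch: `content_may_differ()` and `count_changes()` are hypothetical helpers, and treating only `DATA_CHANGED` and `TYPE_CHANGED` as content changes is an illustrative assumption, not git's policy.

```c
#include <assert.h>

/* Values copied from the declarations above. */
#define MTIME_CHANGED 0x0001
#define CTIME_CHANGED 0x0002
#define OWNER_CHANGED 0x0004
#define MODE_CHANGED 0x0008
#define INODE_CHANGED 0x0010
#define DATA_CHANGED 0x0020
#define TYPE_CHANGED 0x0040

/* Hypothetical policy: only these bits suggest the content itself differs. */
#define CONTENT_BITS (DATA_CHANGED | TYPE_CHANGED)

static int content_may_differ(unsigned int changed)
{
	return (changed & CONTENT_BITS) != 0;
}

/* Count how many distinct change bits are set in the mask. */
static int count_changes(unsigned int changed)
{
	int n = 0;

	while (changed) {
		n += changed & 1;
		changed >>= 1;
	}
	return n;
}
```

A pure timestamp change such as `MTIME_CHANGED | CTIME_CHANGED` sets two bits but, under this policy, does not imply different content.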
2005-04-08 00:13:13 +02:00
2015-09-24 23:05:45 +02:00
/*
 * Return an abbreviated sha1 unique within this repository's object database.
 * The result will be at least `len` characters long, and will be NUL
 * terminated.
 *
2016-10-20 08:19:19 +02:00
 * The non-`_r` version returns a static buffer which remains valid until 4
 * more calls to find_unique_abbrev are made.
2015-09-24 23:05:45 +02:00
 *
 * The `_r` variant writes to a buffer supplied by the caller, which must be at
2018-03-12 03:27:30 +01:00
 * least `GIT_MAX_HEXSZ + 1` bytes. The return value is the number of bytes
2015-09-24 23:05:45 +02:00
 * written (excluding the NUL terminator).
 *
 * Note that while this version avoids the static buffer, it is not fully
 * reentrant, as it calls into other non-reentrant git code.
 */
2019-04-16 11:33:22 +02:00
const char *repo_find_unique_abbrev(struct repository *r, const struct object_id *oid, int len);
#define find_unique_abbrev(oid, len) repo_find_unique_abbrev(the_repository, oid, len)
int repo_find_unique_abbrev_r(struct repository *r, char *hex, const struct object_id *oid, int len);
#define find_unique_abbrev_r(hex, oid, len) repo_find_unique_abbrev_r(the_repository, hex, oid, len)
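The "valid until 4 more calls" contract is typically met with a small ring of static buffers. A minimal sketch under that assumption: `abbrev_hex()`, `abbrev_hex_r()` and `MAX_HEXSZ` are hypothetical, and unlike the real functions no uniqueness check against an object database is performed; the helpers only truncate a full hex name.

```c
#include <assert.h>
#include <string.h>

/* Assumption: mirrors GIT_MAX_HEXSZ (64 hex digits, enough for SHA-256). */
#define MAX_HEXSZ 64

/*
 * `_r`-style helper: copy the first `len` digits of `full_hex` into the
 * caller's buffer (at least MAX_HEXSZ + 1 bytes) and return the number
 * of bytes written, excluding the NUL terminator.
 */
static int abbrev_hex_r(char *hex, const char *full_hex, int len)
{
	size_t n = strlen(full_hex);

	if (len >= 0 && (size_t)len < n)
		n = (size_t)len;
	if (n > MAX_HEXSZ)
		n = MAX_HEXSZ;
	memcpy(hex, full_hex, n);
	hex[n] = '\0';
	return (int)n;
}

/*
 * Non-`_r` variant: rotates through four static buffers, so each result
 * survives three further calls and is overwritten by the fourth.
 */
static const char *abbrev_hex(const char *full_hex, int len)
{
	static char bufs[4][MAX_HEXSZ + 1];
	static int idx;

	idx = (idx + 1) % 4;
	abbrev_hex_r(bufs[idx], full_hex, len);
	return bufs[idx];
}
```

The ring is what makes it safe to pass several abbreviations to a single `printf()`; the `_r` form avoids the shared state entirely.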
2015-09-24 23:05:45 +02:00
2010-02-22 23:32:13 +01:00
/* set default permissions by passing mode arguments to open(2) */
int git_mkstemps_mode(char *pattern, int suffix_len, int mode);
int git_mkstemp_mode(char *pattern, int mode);
2008-04-16 10:34:24 +02:00
/*
 * NOTE NOTE NOTE!!
 *
 * PERM_UMASK, OLD_PERM_GROUP and OLD_PERM_EVERYBODY enumerations must
 * not be changed. Old repositories have core.sharedrepository written in
 * numeric format, and therefore these values are preserved for compatibility
 * reasons.
 */
2006-06-10 08:09:49 +02:00
enum sharedrepo {
2008-04-16 10:34:24 +02:00
	PERM_UMASK          = 0,
	OLD_PERM_GROUP      = 1,
	OLD_PERM_EVERYBODY  = 2,
	PERM_GROUP          = 0660,
2010-05-14 11:31:35 +02:00
	PERM_EVERYBODY      = 0664
2006-06-10 08:09:49 +02:00
};
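The NOTE above exists because the small numeric values 1 and 2 are already stored on disk. A minimal sketch of how such legacy values could be mapped onto the modern octal enumerators; `normalize_shared_value()` is hypothetical and deliberately ignores the string forms (such as "group") and the validation that `git_config_perm()` performs.

```c
#include <assert.h>

/* Enumerators copied from enum sharedrepo above. */
enum sharedrepo {
	PERM_UMASK = 0,
	OLD_PERM_GROUP = 1,
	OLD_PERM_EVERYBODY = 2,
	PERM_GROUP = 0660,
	PERM_EVERYBODY = 0664
};

/*
 * Hypothetical helper: translate the legacy numeric values that old
 * repositories wrote to core.sharedrepository into octal modes; any
 * other value (PERM_UMASK or an explicit mode) passes through.
 */
static int normalize_shared_value(int value)
{
	switch (value) {
	case OLD_PERM_GROUP:
		return PERM_GROUP;
	case OLD_PERM_EVERYBODY:
		return PERM_EVERYBODY;
	default:
		return value;
	}
}
```

Renumbering `OLD_PERM_GROUP` or `OLD_PERM_EVERYBODY` would silently change how such existing configs are interpreted, which is why they are frozen.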
int git_config_perm(const char *var, const char *value);
2013-03-30 10:53:32 +01:00
int adjust_shared_perm(const char *path);
2014-01-06 14:45:25 +01:00
/*
 * Create the directory containing the named path, using care to be
2017-01-06 17:22:25 +01:00
 * somewhat safe against races. Return one of the scld_error values to
 * indicate success/failure. On error, set errno to describe the
 * problem.
2014-01-06 14:45:27 +01:00
 *
 * SCLD_VANISHED indicates that one of the ancestor directories of the
 * path existed at one point during the function call and then
 * suddenly vanished, probably because another process pruned the
 * directory while we were working. To be robust against this kind of
 * race, callers might want to try invoking the function again when it
 * returns SCLD_VANISHED.
2016-04-24 04:34:12 +02:00
 *
 * safe_create_leading_directories() temporarily changes path while it
 * is working but restores it before returning.
 * safe_create_leading_directories_const() doesn't modify path, even
2020-12-02 00:45:04 +01:00
 * temporarily. Both these variants adjust the permissions of the
 * created directories to honor core.sharedRepository, so they are best
 * suited for files inside the git dir. For working tree files, use
 * safe_create_leading_directories_no_share() instead, as it ignores
 * the core.sharedRepository setting.
2014-01-06 14:45:25 +01:00
 */
enum scld_error {
	SCLD_OK = 0,
	SCLD_FAILED = -1,
	SCLD_PERMS = -2,
2014-01-06 14:45:27 +01:00
	SCLD_EXISTS = -3,
	SCLD_VANISHED = -4
2014-01-06 14:45:25 +01:00
};
enum scld_error safe_create_leading_directories(char *path);
enum scld_error safe_create_leading_directories_const(const char *path);
2020-12-02 00:45:04 +01:00
enum scld_error safe_create_leading_directories_no_share(char *path);
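The comment suggests that callers retry when they get `SCLD_VANISHED`. A minimal sketch of a bounded retry loop; `fake_create_leading_directories()` is a hypothetical stub that simulates a racing prune, standing in for the real function so the loop can be exercised without touching the filesystem.

```c
#include <assert.h>

enum scld_error {
	SCLD_OK = 0,
	SCLD_FAILED = -1,
	SCLD_PERMS = -2,
	SCLD_EXISTS = -3,
	SCLD_VANISHED = -4
};

/* Number of upcoming calls that should report a vanished directory. */
static int vanish_count;

/* Hypothetical stub standing in for safe_create_leading_directories(). */
static enum scld_error fake_create_leading_directories(char *path)
{
	(void)path;
	if (vanish_count > 0) {
		vanish_count--;
		return SCLD_VANISHED;
	}
	return SCLD_OK;
}

/* Retry on SCLD_VANISHED, at most `max_retries` extra times. */
static enum scld_error create_with_retry(char *path, int max_retries)
{
	enum scld_error err;
	int retries = 0;

	do {
		err = fake_create_leading_directories(path);
	} while (err == SCLD_VANISHED && retries++ < max_retries);
	return err;
}
```

Bounding the retries keeps a pathological race (for example, a prune running in a tight loop) from hanging the caller; any other error is returned immediately.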
2014-01-06 14:45:25 +01:00
2011-03-11 01:02:50 +01:00
int mkdir_in_gitdir(const char *path);
2021-07-25 00:06:52 +02:00
char *interpolate_path(const char *path, int real_home);
2021-07-26 23:55:05 +02:00
/* NEEDSWORK: remove this synonym once in-flight topics have migrated */
#define expand_user_path interpolate_path
2011-10-04 22:02:00 +02:00
const char *enter_repo(const char *path, int strict);
2007-08-01 02:28:59 +02:00
static inline int is_absolute_path(const char *path)
{
2011-05-27 18:00:38 +02:00
	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
2007-08-01 02:28:59 +02:00
}
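`is_absolute_path()` relies on two platform hooks. A minimal sketch of how they might behave, assuming POSIX builds recognize only `/` with no drive prefixes while Windows-style builds also accept `\` and a `C:`-style prefix; the `sketch_*` functions and their `windows` flag are hypothetical stand-ins for the compile-time selection.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for is_dir_sep(). */
static int sketch_is_dir_sep(int c, int windows)
{
	return c == '/' || (windows && c == '\\');
}

/* Hypothetical stand-in for has_dos_drive_prefix(): "C:" and the like. */
static int sketch_has_dos_drive_prefix(const char *path, int windows)
{
	/* path[1] is only read when path[0] is a letter (so not the NUL). */
	return windows && isalpha((unsigned char)path[0]) && path[1] == ':';
}

static int sketch_is_absolute_path(const char *path, int windows)
{
	return sketch_is_dir_sep(path[0], windows) ||
	       sketch_has_dos_drive_prefix(path, windows);
}
```

Under these rules `C:\git` is absolute only in Windows mode, and a UNC path such as `\\server\share` is caught by the separator check.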
2008-09-09 10:27:07 +02:00
int is_directory(const char *);
2016-12-12 19:16:53 +01:00
char *strbuf_realpath(struct strbuf *resolved, const char *path,
		      int die_on_error);
abspath: add a function to resolve paths with missing components
Currently, we have a function to resolve paths, strbuf_realpath. This
function canonicalizes paths like realpath(3), but permits a trailing
component to be absent from the file system. In other words, this is
the behavior of the GNU realpath(1) without any arguments.
In the future, we'll need this same behavior, except that we want to
allow for any number of missing trailing components, which is the
behavior of GNU realpath(1) with the -m option. This is useful because
we'll want to canonicalize a path that may point to a not yet present
path under the .git directory. For example, a user may want to know
where an arbitrary ref would be stored if it existed in the file system.
Let's refactor strbuf_realpath to move most of the code to an internal
function and then pass it two flags to control its behavior. We'll add
a strbuf_realpath_forgiving function that has our new behavior, and
leave strbuf_realpath with the older, stricter behavior.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-13 01:25:28 +01:00
char *strbuf_realpath_forgiving(struct strbuf *resolved, const char *path,
				int die_on_error);
2017-03-08 16:43:40 +01:00
char *real_pathdup(const char *path, int die_on_error);
2011-03-17 12:26:46 +01:00
const char *absolute_path(const char *path);
2017-01-26 18:47:45 +01:00
char *absolute_pathdup(const char *path);
2013-10-14 04:29:40 +02:00
const char *remove_leading_path(const char *in, const char *prefix);
2013-06-25 17:53:43 +02:00
const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
2013-07-14 10:36:03 +02:00
int normalize_path_copy_len(char *dst, const char *src, int *prefix_len);
2009-02-07 16:08:28 +01:00
int normalize_path_copy(char *dst, const char *src);
2012-10-28 17:16:24 +01:00
int longest_ancestor_length(const char *path, struct string_list *prefixes);
2009-02-19 20:10:49 +01:00
char *strip_path_suffix(const char *path, const char *suffix);
2009-11-09 20:26:43 +01:00
int daemon_avoid_alias(const char *path);
is_ntfs_dotgit: match other .git files
When we started to catch NTFS short names that clash with .git, we only
looked for GIT~1. This is sufficient because we only ever clone into an
empty directory, so .git is guaranteed to be the first subdirectory or
file in that directory.
However, even with a fresh clone, .gitmodules is *not* necessarily the
first file to be written that would want the NTFS short name GITMOD~1: a
malicious repository can add .gitmodul0000 and friends, which sorts
before `.gitmodules` and is therefore checked out *first*. For that
reason, we have to test not only for ~1 short names, but for others,
too.
It's hard to just adapt the existing checks in is_ntfs_dotgit(): since
Windows 2000 (i.e., in all Windows versions still supported by Git),
NTFS short names are only generated in the <prefix>~<number> form up to
number 4. After that, a *different* prefix is used, calculated from the
long file name using an undocumented, but stable algorithm.
For example, the short name of .gitmodules would be GITMOD~1, but if it
is taken, and all of ~2, ~3 and ~4 are taken, too, the short name
GI7EBA~1 will be used. From there, collisions are handled by
incrementing the number, shortening the prefix as needed (until ~9999999
is reached, in which case NTFS will not allow the file to be created).
We'd also want to handle .gitignore and .gitattributes, which suffer
from a similar problem, using the fall-back short names GI250A~1 and
GI7D29~1, respectively.
To accommodate that, we could reimplement the hashing algorithm, but
it is just safer and simpler to provide the known prefixes. This
algorithm has been reverse-engineered and described at
https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but
still available via https://web.archive.org/.
These can be recomputed by running the following Perl script:
-- snip --
use warnings;
use strict;
sub compute_short_name_hash ($) {
my $checksum = 0;
foreach (split('', $_[0])) {
$checksum = ($checksum * 0x25 + ord($_)) & 0xffff;
}
$checksum = ($checksum * 314159269) & 0xffffffff;
$checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000);
$checksum -= (($checksum * 1152921497) >> 60) * 1000000007;
return scalar reverse sprintf("%x", $checksum & 0xffff);
}
print compute_short_name_hash($ARGV[0]);
-- snap --
E.g., running that with the argument ".gitignore" will
result in "250a" (which then becomes "gi250a" in the code).
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff King <peff@peff.net>
2018-05-11 16:03:54 +02:00
/*
 * These functions match their is_hfs_dotgit() counterparts; see utf8.h for
 * details.
 */
int is_ntfs_dotgit(const char *name);
int is_ntfs_dotgitmodules(const char *name);
int is_ntfs_dotgitignore(const char *name);
int is_ntfs_dotgitattributes(const char *name);
t0060: test ntfs/hfs-obscured dotfiles
We have tests that cover various filesystem-specific spellings of
".gitmodules", because we need to reliably identify that path for some
security checks. These are from dc2d9ba318 (is_{hfs,ntfs}_dotgitmodules:
add tests, 2018-05-12), with the actual code coming from e7cb0b4455
(is_ntfs_dotgit: match other .git files, 2018-05-11) and 0fc333ba20
(is_hfs_dotgit: match other .git files, 2018-05-02).
Those latter two commits also added similar matching functions for
.gitattributes and .gitignore. These ended up not being used in the
final series, and are currently dead code. But in preparation for them
being used in some fsck checks, let's make sure they actually work by
throwing a few basic tests at them. Likewise, let's cover .mailmap
(which does need matching code added).
I didn't bother with the whole battery of tests that we cover for
.gitmodules. These functions are all based on the same generic matcher,
so it's sufficient to test most of the corner cases just once.
Note that the ntfs magic prefix names in the tests come from the
algorithm described in e7cb0b4455 (and are different for each file).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-03 22:43:22 +02:00
int is_ntfs_dotmailmap(const char *name);
2005-07-06 10:11:52 +02:00
2017-07-28 21:25:45 +02:00
/*
 * Returns true iff "str" could be confused as a command-line option when
 * passed to a sub-program like "ssh". Note that this has nothing to do with
 * shell-quoting, which should be handled separately; we're assuming here that
 * the string makes it verbatim to the sub-program.
 */
int looks_like_command_line_option(const char *str);
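The check described above can be as small as looking at the first character. A minimal sketch, assuming that any string beginning with `-` counts as option-like; `sketch_looks_like_option()` is hypothetical and not presented as git's exact implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumption: a sub-program such as ssh may parse any argument that
 * begins with '-' as an option, so such strings are treated as unsafe.
 */
static int sketch_looks_like_option(const char *str)
{
	return str && str[0] == '-';
}
```

A caller would use this to reject, for example, a host name like `-oProxyCommand=...` before placing it on an ssh command line.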
2021-09-04 22:54:58 +02:00
/**
 * Return a newly allocated string with the evaluation of
 * "$XDG_CONFIG_HOME/$subdir/$filename" if $XDG_CONFIG_HOME is non-empty, otherwise
 * "$HOME/.config/$subdir/$filename". Return NULL upon error.
 */
char *xdg_config_home_for(const char *subdir, const char *filename);
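The lookup rule documented above can be sketched with `getenv()` and `snprintf()`. This is a minimal illustration, not git's implementation: `xdg_path_for()` is a hypothetical pure helper that takes the two environment values as arguments so the rule can be tested, and treating an unset `$HOME` as the error case is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical pure helper: build the path from explicit values so the
 * rule can be checked without modifying the process environment.
 */
static char *xdg_path_for(const char *xdg_config_home, const char *home,
			  const char *subdir, const char *filename)
{
	const char *base, *mid;
	char *out;
	int n;

	if (xdg_config_home && *xdg_config_home) {
		base = xdg_config_home;
		mid = "";
	} else if (home) {
		base = home;
		mid = "/.config";
	} else {
		return NULL; /* neither variable is usable */
	}

	/* First pass measures, second pass writes into an exact-size buffer. */
	n = snprintf(NULL, 0, "%s%s/%s/%s", base, mid, subdir, filename);
	if (n < 0)
		return NULL;
	out = malloc((size_t)n + 1);
	if (out)
		snprintf(out, (size_t)n + 1, "%s%s/%s/%s", base, mid, subdir, filename);
	return out;
}

/* Environment-reading wrapper with the same shape as the declaration above. */
static char *sketch_xdg_config_home_for(const char *subdir, const char *filename)
{
	return xdg_path_for(getenv("XDG_CONFIG_HOME"), getenv("HOME"),
			    subdir, filename);
}
```

Note that an empty `$XDG_CONFIG_HOME` falls back to `$HOME/.config`, matching the "non-empty" wording in the comment; the caller owns and must `free()` the result.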
2015-04-21 06:06:27 +02:00
/**
 * Return a newly allocated string with the evaluation of
 * "$XDG_CONFIG_HOME/git/$filename" if $XDG_CONFIG_HOME is non-empty, otherwise
 * "$HOME/.config/git/$filename". Return NULL upon error.
 */
2019-04-29 10:28:14 +02:00
char *xdg_config_home(const char *filename);
2015-04-21 06:06:27 +02:00
2017-03-13 21:43:54 +01:00
/**
 * Return a newly allocated string with the evaluation of
 * "$XDG_CACHE_HOME/git/$filename" if $XDG_CACHE_HOME is non-empty, otherwise
 * "$HOME/.cache/git/$filename". Return NULL upon error.
 */
2019-04-29 10:28:14 +02:00
char *xdg_cache_home(const char *filename);
2017-03-13 21:43:54 +01:00
2019-04-29 10:28:14 +02:00
int git_open_cloexec(const char *name, int flags);
sha1_file: stop opening files with O_NOATIME
When we open object files, we try to do so with O_NOATIME.
This dates back to 144bde78e9 (Use O_NOATIME when opening
the sha1 files., 2005-04-23), which is an optimization to
avoid creating a bunch of dirty inodes when we're accessing
many objects. But a few things have changed since then:
1. In June 2005, git learned about packfiles, which means
we would do far fewer atime updates (rather than one
per object access, we'd generally get one per packfile).
2. In late 2006, Linux learned about "relatime", which is
generally the default on modern installs. So
performance around atime updates is a non-issue there
these days.
All the world isn't Linux, but as it turns out, Linux
is the only platform to implement O_NOATIME in the
first place.
So it's very unlikely that this code is helping anybody
these days.
Helped-by: Jeff King <peff@peff.net>
[jc: took idea and log message from peff]
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-10-28 15:29:27 +02:00
#define git_open(name) git_open_cloexec(name, O_RDONLY)
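`git_open()` is simply `git_open_cloexec()` with `O_RDONLY`. A minimal POSIX sketch of a close-on-exec open; `sketch_open_cloexec()` is hypothetical, and the retry without `O_CLOEXEC` on `EINVAL`, followed by setting `FD_CLOEXEC` via `fcntl()`, is a defensive assumption rather than a description of git's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

static int sketch_open_cloexec(const char *name, int flags)
{
	int fd;

#ifdef O_CLOEXEC
	fd = open(name, flags | O_CLOEXEC);
	if (fd >= 0 || errno != EINVAL)
		return fd;
	/* Assumption: EINVAL may mean the flag is unsupported; retry plainly. */
#endif
	fd = open(name, flags);
	if (fd >= 0) {
		/* Fallback: mark the descriptor close-on-exec after the fact. */
		int fdflags = fcntl(fd, F_GETFD);

		if (fdflags >= 0)
			fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC);
	}
	return fd;
}
```

Setting the flag at `open()` time avoids the window in which another thread could `fork()` and `exec()` with the descriptor still inheritable; the `fcntl()` path only narrows that window.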
2021-10-01 11:16:48 +02:00
/**
 * unpack_loose_header() initializes the data stream needed to unpack
 * a loose object header.
 *
2021-10-01 11:16:49 +02:00
 * Returns:
 *
 * - ULHR_OK on success
 * - ULHR_BAD on error
2021-10-01 11:16:50 +02:00
 * - ULHR_TOO_LONG if the header was too long
2021-10-01 11:16:48 +02:00
 *
 * It will only parse up to MAX_HEADER_LEN bytes unless an optional
 * "hdrbuf" argument is non-NULL. This is intended for use with
 * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
 * reporting. The full header will be extracted to "hdrbuf" for use
2021-10-01 11:16:50 +02:00
 * with parse_loose_header(); ULHR_TOO_LONG will still be returned
 * from this function to indicate that the header was too long.
2021-10-01 11:16:48 +02:00
 */
2021-10-01 11:16:49 +02:00
enum unpack_loose_header_result {
	ULHR_OK,
	ULHR_BAD,
2021-10-01 11:16:50 +02:00
	ULHR_TOO_LONG,
2021-10-01 11:16:49 +02:00
};
enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
						    unsigned char *map,
						    unsigned long mapsize,
						    void *buffer,
						    unsigned long bufsiz,
						    struct strbuf *hdrbuf);
object-file.c: stop dying in parse_loose_header()
Make parse_loose_header() return error codes and data instead of
invoking die() by itself.
For now we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller. In a subsequent
commit we'll make read_loose_object() return an error code instead of
dying. We should also address the "allow_unknown" case (should be
moved to builtin/cat-file.c), but for now I'll be leaving it.
For making parse_loose_header() not die(), change its prototype to
accept a "struct object_info *" instead of the "unsigned long *sizep"
it accepted before. Its callers can now check the populated
"oi->typep".
Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().
This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.
Since 93cff9a978e (sha1_loose_object_info: return error for corrupted
objects, 2017-04-01) the return value of loose_object_info() (then
named sha1_loose_object_info()) had been a "status" variable that could be
any negative value, as we were expecting to return the "enum
object_type".
The only negative type happens to be OBJ_BAD, but the code still
assumed that more might be added. This was then used later in
e.g. c84a1f3ed4d (sha1_file: refactor read_object, 2017-06-21). Now
that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.
Since parse_loose_header() doesn't need to return an arbitrary
"status" we only need to treat its "ret < 0" specially, but can
idiomatically overwrite it with our own error() return. This along
with having made unpack_loose_header() return an "enum
unpack_loose_header_result" in an earlier commit means that we can
move the previously nested if/else cases mostly into the "ULHR_OK"
branch of the "switch" statement.
We should be less silent if we reach that "status = -1" branch, which
happens if we've got trailing garbage in loose objects, see
f6371f92104 (sha1_file: add read_loose_object() function, 2017-01-13)
for a better way to handle it. For now let's punt on it, a subsequent
commit will address that edge case.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-01 11:16:51 +02:00
/**
 * parse_loose_header() parses the starting "<type> <len>\0" of an
 * object. If it doesn't follow that format, -1 is returned. To check
 * the validity of the <type>, populate the "typep" in the "struct
 * object_info". It will be OBJ_BAD if the object type is unknown. The
 * parsed <len> can be retrieved via "oi->sizep", and from there
 * passed to unpack_loose_rest().
 */
2021-10-01 11:16:47 +02:00
struct object_info;
2021-10-01 11:16:51 +02:00
int parse_loose_header(const char *hdr, struct object_info *oi);
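As a rough illustration of the "<type> <len>\0" contract documented above, the standalone sketch below separates a malformed header (return value -1) from a well-formed header whose type is unknown (reported through an OBJ_BAD-like value, not the return value). Everything named demo_* is hypothetical, the enum values are assumptions, and Git's own parser works on struct object_info and may apply stricter checks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the object types; the numeric values are assumptions. */
enum demo_object_type {
	DEMO_OBJ_BAD = -1,
	DEMO_OBJ_COMMIT = 1,
	DEMO_OBJ_TREE = 2,
	DEMO_OBJ_BLOB = 3,
	DEMO_OBJ_TAG = 4,
};

/* Hypothetical stand-in for the two struct object_info fields used here. */
struct demo_object_info {
	enum demo_object_type type;
	unsigned long size;
};

static enum demo_object_type demo_type_from_string(const char *s, size_t len)
{
	static const struct {
		const char *name;
		enum demo_object_type type;
	} types[] = {
		{ "commit", DEMO_OBJ_COMMIT },
		{ "tree", DEMO_OBJ_TREE },
		{ "blob", DEMO_OBJ_BLOB },
		{ "tag", DEMO_OBJ_TAG },
	};

	for (size_t i = 0; i < sizeof(types) / sizeof(types[0]); i++)
		if (strlen(types[i].name) == len && !memcmp(types[i].name, s, len))
			return types[i].type;
	return DEMO_OBJ_BAD; /* well-formed header, unknown type */
}

/*
 * Parse "<type> <len>\0". Returns -1 if the header is malformed. An unknown
 * <type> is not an error here: it is reported as oi->type == DEMO_OBJ_BAD.
 */
static int demo_parse_loose_header(const char *hdr, struct demo_object_info *oi)
{
	const char *sp = strchr(hdr, ' ');
	const char *p;
	unsigned long size = 0;

	if (!sp || sp == hdr)
		return -1;
	oi->type = demo_type_from_string(hdr, (size_t)(sp - hdr));

	p = sp + 1;
	if (*p < '0' || *p > '9')
		return -1; /* at least one digit is required */
	for (; *p >= '0' && *p <= '9'; p++)
		size = size * 10 + (unsigned long)(*p - '0'); /* no overflow check */
	if (*p != '\0')
		return -1; /* trailing garbage before the NUL */
	oi->size = size;
	return 0;
}
```

Keeping an unknown type out of the return value is what lets a caller that opted into OBJECT_INFO_ALLOW_UNKNOWN_TYPE still report the bogus type instead of failing outright.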
2005-04-24 03:47:23 +02:00

2022-02-05 00:48:28 +01:00
/**
 * With in-core object data in "map", rehash it to make sure the
 * object name actually matches "oid" to detect object corruption.
2022-02-05 00:48:29 +01:00
 *
 * A negative value indicates an error, usually that the OID is not
 * what we expected, but it might also indicate another error.
2022-02-05 00:48:28 +01:00
 */
2020-01-30 21:32:23 +01:00
int check_object_signature(struct repository *r, const struct object_id *oid,
2022-02-05 00:48:32 +01:00
			   void *map, unsigned long size,
			   enum object_type type);
2022-02-05 00:48:30 +01:00

/**
 * A streaming version of check_object_signature().
 * Try reading the object named with "oid" using
 * the streaming interface and rehash it to perform the same check.
 */
int stream_object_signature(struct repository *r, const struct object_id *oid);
2005-04-08 00:13:13 +02:00

2019-04-29 10:28:14 +02:00
int finalize_object_file(const char *tmpfile, const char *filename);
2005-04-24 03:47:23 +02:00

2017-02-27 19:00:11 +01:00
/* Helper to check and "touch" a file */
2019-04-29 10:28:14 +02:00
int check_and_freshen_file(const char *fn, int freshen);
2017-02-27 19:00:11 +01:00

2007-05-30 19:32:19 +02:00
extern const signed char hexval_table[256];
static inline unsigned int hexval(unsigned char c)
2006-09-21 01:04:46 +02:00
{
	return hexval_table[c];
}

2016-09-03 17:59:20 +02:00
/*
 * Convert two consecutive hexadecimal digits into a char. Return a
 * negative value on error. Don't run over the end of short strings.
 */
static inline int hex2chr(const char *s)
{
2017-09-21 18:48:38 +02:00
	unsigned int val = hexval(s[0]);
	return (val & ~0xf) ? val : (val << 4) | hexval(s[1]);
2016-09-03 17:59:20 +02:00
}
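The `val & ~0xf` test in hex2chr() works because hexval_table is a signed char table: assuming, as that type suggests, that non-hex bytes map to -1, hexval() returns UINT_MAX for them, so any bit above the low nibble signals an error. The sketch below rebuilds such a table at runtime; demo_init_hexval_table(), demo_hexval() and demo_hex2chr() are hypothetical stand-ins, not Git's definitions.

```c
/* Hypothetical stand-in for hexval_table: -1 marks a non-hex byte. */
static signed char demo_hexval_table[256];

static void demo_init_hexval_table(void)
{
	for (int i = 0; i < 256; i++)
		demo_hexval_table[i] = -1;
	for (int i = 0; i < 10; i++)
		demo_hexval_table['0' + i] = (signed char)i;
	for (int i = 0; i < 6; i++) {
		demo_hexval_table['a' + i] = (signed char)(10 + i);
		demo_hexval_table['A' + i] = (signed char)(10 + i);
	}
}

static unsigned int demo_hexval(unsigned char c)
{
	/* -1 converts to UINT_MAX, so bits above the low nibble mean "invalid". */
	return demo_hexval_table[c];
}

static int demo_hex2chr(const char *s)
{
	unsigned int val = demo_hexval(s[0]);

	/* An invalid s[0] (including the NUL terminator) returns before s[1] is read. */
	return (val & ~0xf) ? val : (val << 4) | demo_hexval(s[1]);
}
```

Because an invalid first digit returns early, `demo_hex2chr("")` never looks past the end of the string, which is the "don't run over the end of short strings" guarantee.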

2005-04-08 00:13:13 +02:00
/* Convert to/from hex/sha1 representation */
2010-10-28 20:28:04 +02:00
#define MINIMUM_ABBREV minimum_abbrev
#define DEFAULT_ABBREV default_abbrev
2006-01-25 10:03:18 +01:00

2016-10-01 02:19:35 +02:00
/* used when the code does not know or care what the default abbrev is */
#define FALLBACK_DEFAULT_ABBREV 7

2010-06-09 19:02:06 +02:00
struct object_context {
2019-04-05 17:00:12 +02:00
	unsigned short mode;
2015-05-20 19:03:39 +02:00
	/*
	 * symlink_path is only used by get_tree_entry_follow_symlinks,
	 * and only for symlinks that point outside the repository.
	 */
	struct strbuf symlink_path;
2017-05-19 14:54:43 +02:00
	/*
2017-07-14 01:49:29 +02:00
	 * If GET_OID_RECORD_PATH is set, this will record the path (if any)
2017-05-19 14:54:43 +02:00
	 * found when resolving the name. The caller is responsible for
	 * releasing the memory.
	 */
	char *path;
2010-06-09 19:02:06 +02:00
};

2017-07-14 01:49:29 +02:00
#define GET_OID_QUIETLY           01
#define GET_OID_COMMIT            02
#define GET_OID_COMMITTISH        04
#define GET_OID_TREE             010
#define GET_OID_TREEISH          020
#define GET_OID_BLOB             040
#define GET_OID_FOLLOW_SYMLINKS 0100
#define GET_OID_RECORD_PATH     0200
#define GET_OID_ONLY_TO_DIE    04000
2021-12-28 14:28:50 +01:00
#define GET_OID_REQUIRE_PATH  010000
2017-07-14 01:49:29 +02:00

#define GET_OID_DISAMBIGUATORS \
	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
	GET_OID_TREE | GET_OID_TREEISH | \
	GET_OID_BLOB)
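The GET_OID_* values are octal bit flags, so callers OR them together, and GET_OID_DISAMBIGUATORS masks out just the object-type hints. The sketch below copies a subset of the values and adds a hypothetical helper, demo_count_disambiguators(), that counts how many type hints a flags word carries; a caller could use it, for example, to reject more than one hint.

```c
/* A subset of the flag values, copied from the declarations above. */
#define GET_OID_QUIETLY           01
#define GET_OID_COMMIT            02
#define GET_OID_COMMITTISH        04
#define GET_OID_TREE             010
#define GET_OID_TREEISH          020
#define GET_OID_BLOB             040
#define GET_OID_DISAMBIGUATORS \
	(GET_OID_COMMIT | GET_OID_COMMITTISH | \
	GET_OID_TREE | GET_OID_TREEISH | \
	GET_OID_BLOB)

/* Hypothetical helper: count the object-type hints present in flags. */
static int demo_count_disambiguators(unsigned flags)
{
	unsigned hints = flags & GET_OID_DISAMBIGUATORS;
	int n = 0;

	while (hints) {
		hints &= hints - 1; /* clear the lowest set bit */
		n++;
	}
	return n;
}
```

Non-hint bits such as GET_OID_QUIETLY are ignored by the mask, so they can be combined freely with at most one type hint.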
2011-09-23 15:38:36 +02:00

2019-01-18 05:19:43 +01:00
enum get_oid_result {
	FOUND = 0,
	MISSING_OBJECT = -1, /* The requested object is missing */
	SHORT_NAME_AMBIGUOUS = -2,
	/* The following only apply when symlinks are followed */
	DANGLING_SYMLINK = -4, /*
				* The initial symlink is there, but
				* (transitively) points to a missing
				* in-tree file
				*/
	SYMLINK_LOOP = -5,
	NOT_DIR = -6, /*
		       * Somewhere along the symlink chain, a path is
		       * requested which contains a file as a
		       * non-final element.
		       */
};

2019-04-16 11:33:37 +02:00
int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
2021-07-13 10:05:19 +02:00
__attribute__((format (printf, 2, 3)))
2019-05-08 17:37:24 +02:00
int get_oidf(struct object_id *oid, const char *fmt, ...);
2019-04-16 11:33:40 +02:00
int repo_get_oid_commit(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_committish(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_tree(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_treeish(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_blob(struct repository *r, const char *str, struct object_id *oid);
2019-04-16 11:33:41 +02:00
int repo_get_oid_mb(struct repository *r, const char *str, struct object_id *oid);
2019-04-16 11:33:39 +02:00
void maybe_die_on_misspelt_object_name(struct repository *repo,
				       const char *name,
				       const char *prefix);
2019-04-29 10:28:14 +02:00
enum get_oid_result get_oid_with_context(struct repository *repo, const char *str,
2019-04-29 10:28:23 +02:00
					 unsigned flags, struct object_id *oid,
					 struct object_context *oc);
2016-04-18 01:10:36 +02:00

2019-04-16 11:33:40 +02:00
#define get_oid(str, oid) repo_get_oid(the_repository, str, oid)
#define get_oid_commit(str, oid) repo_get_oid_commit(the_repository, str, oid)
#define get_oid_committish(str, oid) repo_get_oid_committish(the_repository, str, oid)
#define get_oid_tree(str, oid) repo_get_oid_tree(the_repository, str, oid)
#define get_oid_treeish(str, oid) repo_get_oid_treeish(the_repository, str, oid)
#define get_oid_blob(str, oid) repo_get_oid_blob(the_repository, str, oid)
2019-04-16 11:33:41 +02:00
#define get_oid_mb(str, oid) repo_get_oid_mb(the_repository, str, oid)
2019-04-16 11:33:40 +02:00

2017-03-31 03:39:59 +02:00
typedef int each_abbrev_fn(const struct object_id *oid, void *);
2019-04-16 11:33:24 +02:00
int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
#define for_each_abbrev(prefix, fn, data) repo_for_each_abbrev(the_repository, prefix, fn, data)
2011-09-23 15:38:36 +02:00

2019-04-29 10:28:14 +02:00
int set_disambiguate_hint_config(const char *var, const char *value);
2016-09-27 14:38:01 +02:00

2011-09-23 15:38:36 +02:00
/*
 * Try to read a SHA1 in hexadecimal format from the 40 characters
 * starting at hex. Write the 20-byte result to sha1 in binary form.
 * Return 0 on success. Reading stops if a NUL is encountered in the
 * input, so it is safe to pass this function an arbitrary
 * null-terminated string.
 */
2019-04-29 10:28:14 +02:00
int get_sha1_hex(const char *hex, unsigned char *sha1);
int get_oid_hex(const char *hex, struct object_id *sha1);
2011-09-23 15:38:36 +02:00

2020-02-22 21:17:28 +01:00
/* Like get_oid_hex, but for an arbitrary hash algorithm. */
int get_oid_hex_algop(const char *hex, struct object_id *oid, const struct git_hash_algo *algop);

2017-10-31 14:46:49 +01:00
/*
 * Read `len` pairs of hexadecimal digits from `hex` and write the
 * values to `binary` as `len` bytes. Return 0 on success, or -1 if
 * the input does not consist of hex digits.
 */
2019-04-29 10:28:14 +02:00
int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
2017-10-31 14:46:49 +01:00

2015-09-24 23:05:45 +02:00
/*
hex: drop sha1_to_hex()
There's only a single caller left of sha1_to_hex(), since everybody
that has an object name in "unsigned char[]" now uses hash_to_hex()
instead.
This case is in the sha1dc wrapper, where we print a hex sha1 when
we find a collision. This one will always be sha1, regardless of the
current hash algorithm, so we can't use hash_to_hex() here. In
practice we'd probably not be running sha1 at all if it isn't the
current algorithm, but it's possible we might still occasionally
need to compute a sha1 in a post-sha256 world.
Since sha1_to_hex() is just a wrapper for hash_to_hex_algop(), let's
call that ourselves. There's value in getting rid of the sha1-specific
wrapper to de-clutter the global namespace, and to make sure nobody uses
it (and as with sha1_to_hex_r() in the previous patch, we'll drop the
coccinelle transformations, too).
The sha1_to_hex() function is mentioned in a comment; we can easily
swap that out for oid_to_hex() to give a better example. Also
update the comment that was left stale when we added "struct
object_id *" as a way to name an object and added functions to
convert it to hex.
The function is also mentioned in some test vectors in t4100, but
that's not runnable code, so there's no point in trying to clean it
up.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-11 10:04:18 +01:00
 * Convert a binary hash in "unsigned char []" or an object name in
 * "struct object_id *" to its hex equivalent. The `_r` variant is reentrant,
2015-09-24 23:05:45 +02:00
 * and writes the NUL-terminated output to the buffer `out`, which must be at
2018-11-14 05:09:29 +01:00
 * least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
2015-09-24 23:05:45 +02:00
 * convenience.
 *
 * The non-`_r` variant returns a static buffer, but uses a ring of 4
 * buffers, making it safe to make up to four calls in a single statement, like:
 *
2019-11-11 10:04:18 +01:00
 *   printf("%s -> %s", hash_to_hex(one), hash_to_hex(two));
 *   printf("%s -> %s", oid_to_hex(one), oid_to_hex(two));
2015-09-24 23:05:45 +02:00
 */
2018-11-14 05:09:29 +01:00
char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
char *oid_to_hex_r(char *out, const struct object_id *oid);
char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *); /* static buffer result! */
char *hash_to_hex(const unsigned char *hash); /* same static buffer */
char *oid_to_hex(const struct object_id *oid); /* same static buffer */
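A minimal sketch of the ring-of-buffers idea behind hash_to_hex() and oid_to_hex(), assuming a 32-byte maximum hash size. demo_hash_to_hex() is hypothetical and is not Git's implementation; it only shows why several calls in one statement do not clobber each other.

```c
#include <stddef.h>

#define DEMO_MAX_RAWSZ 32 /* assumption: large enough for SHA-256 */

/* Hypothetical: hex-encode rawsz bytes into one of four static buffers. */
static char *demo_hash_to_hex(const unsigned char *hash, size_t rawsz)
{
	static char bufs[4][2 * DEMO_MAX_RAWSZ + 1];
	static int bufno;
	static const char hexdigits[] = "0123456789abcdef";
	char *out = bufs[bufno];

	bufno = (bufno + 1) % 4; /* a buffer is reused only every fourth call */
	if (rawsz > DEMO_MAX_RAWSZ)
		rawsz = DEMO_MAX_RAWSZ;
	for (size_t i = 0; i < rawsz; i++) {
		out[2 * i] = hexdigits[hash[i] >> 4];
		out[2 * i + 1] = hexdigits[hash[i] & 0xf];
	}
	out[2 * rawsz] = '\0';
	return out;
}
```

With a ring of four, `printf("%s -> %s", demo_hash_to_hex(a, 2), demo_hash_to_hex(b, 2))` is safe, but a fifth call reuses the first buffer, and the shared static state is not thread-safe; callers that need either property should use an `_r` variant with their own buffer.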
2011-09-15 23:10:42 +02:00

2017-02-20 01:10:13 +01:00
/*
 * Parse a hexadecimal object ID (40 characters for SHA-1) starting from hex,
 * updating the pointer specified by end when parsing stops. The resulting
 * object ID is stored in oid. Returns 0 on success. Parsing will stop on the
 * first NUL or other invalid character. end is only updated on success;
 * otherwise, it is unmodified.
 */
2019-04-29 10:28:14 +02:00
int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
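parse_oid_hex() differs from get_oid_hex() mainly in the `end` out-parameter. The sketch below, which assumes SHA-1 sized object IDs and uses hypothetical demo_* names, follows the contract in the comment above: `end` is written only on success.

```c
#include <stddef.h>

#define DEMO_RAWSZ 20 /* assumption: SHA-1, i.e. 40 hex digits */

struct demo_oid {
	unsigned char hash[DEMO_RAWSZ];
};

/* Returns 0-15 for a hex digit, -1 otherwise (including NUL). */
static int demo_nibble(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * On success fill *oid, point *end just past the digits and return 0.
 * On failure return -1 and leave *end alone; this sketch also leaves *oid
 * untouched by decoding into a temporary first.
 */
static int demo_parse_oid_hex(const char *hex, struct demo_oid *oid,
			      const char **end)
{
	struct demo_oid tmp;

	for (size_t i = 0; i < DEMO_RAWSZ; i++) {
		int hi = demo_nibble(hex[2 * i]);
		if (hi < 0)
			return -1; /* stop before reading past a NUL */
		int lo = demo_nibble(hex[2 * i + 1]);
		if (lo < 0)
			return -1;
		tmp.hash[i] = (unsigned char)((hi << 4) | lo);
	}
	*oid = tmp;
	*end = hex + 2 * DEMO_RAWSZ;
	return 0;
}
```

The `end` pointer lets a caller keep scanning a line such as "<oid> <refname>" right after the object ID without recomputing its length.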
2017-02-20 01:10:13 +01:00

2020-02-22 21:17:28 +01:00
/* Like parse_oid_hex, but for an arbitrary hash algorithm. */
int parse_oid_hex_algop(const char *hex, struct object_id *oid, const char **end,
			const struct git_hash_algo *algo);

2020-02-22 21:17:29 +01:00
/*
 * These functions work like get_oid_hex and parse_oid_hex, but they will parse
 * a hex value for any algorithm. The algorithm is detected based on the length
 * and the algorithm in use is returned. If this is not a hex object ID in any
 * algorithm, returns GIT_HASH_UNKNOWN.
 */
int get_oid_hex_any(const char *hex, struct object_id *oid);
int parse_oid_hex_any(const char *hex, struct object_id *oid, const char **end);
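The comment leaves the detection strategy open. One plausible approach, sketched below with hypothetical names and assuming 40 hex digits for SHA-1 and 64 for SHA-256, is to try the longer algorithm first so that the first 40 digits of a SHA-256 name are not mistaken for a SHA-1 name.

```c
#include <stddef.h>

/* Hypothetical identifiers; only DEMO_HASH_UNKNOWN == 0 is meant to echo GIT_HASH_UNKNOWN. */
enum demo_hash_algo {
	DEMO_HASH_UNKNOWN = 0,
	DEMO_HASH_SHA1,
	DEMO_HASH_SHA256,
};

static int demo_is_hex(char c)
{
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') ||
	       (c >= 'A' && c <= 'F');
}

/* True if the first n characters are hex; stops early at a NUL. */
static int demo_all_hex(const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!demo_is_hex(s[i]))
			return 0;
	return 1;
}

/* Try the longest hash first, then fall back to the shorter one. */
static enum demo_hash_algo demo_detect_hex_algo(const char *hex)
{
	if (demo_all_hex(hex, 64))
		return DEMO_HASH_SHA256;
	if (demo_all_hex(hex, 40))
		return DEMO_HASH_SHA1;
	return DEMO_HASH_UNKNOWN;
}
```

Trailing text after the digits (for example " HEAD") does not prevent a match, which fits the parse_oid_hex-style use where `end` points past the object ID.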
2017-02-20 01:10:13 +01:00

2017-03-02 09:21:23 +01:00
/*
 * This reads short-hand syntax that not only evaluates to a commit
 * object name, but also can act as if the end user spelled the name
 * of the branch from the command line.
 *
 * - "@{-N}" finds the name of the Nth previous branch we were on, and
 *   places the name of the branch in the given buf and returns the
 *   number of characters parsed if successful.
 *
 * - "<branch>@{upstream}" finds the name of the other ref that
 *   <branch> is configured to merge with (missing <branch> defaults
 *   to the current branch), and places the name of the branch in the
 *   given buf and returns the number of characters parsed if
 *   successful.
 *
 * If the input is not of the accepted format, it returns a negative
 * number to signal an error.
 *
 * If the input was ok but there are not N branch switches in the
 * reflog, it returns 0.
 */
interpret_branch_name: allow callers to restrict expansions
The interpret_branch_name() function converts names like
@{-1} and @{upstream} into branch names. The expanded ref
names are not fully qualified, and may be outside of the
refs/heads/ namespace (e.g., "@" expands to "HEAD", and
"@{upstream}" is likely to be in "refs/remotes/").
This is OK for callers like dwim_ref() which are primarily
interested in resolving the resulting name, no matter where
it is. But callers like "git branch" treat the result as a
branch name in refs/heads/. When we expand to a ref outside
that namespace, the results are very confusing (e.g., "git
branch @" tries to create refs/heads/HEAD, which is
nonsense).
Callers can't know from the returned string how the
expansion happened (e.g., did the user really ask for a
branch named "HEAD", or did we do a bogus expansion?). One
fix would be to return some out-parameters describing the
types of expansion that occurred. This has the benefit that
the caller can generate precise error messages ("I
understood @{upstream} to mean origin/master, but that is a
remote tracking branch, so you cannot create it as a local
name").
However, out-parameters make the function interface somewhat
cumbersome. Instead, let's do the opposite: let the caller
tell us which elements to expand. That's easier to pass in,
and none of the callers give more precise error messages
than "@{upstream} isn't a valid branch name" anyway (which
should be sufficient).
The strbuf_branchname() function needs a similar parameter,
as most of the callers access interpret_branch_name()
through it.
We can break the callers down into two groups:
1. Callers that are happy with any kind of ref in the
result. We pass "0" here, so they continue to work
without restrictions. This includes merge_name(),
the reflog handling in add_pending_object_with_path(),
and substitute_branch_name(). This last is what powers
dwim_ref().
2. Callers that have funny corner cases (mostly in
git-branch and git-checkout). These need to make use of
the new parameter, but I've left them as "0" in this
patch, and will address them individually in follow-on
patches.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 09:23:01 +01:00
#define INTERPRET_BRANCH_LOCAL (1<<0)
#define INTERPRET_BRANCH_REMOTE (1<<1)
#define INTERPRET_BRANCH_HEAD (1<<2)
2020-09-02 00:28:07 +02:00
struct interpret_branch_name_options {
	/*
	 * If "allowed" is non-zero, it is treated as a bitfield of allowable
	 * expansions: local branches ("refs/heads/"), remote branches
	 * ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion is
	 * allowed, even ones to refs outside of those namespaces.
	 */
	unsigned allowed;
2020-09-02 00:28:09 +02:00

	/*
	 * If @{upstream} or @{push} (or equivalent) is requested, and the
	 * branch in question does not have such a reference, return -1 instead
	 * of die()-ing.
	 */
	unsigned nonfatal_dangling_mark : 1;
2020-09-02 00:28:07 +02:00
};
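The "allowed" rule (zero means anything goes, otherwise only the listed namespaces) can be sketched as a small predicate. demo_expansion_allowed() is hypothetical and, for simplicity, classifies fully qualified ref names, whereas the commit message above notes that interpret_branch_name() itself produces names that are not fully qualified.

```c
#include <string.h>

/* Values copied from the declarations above. */
#define INTERPRET_BRANCH_LOCAL (1<<0)
#define INTERPRET_BRANCH_REMOTE (1<<1)
#define INTERPRET_BRANCH_HEAD (1<<2)

static int demo_has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns 1 if expanding to full_ref is permitted by the "allowed" bits. */
static int demo_expansion_allowed(unsigned allowed, const char *full_ref)
{
	if (!allowed)
		return 1; /* no bits set: any expansion is acceptable */
	if (demo_has_prefix(full_ref, "refs/heads/"))
		return !!(allowed & INTERPRET_BRANCH_LOCAL);
	if (demo_has_prefix(full_ref, "refs/remotes/"))
		return !!(allowed & INTERPRET_BRANCH_REMOTE);
	if (!strcmp(full_ref, "HEAD"))
		return !!(allowed & INTERPRET_BRANCH_HEAD);
	return 0; /* outside every namespace the caller listed */
}
```

A caller that must create something under refs/heads/, such as `git branch`, would pass INTERPRET_BRANCH_LOCAL so that an expansion to "HEAD" is refused rather than turned into refs/heads/HEAD.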
2019-04-06 13:34:26 +02:00
int repo_interpret_branch_name(struct repository *r,
			       const char *str, int len,
			       struct strbuf *buf,
2020-09-02 00:28:07 +02:00
			       const struct interpret_branch_name_options *options);
#define interpret_branch_name(str, len, buf, options) \
	repo_interpret_branch_name(the_repository, str, len, buf, options)
2007-01-19 10:15:15 +01:00

2019-04-29 10:28:14 +02:00
int validate_headref(const char *ref);
2005-04-08 00:13:13 +02:00

2019-04-29 10:28:14 +02:00
int base_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
int df_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
int name_compare(const char *name1, size_t len1, const char *name2, size_t len2);
int cache_name_stage_compare(const char *name1, int len1, int stage1, const char *name2, int len2, int stage2);
2005-04-08 00:13:13 +02:00

2019-06-27 11:28:47 +02:00
void *read_object_with_reference(struct repository *r,
				 const struct object_id *oid,
2022-02-05 00:48:34 +01:00
				 enum object_type required_type,
2019-04-29 10:28:23 +02:00
				 unsigned long *size,
				 struct object_id *oid_ret);
2005-04-21 03:06:49 +02:00

2019-04-16 11:33:32 +02:00
struct object *repo_peel_to_type(struct repository *r,
				 const char *name, int namelen,
				 struct object *o, enum object_type);
#define peel_to_type(name, namelen, obj, type) \
	repo_peel_to_type(the_repository, name, namelen, obj, type)
2007-12-24 09:51:01 +01:00

2012-05-25 01:28:40 +02:00
#define IDENT_STRICT 1
2012-05-22 01:10:11 +02:00
#define IDENT_NO_DATE 2
ident: let callers omit name with fmt_indent
Most callers want to see all of "$name <$email> $date", but
a few want only limited parts, omitting the date, or even
the name. We already have IDENT_NO_DATE to handle the date
part, but there's not a good option for getting just the
email. Callers have to do one of:
1. Call ident_default_email; this does not respect
environment variables, nor does it promise to trim
whitespace or other crud from the result.
2. Call git_{committer,author}_info; this returns the name
and email, leaving the caller to parse out the wanted
bits.
This patch adds IDENT_NO_NAME; it stops short of adding
IDENT_NO_EMAIL, as no callers want it (nor are likely to),
and it complicates the error handling of the function.
When no name is requested, the angle brackets (<>) around
the email address are also omitted.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-05-25 01:27:24 +02:00
#define IDENT_NO_NAME 4
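Per the commit message above, IDENT_NO_NAME drops both the name and the angle brackets, while IDENT_NO_DATE drops the trailing date. The hypothetical demo_fmt_ident() below shows only that flag handling; the real fmt_ident() takes different parameters and does considerably more.

```c
#include <stdio.h>
#include <string.h>

/* Values copied from the declarations above. */
#define IDENT_NO_DATE 2
#define IDENT_NO_NAME 4

/*
 * Hypothetical, simplified formatter for "$name <$email> $date".
 * Output is silently truncated to outsz bytes (outsz must be at least 1).
 */
static void demo_fmt_ident(char *out, size_t outsz, const char *name,
			   const char *email, const char *date, int flags)
{
	if (flags & IDENT_NO_NAME)
		snprintf(out, outsz, "%s", email); /* no name: no <> either */
	else
		snprintf(out, outsz, "%s <%s>", name, email);

	if (!(flags & IDENT_NO_DATE) && date) {
		size_t len = strlen(out);
		snprintf(out + len, outsz - len, " %s", date);
	}
}
```

As the commit message notes, there is deliberately no IDENT_NO_EMAIL counterpart, so the email is always part of the result.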
2019-02-04 19:48:50 +01:00

enum want_ident {
	WANT_BLANK_IDENT,
	WANT_AUTHOR_IDENT,
	WANT_COMMITTER_IDENT
};

2019-04-29 10:28:14 +02:00
const char *git_author_info(int);
const char *git_committer_info(int);
const char *fmt_ident(const char *name, const char *email,
2019-04-29 10:28:23 +02:00
		      enum want_ident whose_ident,
		      const char *date_str, int);
2019-04-29 10:28:14 +02:00
const char *fmt_name(enum want_ident);
const char *ident_default_name(void);
const char *ident_default_email(void);
const char *git_editor(void);
const char *git_sequence_editor(void);
const char *git_pager(int stdout_is_tty);
int is_terminal_dumb(void);
int git_ident_config(const char *, const char *, void *);
2019-02-26 00:16:08 +01:00
/*
 * Prepare an ident to fall back on if the user didn't configure it.
 */
void prepare_fallback_ident(const char *name, const char *email);
2019-04-29 10:28:14 +02:00
void reset_ident_date(void);
2005-07-12 20:49:27 +02:00

2012-03-11 10:25:43 +01:00
struct ident_split {
	const char *name_begin;
	const char *name_end;
	const char *mail_begin;
	const char *mail_end;
	const char *date_begin;
	const char *date_end;
	const char *tz_begin;
	const char *tz_end;
};
/*
 * Signals a success with 0, but the time part of the result may be NULL
 * if the input lacks a timestamp and zone.
 */
2019-04-29 10:28:14 +02:00
int split_ident_line(struct ident_split *, const char *, int);
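To make the begin/end pointer pairs concrete, here is a simplified, hypothetical splitter for lines like `A U Thor <author@example.com> 1112911993 -0700`. It is not Git's split_ident_line(); it only illustrates the documented behavior that success is reported with 0 even when the timestamp and zone are missing, in which case those pointers stay NULL.

```c
#include <string.h>

/* Hypothetical copy of struct ident_split's layout. */
struct demo_ident_split {
	const char *name_begin, *name_end;
	const char *mail_begin, *mail_end;
	const char *date_begin, *date_end;
	const char *tz_begin, *tz_end;
};

/* Returns -1 if there is no "<...>" email, otherwise 0. */
static int demo_split_ident(struct demo_ident_split *split, const char *line, int len)
{
	const char *end = line + len;
	const char *lt, *gt, *cp;

	*split = (struct demo_ident_split){ 0 }; /* all pointers start as NULL */
	lt = memchr(line, '<', (size_t)len);
	if (!lt)
		return -1;
	gt = memchr(lt + 1, '>', (size_t)(end - (lt + 1)));
	if (!gt)
		return -1;

	split->name_begin = line;
	split->name_end = lt;
	while (split->name_end > split->name_begin && split->name_end[-1] == ' ')
		split->name_end--; /* drop the space before '<' */
	split->mail_begin = lt + 1;
	split->mail_end = gt;

	for (cp = gt + 1; cp < end && *cp == ' '; cp++)
		;
	if (cp == end || *cp < '0' || *cp > '9')
		return 0; /* no timestamp: date and tz stay NULL */
	split->date_begin = cp;
	while (cp < end && *cp >= '0' && *cp <= '9')
		cp++;
	split->date_end = cp;
	while (cp < end && *cp == ' ')
		cp++;
	split->tz_begin = cp;
	while (cp < end && *cp != ' ' && *cp != '\n')
		cp++;
	split->tz_end = cp;
	return 0;
}
```

Using [begin, end) pointers into the original line avoids copying: callers can print a field with `%.*s` and the computed length.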
2012-03-11 10:25:43 +01:00

2013-09-20 12:16:28 +02:00
/*
 * Compare split idents for equality or strict ordering. Note that we
 * compare only the ident part of the line, ignoring any timestamp.
 *
 * Because there are two fields, we must choose one as the primary key; we
 * currently arbitrarily pick the email.
 */
2019-04-29 10:28:14 +02:00
int ident_cmp(const struct ident_split *, const struct ident_split *);
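The ordering described above (email as the primary key, name as the tie-breaker, timestamp ignored) can be sketched on [begin, end) ranges like those in struct ident_split. The struct and function names below are hypothetical.

```c
#include <string.h>

/* Hypothetical subset of struct ident_split: just the name and mail ranges. */
struct demo_ident {
	const char *name_begin, *name_end;
	const char *mail_begin, *mail_end;
};

/* Compare two [begin, end) ranges like strcmp; a shorter prefix sorts first. */
static int demo_range_cmp(const char *a, const char *a_end,
			  const char *b, const char *b_end)
{
	size_t a_len = (size_t)(a_end - a), b_len = (size_t)(b_end - b);
	int cmp = memcmp(a, b, a_len < b_len ? a_len : b_len);

	if (cmp)
		return cmp;
	return (a_len > b_len) - (a_len < b_len);
}

/* Email is the primary key; the name only breaks ties. */
static int demo_ident_cmp(const struct demo_ident *a, const struct demo_ident *b)
{
	int cmp = demo_range_cmp(a->mail_begin, a->mail_end,
				 b->mail_begin, b->mail_end);

	if (cmp)
		return cmp;
	return demo_range_cmp(a->name_begin, a->name_end,
			      b->name_begin, b->name_end);
}
```

Picking the email first means two idents with the same address but differently spelled names still sort next to each other.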
2013-09-20 12:16:28 +02:00

2009-07-09 22:35:31 +02:00
struct cache_def {
2014-07-05 00:41:46 +02:00
	struct strbuf path;
2009-07-09 22:35:31 +02:00
	int flags;
	int track_flags;
	int prefix_len_stat_func;
};
2021-09-27 14:54:27 +02:00
#define CACHE_DEF_INIT { \
	.path = STRBUF_INIT, \
}
2014-07-12 01:02:34 +02:00
static inline void cache_def_clear(struct cache_def *cache)
2014-07-05 00:41:46 +02:00
{
	strbuf_release(&cache->path);
}
2009-07-09 22:35:31 +02:00

2019-04-29 10:28:14 +02:00
int has_symlink_leading_path(const char *name, int len);
int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
checkout: don't follow symlinks when removing entries
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.
Note that since 1d718a5108 check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().
For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.
As for failing to lstat() one of the leading components, the basename
might still exist only we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expects it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18 19:43:47 +01:00
int check_leading_path(const char *name, int len, int warn_on_lstat_err);
2019-04-29 10:28:14 +02:00
int has_dirs_only_path(const char *name, int len, int prefix_len);
2021-02-12 15:49:41 +01:00
void invalidate_lstat_cache(void);
2019-04-29 10:28:14 +02:00
void schedule_dir_for_removal(const char *name, int len);
void remove_scheduled_dirs(void);
2005-06-06 06:59:54 +02:00

2006-12-23 08:33:44 +01:00
struct pack_window {
	struct pack_window *next;
	unsigned char *base;
	off_t offset;
	size_t len;
	unsigned int last_used;
	unsigned int inuse_cnt;
};

2005-07-01 02:15:39 +02:00
struct pack_entry {
2007-03-07 02:44:30 +01:00
	off_t offset;
2005-07-01 02:15:39 +02:00
	struct packed_git *p;
};

2017-03-16 15:27:00 +01:00
/*
2017-03-28 21:45:25 +02:00
 * Create a temporary file rooted in the object database directory, or
 * die on failure. The filename is taken from "pattern", which should have the
 * usual "XXXXXX" trailer, and the resulting filename is written into the
 * "temp_filename" buffer. Returns the open descriptor.
2017-03-16 15:27:00 +01:00
 */
2019-04-29 10:28:14 +02:00
int odb_mkstemp(struct strbuf *temp_filename, const char *pattern);
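The "XXXXXX" convention is the one POSIX mkstemp() uses: the last six characters of the template are replaced in place with a unique suffix, and the file is created and opened. The sketch below uses "/tmp" in place of the object database directory; demo_mkstemp_in() and demo_discard_tempfile() are hypothetical, and unlike odb_mkstemp() they return -1 instead of dying and use a plain char buffer rather than a strbuf.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Write "<dir>/<pattern>" into path and let mkstemp() replace the trailing
 * "XXXXXX" in place. Returns the open descriptor, or -1 on failure.
 */
static int demo_mkstemp_in(char *path, size_t pathsz, const char *dir,
			   const char *pattern)
{
	int n = snprintf(path, pathsz, "%s/%s", dir, pattern);

	if (n < 0 || (size_t)n >= pathsz)
		return -1; /* the full path did not fit */
	return mkstemp(path);
}

/* Close the descriptor and remove the file, e.g. when aborting a write. */
static int demo_discard_tempfile(int fd, const char *path)
{
	int ret = close(fd);

	if (unlink(path) < 0)
		ret = -1;
	return ret;
}
```

Because the final name is only known after mkstemp() returns, the caller reads it back from the buffer, which is why odb_mkstemp() takes an output strbuf for the filename.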
2017-03-16 15:27:00 +01:00

/*
2017-03-16 15:27:12 +01:00
 * Create a pack .keep file named "name" (which should generally be the output
 * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
 * error.
2017-03-16 15:27:00 +01:00
 */
2019-04-29 10:28:14 +02:00
int odb_pack_keep(const char *name);
2017-03-16 15:27:00 +01:00

2017-12-08 16:27:14 +01:00
|
|
|
/*
|
2019-01-07 09:34:12 +01:00
|
|
|
* Set this to 0 to prevent oid_object_info_extended() from fetching missing
|
2017-12-08 16:27:14 +01:00
|
|
|
* blobs. This only makes a difference if extensions.partialClone is set.
|
|
|
|
*
|
|
|
|
* Its default value is 1.
|
|
|
|
*/
|
|
|
|
extern int fetch_if_missing;
|
|
|
|
|
[PATCH] Add update-server-info.
The git-update-server-info command prepares informational files
to help clients discover the contents of a repository, and pull
from it via dumb transport protocols. Currently, the
following files are produced.
- The $repo/info/refs file lists the names of heads and tags
available in the $repo/refs/ directory, along with their
SHA1. This can be used by the git-ls-remote command running on
the client side.
- The $repo/info/rev-cache file describes the commit ancestry
reachable from references in the $repo/refs/ directory. This
file is in an append-only binary format to make the server
side friendly to rsync mirroring schemes, and can be read by
the git-show-rev-cache command.
- The $repo/objects/info/pack file lists the names of the packs
available, the interdependencies among them, and the head
commits and tags contained in them. Along with the other two
files, this is designed to help clients make smart pull
decisions.
The git-receive-pack command is changed to invoke it at the end,
so just after a push to a public repository finishes via "git
push", the server info is automatically updated.
In addition, building of the rev-cache file can be done by a
standalone git-build-rev-cache command separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-24 02:54:41 +02:00
|
|
|
/* Dumb servers support */
|
2019-04-29 10:28:14 +02:00
|
|
|
int update_server_info(int);
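As described above, info/refs pairs each ref name with its object name so that dumb-transport clients can list refs without running Git on the server. The sketch below formats one such line. It assumes the tab-separated "object-name TAB refname" layout Git uses for this file. The helper name is hypothetical; Git produces these lines internally from update_server_info().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one info/refs line as
 * "<hex object name>\t<full ref name>\n". Returns the line length,
 * or -1 if it does not fit in out.
 */
static int format_info_refs_line(char *out, size_t outsz,
				 const char *hex_oid, const char *refname)
{
	int n;

	assert(hex_oid != NULL && refname != NULL);
	n = snprintf(out, outsz, "%s\t%s\n", hex_oid, refname);
	return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```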
|
2005-07-24 02:54:41 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
const char *get_log_output_encoding(void);
|
|
|
|
const char *get_commit_output_encoding(void);
|
2010-11-02 20:59:07 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int committer_ident_sufficiently_given(void);
|
|
|
|
int author_ident_sufficiently_given(void);
|
2005-10-12 03:47:34 +02:00
|
|
|
|
2007-03-12 20:33:18 +01:00
|
|
|
extern const char *git_commit_encoding;
|
2007-03-07 02:44:17 +01:00
|
|
|
extern const char *git_log_output_encoding;
|
2009-02-08 15:34:27 +01:00
|
|
|
extern const char *git_mailmap_file;
|
2012-12-12 12:04:04 +01:00
|
|
|
extern const char *git_mailmap_blob;
|
2005-11-28 01:09:40 +01:00
|
|
|
|
2007-06-29 19:40:46 +02:00
|
|
|
/* IO helper functions */
|
2019-04-29 10:28:14 +02:00
|
|
|
void maybe_flush_or_die(FILE *, const char *);
|
2014-09-10 12:03:52 +02:00
|
|
|
__attribute__((format (printf, 2, 3)))
|
2019-04-29 10:28:20 +02:00
|
|
|
void fprintf_or_die(FILE *, const char *fmt, ...);
|
2021-09-01 14:54:41 +02:00
|
|
|
void fwrite_or_die(FILE *f, const void *buf, size_t count);
|
|
|
|
void fflush_or_die(FILE *f);
|
2015-05-19 19:55:16 +02:00
|
|
|
|
|
|
|
#define COPY_READ_ERROR (-2)
|
|
|
|
#define COPY_WRITE_ERROR (-3)
|
2019-04-29 10:28:14 +02:00
|
|
|
int copy_fd(int ifd, int ofd);
|
|
|
|
int copy_file(const char *dst, const char *src, int mode);
|
|
|
|
int copy_file_with_time(const char *dst, const char *src, int mode);
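The COPY_READ_ERROR and COPY_WRITE_ERROR codes let a copy_fd()-style caller tell which side of the copy failed. The following is a minimal sketch of such a copy loop, assuming an 8 KiB buffer and returning 0 at EOF. The names copy_fd_sketch() and write_all() are hypothetical and are not Git's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Error codes as declared in the header above. */
#define COPY_READ_ERROR (-2)
#define COPY_WRITE_ERROR (-3)

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	while (len > 0) {
		ssize_t n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		buf += n;
		len -= (size_t)n;
	}
	return 0;
}

/* Copy ifd to ofd until EOF: 0 on success, otherwise which side failed. */
static int copy_fd_sketch(int ifd, int ofd)
{
	char buffer[8192];

	for (;;) {
		ssize_t len = read(ifd, buffer, sizeof(buffer));
		if (len < 0) {
			if (errno == EINTR)
				continue;
			return COPY_READ_ERROR;
		}
		if (len == 0)
			return 0; /* EOF */
		if (write_all(ofd, buffer, (size_t)len) < 0)
			return COPY_WRITE_ERROR;
	}
}
```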
|
2015-05-19 19:55:16 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
void write_or_die(int fd, const void *buf, size_t count);
|
|
|
|
void fsync_or_die(int fd, const char *);
|
2022-03-10 23:43:21 +01:00
|
|
|
int fsync_component(enum fsync_component component, int fd);
|
|
|
|
void fsync_component_or_die(enum fsync_component component, int fd, const char *msg);
|
2005-12-15 07:17:38 +01:00
|
|
|
|
core.fsyncmethod: batched disk flushes for loose-objects
When adding many objects to a repo with `core.fsync=loose-object`,
the cost of fsync'ing each object file can become prohibitive.
One major source of the cost of fsync is the implied flush of the
hardware writeback cache within the disk drive. This commit introduces
a new `core.fsyncMethod=batch` option that batches up hardware flushes.
It hooks into the bulk-checkin odb-transaction functionality, takes
advantage of tmp-objdir, and uses the writeout-only support code.
When the new mode is enabled, we do the following for each new object:
1a. Create the object in a tmp-objdir.
2a. Issue a pagecache writeback request and wait for it to complete.
At the end of the entire transaction when unplugging bulk checkin:
1b. Issue an fsync against a dummy file to flush the log and hardware
writeback cache, which should by now have seen the tmp-objdir writes.
2b. Rename all of the tmp-objdir files to their final names.
3b. When updating the index and/or refs, we assume that Git will issue
another fsync internal to that operation. This is not the default
today, but the user now has the option of syncing the index and there
is a separate patch series to implement syncing of refs.
On a filesystem with a singular journal that is updated during name
operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS
we would expect the fsync to trigger a journal writeout so that this
sequence is enough to ensure that the user's data is durable by the time
the git command returns. This sequence also ensures that no object files
appear in the main object store unless they are fsync-durable.
Batch mode is only enabled if core.fsync includes loose-objects. If
the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does
not include loose-objects, we will use file-by-file fsyncing.
In step (1a) of the sequence, the tmp-objdir is created lazily to avoid
work if no loose objects are ever added to the ODB. We use a tmp-objdir
to maintain the invariant that no loose-objects are visible in the main
ODB unless they are properly fsync-durable. This is important since
future ODB operations that try to create an object with specific
contents will silently drop the new data if an object with the target
hash exists without checking that the loose-object contents match the
hash. Only a full git-fsck would restore the ODB to a functional state
where data loss doesn't occur.
In step (1b) of the sequence, we issue a fsync against a dummy file
created specifically for the purpose. This method has a slightly higher
cost than using one of the input object files, but makes adding new
callers of this mechanism easier, since we don't need to figure out
which object file is "last" or risk sharing violations by caching the fd
of the last object file.
_Performance numbers_:
Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD.
Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD.
Windows - Same host as Linux, a preview version of Windows 11.
Adding 500 files to the repo with 'git add'. Times reported in seconds.
object file syncing | Linux | Mac | Windows
--------------------|-------|-------|--------
disabled | 0.06 | 0.35 | 0.61
fsync | 1.88 | 11.18 | 2.47
batch | 0.15 | 0.41 | 1.53
Signed-off-by: Neeraj Singh <neerajsi@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-05 07:20:09 +02:00
|
|
|
static inline int batch_fsync_enabled(enum fsync_component component)
|
|
|
|
{
|
|
|
|
return (fsync_components & component) && (fsync_method == FSYNC_METHOD_BATCH);
|
|
|
|
}
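The ordering in the batch-mode description above (stage objects privately, flush once via a dummy file, then rename into place) can be modeled in plain POSIX calls. This is a toy sketch, not Git's bulk-checkin code. The directory and file names ("tmp_objdir_XXXXXX", "bulk_fsync_dummy") and the function batch_write() are hypothetical.

Step 2a, the per-object writeout-only request, is deliberately omitted because it is platform-specific. As a result this sketch shows only the ordering and does not give the durability guarantee the commit describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Join "dir/name" into out; -1 if it does not fit. */
static int join_path(char *out, size_t sz, const char *dir, const char *name)
{
	int r = snprintf(out, sz, "%s/%s", dir, name);
	return (r < 0 || (size_t)r >= sz) ? -1 : 0;
}

/*
 * Toy model of core.fsyncMethod=batch: stage n objects in a temporary
 * directory, issue ONE fsync() on a dummy file, then rename the objects
 * into objdir. Error paths leave the staging directory behind.
 */
static int batch_write(const char *objdir, const char *const names[],
		       const char *const data[], int n)
{
	char tmpdir[1024], from[1024], to[1024];
	int i, fd;

	assert(n >= 0);
	/* 1a: private staging directory, standing in for the tmp-objdir */
	if (join_path(tmpdir, sizeof(tmpdir), objdir, "tmp_objdir_XXXXXX") < 0 ||
	    mkdtemp(tmpdir) == NULL)
		return -1;
	for (i = 0; i < n; i++) {
		size_t len = strlen(data[i]);
		ssize_t w;

		if (join_path(from, sizeof(from), tmpdir, names[i]) < 0)
			return -1;
		fd = open(from, O_WRONLY | O_CREAT | O_EXCL, 0444);
		if (fd < 0)
			return -1;
		w = write(fd, data[i], len);
		close(fd);
		if (w != (ssize_t)len)
			return -1; /* a short write is treated as failure here */
	}

	/* 1b: a single fsync on a dummy file instead of one per object */
	if (join_path(from, sizeof(from), tmpdir, "bulk_fsync_dummy") < 0)
		return -1;
	fd = open(from, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd < 0)
		return -1;
	if (fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	close(fd);
	unlink(from);

	/* 2b: only now do the objects become visible in objdir */
	for (i = 0; i < n; i++) {
		if (join_path(from, sizeof(from), tmpdir, names[i]) < 0 ||
		    join_path(to, sizeof(to), objdir, names[i]) < 0 ||
		    rename(from, to) < 0)
			return -1;
	}
	return rmdir(tmpdir);
}
```

The point of the design is visible in the flush count: n objects cost one fsync() rather than n. No object name appears in objdir before that flush has been issued.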
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
ssize_t read_in_full(int fd, void *buf, size_t count);
|
|
|
|
ssize_t write_in_full(int fd, const void *buf, size_t count);
|
|
|
|
ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
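The "_in_full" helpers exist because a single read() or write() may transfer fewer bytes than requested. The sketch below shows the read side, with a hypothetical name. It assumes the behavior is to keep reading until count bytes arrive or EOF is reached, and to return -1 on error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Keep calling read() until count bytes arrive or EOF is hit. Returns the
 * number of bytes read (short only at EOF), or -1 on error.
 */
static ssize_t read_in_full_sketch(int fd, void *buf, size_t count)
{
	char *p = buf;
	ssize_t total = 0;

	assert(buf != NULL || count == 0);
	while (count > 0) {
		ssize_t n = read(fd, p, count);
		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted by a signal: retry */
			return -1;
		}
		if (n == 0)
			break; /* EOF: report what we have */
		p += n;
		total += n;
		count -= (size_t)n;
	}
	return total;
}
```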
|
2014-04-10 20:31:21 +02:00
|
|
|
|
use write_str_in_full helper to avoid literal string lengths
In 2d14d65 (Use a clearer style to issue commands to remote helpers,
2009-09-03) I happened to notice two changes like this:
- write_in_full(helper->in, "list\n", 5);
+
+ strbuf_addstr(&buf, "list\n");
+ write_in_full(helper->in, buf.buf, buf.len);
+ strbuf_reset(&buf);
IMHO, it would be better to define a new function,
static inline ssize_t write_str_in_full(int fd, const char *str)
{
return write_in_full(fd, str, strlen(str));
}
and then use it like this:
- strbuf_addstr(&buf, "list\n");
- write_in_full(helper->in, buf.buf, buf.len);
- strbuf_reset(&buf);
+ write_str_in_full(helper->in, "list\n");
Thus not requiring the added allocation, and still avoiding
the maintenance risk of literal string lengths.
These days, compilers are good enough that strlen("literal")
imposes no run-time cost.
Transformed via this:
perl -pi -e \
's/write_in_full\((.*?), (".*?"), \d+\)/write_str_in_full($1, $2)/'\
$(git grep -l 'write_in_full.*"')
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-12 10:54:32 +02:00
|
|
|
static inline ssize_t write_str_in_full(int fd, const char *str)
|
|
|
|
{
|
|
|
|
return write_in_full(fd, str, strlen(str));
|
|
|
|
}
|
2015-08-24 22:03:07 +02:00
|
|
|
|
write_file: add pointer+len variant
There are many callsites which could use write_file, but for
which it is a little awkward because they have a strbuf or
other pointer/len combo. Specifically:
1. write_file() takes a format string, so we have to use
"%s" or "%.*s", which are ugly.
2. Using any form of "%s" does not handle embedded NULs in
the output. That probably doesn't matter for our
call-sites, but it's nicer not to have to worry.
3. It's less efficient; we format into another strbuf
just to do the write. That's probably not measurably
slow for our uses, but it's simply inelegant.
We can fix this by providing a helper to write out the
formatted buffer, and just calling it from write_file().
Note that we don't do the usual "complete with a newline"
that write_file does. If the caller has their own buffer,
there's a reasonable chance they're doing something more
complicated than a single line, and they can call
strbuf_complete_line() themselves.
We could go even further and add strbuf_write_file(), but it
doesn't save much:
- write_file_buf(path, sb.buf, sb.len);
+ strbuf_write_file(&sb, path);
It would also be somewhat asymmetric with strbuf_read_file,
which actually returns errors rather than dying (and the
error handling is most of the benefit of write_file() in the
first place).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-08 11:12:22 +02:00
|
|
|
/**
|
|
|
|
* Open (and truncate) the file at path, write the contents of buf to it,
|
|
|
|
* and close it. Dies if any errors are encountered.
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
void write_file_buf(const char *path, const char *buf, size_t len);
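The write_file_buf() contract (truncate, write, close, die on any error) can be sketched with stdio for brevity. Git's own implementation may use lower-level calls. die_on() is a hypothetical stand-in for Git's die(); here it prints a message and exits with status 128.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical die() stand-in: report the failing step and exit. */
static void die_on(const char *what, const char *path)
{
	fprintf(stderr, "fatal: %s '%s': %s\n", what, path, strerror(errno));
	exit(128);
}

/* Create or truncate path, write len bytes of buf, and close it. */
static void write_file_buf_sketch(const char *path, const char *buf, size_t len)
{
	FILE *f;

	assert(buf != NULL);
	f = fopen(path, "wb"); /* "w" truncates an existing file */
	if (!f)
		die_on("could not open", path);
	if (fwrite(buf, 1, len, f) != len)
		die_on("could not write", path);
	/* buffered data is flushed by fclose(), so its result must be checked */
	if (fclose(f) != 0)
		die_on("could not close", path);
}
```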
|
2016-07-08 11:12:22 +02:00
|
|
|
|
2016-07-08 11:12:42 +02:00
|
|
|
/**
|
|
|
|
* Like write_file_buf(), but format the contents into a buffer first.
|
|
|
|
* Additionally, write_file() will append a newline if one is not already
|
|
|
|
* present, making it convenient to write text files:
|
|
|
|
*
|
|
|
|
* write_file(path, "counter: %d", ctr);
|
|
|
|
*/
|
|
|
|
__attribute__((format (printf, 2, 3)))
|
2019-04-29 10:28:20 +02:00
|
|
|
void write_file(const char *path, const char *fmt, ...);
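What write_file() adds on top of write_file_buf() is printf-style formatting plus line completion: a newline is appended unless the output is empty or already ends with one. The sketch below shows only that formatting step, writing into a caller-supplied buffer. The name format_complete_line() is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format like printf, then append '\n' unless the result is empty or
 * already ends in '\n'. Returns the final length, or -1 if it does not fit.
 */
static int format_complete_line(char *out, size_t outsz, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(out != NULL && outsz > 0);
	va_start(ap, fmt);
	n = vsnprintf(out, outsz, fmt, ap);
	va_end(ap);
	if (n < 0 || (size_t)n >= outsz)
		return -1;
	if (n == 0 || out[n - 1] == '\n')
		return n; /* nothing to complete */
	if ((size_t)n + 1 >= outsz)
		return -1; /* no room for the newline plus the NUL */
	out[n++] = '\n';
	out[n] = '\0';
	return n;
}
```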
|
2009-09-12 10:54:32 +02:00
|
|
|
|
2006-02-28 20:26:21 +01:00
|
|
|
/* pager.c */
|
2019-04-29 10:28:14 +02:00
|
|
|
void setup_pager(void);
|
|
|
|
int pager_in_use(void);
|
2006-07-30 00:27:43 +02:00
|
|
|
extern int pager_use_color;
|
2019-04-29 10:28:14 +02:00
|
|
|
int term_columns(void);
|
pager: add a helper function to clear the last line in the terminal
There are a couple of places where we want to clear the last line on
the terminal, e.g. when a progress bar line is overwritten by a
shorter line, then the end of that progress line would remain visible,
unless we cover it up.
In 'progress.c' we did this by always appending a fixed number of
space characters to the next line (even if it was not shorter than the
previous), but as it turned out that fixed number was not quite large
enough, see the fix in 9f1fd84e15 (progress: clear previous progress
update dynamically, 2019-04-12). From then on we've been keeping
track of the length of the last displayed progress line and appending
the appropriate number of space characters to the next line, if
necessary, but, alas, this approach turned out to be error prone, see
the fix in 1aed1a5f25 (progress: avoid empty line when breaking the
progress line, 2019-05-19). The next patch in this series is about to
fix a case where we don't clear the last line, and on occasion do end
up with such garbage at the end of the line. It would be great if we
could do that without meticulously computing the necessary number of
space characters.
So add a helper function to clear the last line on the terminal using
an ANSI escape sequence, which has the advantage of clearing the whole
line no matter how wide it is, even after the terminal width changed.
Such an escape sequence is not available on dumb terminals, though, so
in that case fall back to simply printing a whole terminal width (as
reported by term_columns()) worth of space characters.
In 'editor.c' launch_specified_editor() already used this ANSI escape
sequence, so replace it with a call to this function.
Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-24 20:13:16 +02:00
|
|
|
void term_clear_line(void);
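The strategy described above has two branches. On a capable terminal it uses a carriage return followed by the ANSI "Erase in Line" sequence ESC [ K. On a dumb terminal it uses a carriage return, a terminal width's worth of spaces, and another carriage return. The sketch builds that byte sequence into a buffer so it can be checked. The name clear_line_seq() and its explicit is_dumb/columns parameters are assumptions standing in for the real helper's terminal checks.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: produce the bytes that clear the current line.
 * Returns the sequence length, or -1 if it does not fit in out.
 */
static int clear_line_seq(char *out, size_t outsz, int is_dumb, int columns)
{
	int n;

	assert(columns >= 0);
	if (is_dumb)
		/* overwrite with spaces, then return to column 0 */
		n = snprintf(out, outsz, "\r%*s\r", columns, "");
	else
		/* ESC [ K erases from the cursor to the end of the line */
		n = snprintf(out, outsz, "%s", "\r\033[K");
	return (n < 0 || (size_t)n >= outsz) ? -1 : n;
}
```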
|
2019-04-29 10:28:14 +02:00
|
|
|
int decimal_width(uintmax_t);
|
|
|
|
int check_pager_config(const char *cmd);
|
|
|
|
void prepare_pager_args(struct child_process *, const char *pager);
|
2006-02-28 20:26:21 +01:00
|
|
|
|
2008-02-16 06:01:41 +01:00
|
|
|
extern const char *editor_program;
|
2010-08-30 15:38:38 +02:00
|
|
|
extern const char *askpass_program;
|
2008-02-16 06:01:59 +01:00
|
|
|
extern const char *excludes_file;
|
2007-07-20 14:06:09 +02:00
|
|
|
|
binary patch.
This adds "binary patch" to the diff output and teaches apply
what to do with them.
On the diff generation side, traditionally, we said "Binary
files differ\n" without giving anything other than the preimage
and postimage object name on the index line. This was good
enough for applying a patch generated from your own repository
(very useful while rebasing), because the postimage would be
available in such a case. However, this was not useful when the
recipient of such a patch via e-mail were to apply it, even if
the preimage was available.
This patch allows the diff to generate a "binary" patch when
operating under --full-index option. The binary patch follows
the usual extended git diff headers, and looks like this:
"GIT binary patch\n"
<length byte><data>"\n"
...
"\n"
Each line is prefixed with a "length-byte", an upper- or
lowercase letter that encodes the number of bytes that the data
on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ...,
'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of
5-byte sequences, each of which encodes up to 4 bytes in base85
encoding. Because 52 / 4 * 5 = 65 and we have the length byte,
an output line is capped to 66 characters. The payload is the
same diff-delta as we use in the packfiles.
On the consumption side, git-apply now can decode and apply the
binary patch when --allow-binary-replacement is given, the diff
was generated with --full-index, and the receiving repository
has the preimage blob, which is the same condition as it always
required when accepting a "Binary files differ\n" patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 01:51:44 +02:00
|
|
|
/* base85 */
|
2007-04-10 00:56:33 +02:00
|
|
|
int decode_85(char *dst, const char *line, int linelen);
|
|
|
|
void encode_85(char *buf, const unsigned char *data, int bytes);
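The line layout of a binary patch, as described above, follows from two rules. A length byte maps 1..26 to 'A'..'Z' and 27..52 to 'a'..'z'. Every started group of 4 data bytes takes 5 base85 characters. The sketch encodes and decodes the length byte and computes the encoded line width; the function names are hypothetical. For a full 52-byte line it reproduces the 66-character cap.

```c
#include <assert.h>

/* 'A'..'Z' encode 1..26 bytes, 'a'..'z' encode 27..52 bytes. */
static char length_byte(int n)
{
	assert(n >= 1 && n <= 52);
	return n <= 26 ? (char)('A' + n - 1) : (char)('a' + n - 27);
}

/* Inverse of length_byte(); -1 for a character that is not a length byte. */
static int decode_length_byte(char c)
{
	if (c >= 'A' && c <= 'Z')
		return c - 'A' + 1;
	if (c >= 'a' && c <= 'z')
		return c - 'a' + 27;
	return -1;
}

/* One length byte plus 5 characters per (possibly partial) 4-byte group. */
static int encoded_line_width(int n)
{
	assert(n >= 1 && n <= 52);
	return 1 + 5 * ((n + 3) / 4);
}
```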
|
2006-05-05 01:51:44 +02:00
|
|
|
|
2014-06-11 09:56:49 +02:00
|
|
|
/* pkt-line.c */
|
2011-02-24 15:30:19 +01:00
|
|
|
void packet_trace_identity(const char *prog);
|
2006-09-02 18:23:48 +02:00
|
|
|
|
2007-11-18 10:12:04 +01:00
|
|
|
/* add */
|
2008-05-12 19:58:10 +02:00
|
|
|
/*
|
|
|
|
* Return 0 on success, or 1 if adding a file failed and
|
|
|
|
* ADD_FILES_IGNORE_ERRORS was specified in flags.
|
|
|
|
*/
|
2016-09-14 23:07:47 +02:00
|
|
|
int add_files_to_cache(const char *prefix, const struct pathspec *pathspec, int flags);
|
2007-11-18 10:12:04 +01:00
|
|
|
|
2007-08-31 22:13:42 +02:00
|
|
|
/* diff.c */
|
|
|
|
extern int diff_auto_refresh_index;
|
|
|
|
|
2007-02-16 01:32:45 +01:00
|
|
|
/* match-trees.c */
|
2019-06-27 11:28:51 +02:00
|
|
|
void shift_tree(struct repository *, const struct object_id *, const struct object_id *, struct object_id *, int);
|
|
|
|
void shift_tree_by(struct repository *, const struct object_id *, const struct object_id *, struct object_id *, const char *);
|
2007-02-16 01:32:45 +01:00
|
|
|
|
2007-11-02 08:24:27 +01:00
|
|
|
/*
|
|
|
|
* Whitespace rules,
|
|
|
|
* used by both diff and apply.
|
2010-11-30 09:29:11 +01:00
|
|
|
* The last two octal digits are the tab width.
|
2007-11-02 08:24:27 +01:00
|
|
|
*/
|
2010-11-30 09:29:11 +01:00
|
|
|
#define WS_BLANK_AT_EOL 0100
|
|
|
|
#define WS_SPACE_BEFORE_TAB 0200
|
|
|
|
#define WS_INDENT_WITH_NON_TAB 0400
|
|
|
|
#define WS_CR_AT_EOL 01000
|
|
|
|
#define WS_BLANK_AT_EOF 02000
|
|
|
|
#define WS_TAB_IN_INDENT 04000
|
2009-09-06 07:21:17 +02:00
|
|
|
#define WS_TRAILING_SPACE (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
|
2010-11-30 09:29:11 +01:00
|
|
|
#define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
|
|
|
|
#define WS_TAB_WIDTH_MASK 077
|
2017-06-30 02:06:53 +02:00
|
|
|
/* All WS_* -- when extended, adapt diff.c emit_symbol */
|
|
|
|
#define WS_RULE_MASK 07777
|
2007-12-06 09:14:14 +01:00
|
|
|
extern unsigned whitespace_rule_cfg;
|
2019-04-29 10:28:14 +02:00
|
|
|
unsigned whitespace_rule(struct index_state *, const char *);
|
|
|
|
unsigned parse_whitespace_rule(const char *);
|
|
|
|
unsigned ws_check(const char *line, int len, unsigned ws_rule);
|
|
|
|
void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE *stream, const char *set, const char *reset, const char *ws);
|
|
|
|
char *whitespace_error_string(unsigned ws);
|
|
|
|
void ws_fix_copy(struct strbuf *, const char *, int, unsigned, int *);
|
|
|
|
int ws_blank_line(const char *line, int len, unsigned ws_rule);
|
2010-11-30 09:29:11 +01:00
|
|
|
#define ws_tab_width(rule) ((rule) & WS_TAB_WIDTH_MASK)
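A whitespace rule packs two things into one unsigned value. The tab width sits in the low two octal digits (mask 077), and the WS_* flags sit above them. Both parts can therefore be read or replaced with plain bit operations. The macro values below are copied from this header. with_tab_width() is a hypothetical helper for illustration.

```c
#include <assert.h>

/* Values copied from the declarations above. */
#define WS_BLANK_AT_EOL 0100
#define WS_SPACE_BEFORE_TAB 0200
#define WS_INDENT_WITH_NON_TAB 0400
#define WS_CR_AT_EOL 01000
#define WS_BLANK_AT_EOF 02000
#define WS_TAB_IN_INDENT 04000
#define WS_TRAILING_SPACE (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
#define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
#define WS_TAB_WIDTH_MASK 077
#define WS_RULE_MASK 07777
#define ws_tab_width(rule) ((rule) & WS_TAB_WIDTH_MASK)

/* Hypothetical helper: replace the tab-width field, keep every flag bit. */
static unsigned with_tab_width(unsigned rule, unsigned width)
{
	assert(width <= WS_TAB_WIDTH_MASK);
	return (rule & ~(unsigned)WS_TAB_WIDTH_MASK) | width;
}
```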
|
2007-11-02 08:24:27 +01:00
|
|
|
|
2007-11-18 10:13:32 +01:00
|
|
|
/* ls-files */
|
2017-06-13 00:13:58 +02:00
|
|
|
void overlay_tree_on_index(struct index_state *istate,
|
|
|
|
const char *tree_name, const char *prefix);
|
2007-11-18 10:13:32 +01:00
|
|
|
|
setup: make startup_info available everywhere
Commit a60645f (setup: remember whether repository was
found, 2010-08-05) introduced the startup_info structure,
which records some parts of the setup_git_directory()
process (notably, whether we actually found a repository or
not).
One of the uses of this data is for functions to behave
appropriately based on whether we are in a repo. But the
startup_info struct is just a pointer to storage provided by
the main program, and the only program that sets it up is
the git.c wrapper. Thus builtins have access to
startup_info, but externally linked programs do not.
Worse, library code which is accessible from both has to be
careful about accessing startup_info. This can be used to
trigger a die("BUG") via get_sha1():
$ git fast-import <<-\EOF
tag foo
from HEAD:./whatever
EOF
fatal: BUG: startup_info struct is not initialized.
Obviously that's fairly nonsensical input to feed to
fast-import, but we should never hit a die("BUG"). And there
may be other ways to trigger it if other non-builtins
resolve sha1s.
So let's point the storage for startup_info to a static
variable in setup.c, making it available to all users of the
library code. We _could_ turn startup_info into a regular
extern struct, but doing so would mean tweaking all of the
existing use sites. So let's leave the pointer indirection
in place. We can, however, drop any checks for NULL, as
they will always be false (and likewise, we can drop the
test covering this case, which was a rather artificial
situation using one of the test-* programs).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-05 23:10:27 +01:00
|
|
|
/* setup.c */
|
2010-08-06 04:40:35 +02:00
|
|
|
struct startup_info {
|
2010-08-06 04:46:33 +02:00
|
|
|
int have_repository;
|
2010-12-02 00:33:22 +01:00
|
|
|
const char *prefix;
|
2021-12-09 06:08:26 +01:00
|
|
|
const char *original_cwd;
|
2010-08-06 04:40:35 +02:00
|
|
|
};
|
|
|
|
extern struct startup_info *startup_info;
|
2021-12-09 06:08:26 +01:00
|
|
|
extern const char *tmp_original_cwd;
|
2010-08-06 04:40:35 +02:00
|
|
|
|
2012-10-26 17:53:49 +02:00
|
|
|
/* merge.c */
|
|
|
|
struct commit_list;
|
2018-09-21 17:57:29 +02:00
|
|
|
int try_merge_command(struct repository *r,
|
|
|
|
const char *strategy, size_t xopts_nr,
|
2012-10-26 17:53:49 +02:00
|
|
|
const char **xopts, struct commit_list *common,
|
|
|
|
const char *head_arg, struct commit_list *remotes);
|
2018-09-21 17:57:29 +02:00
|
|
|
int checkout_fast_forward(struct repository *r,
|
|
|
|
const struct object_id *from,
|
2017-05-07 00:10:33 +02:00
|
|
|
const struct object_id *to,
|
2012-10-26 17:53:49 +02:00
|
|
|
int overwrite_ignore);
|
|
|
|
|
2010-03-06 21:34:41 +01:00
|
|
|
|
2012-03-30 09:52:18 +02:00
|
|
|
int sane_execvp(const char *file, char *const argv[]);
|
|
|
|
|
2013-06-20 10:37:51 +02:00
|
|
|
/*
|
|
|
|
* A struct to encapsulate the concept of whether a file has changed
|
|
|
|
* since we last checked it. This uses criteria similar to those used
|
|
|
|
* for the index.
|
|
|
|
*/
|
|
|
|
struct stat_validity {
|
|
|
|
struct stat_data *sd;
|
|
|
|
};
|
|
|
|
|
|
|
|
void stat_validity_clear(struct stat_validity *sv);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns 1 if the path is a regular file (or a symlink to a regular
|
|
|
|
* file) and matches the saved stat_validity, 0 otherwise. A missing
|
|
|
|
* or inaccessible file is considered a match if the struct was just
|
|
|
|
* initialized, or if the previous update found an inaccessible file.
|
|
|
|
*/
|
|
|
|
int stat_validity_check(struct stat_validity *sv, const char *path);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Update the stat_validity from a file opened at descriptor fd. If
|
|
|
|
* the file is missing, inaccessible, or not a regular file, then
|
|
|
|
* future calls to stat_validity_check will match iff one of those
|
|
|
|
* conditions continues to be true.
|
|
|
|
*/
|
|
|
|
void stat_validity_update(struct stat_validity *sv, int fd);
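The stat_validity comments above describe a small state machine. A NULL record means "missing, inaccessible, or not a regular file". Otherwise the saved stat fields must still match. This is a minimal sketch of that behavior, assuming a reduced set of compared fields (size, mtime, inode, device). Git's struct stat_data records more. All names here are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* A hypothetical subset of the fields Git keeps in struct stat_data. */
struct sd_sketch {
	off_t size;
	time_t mtime;
	ino_t ino;
	dev_t dev;
};

/* sd == NULL means "missing, inaccessible, or not a regular file". */
struct sv_sketch {
	struct sd_sketch *sd;
};

static void sv_clear(struct sv_sketch *sv)
{
	free(sv->sd);
	sv->sd = NULL;
}

static int sv_check(const struct sv_sketch *sv, const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0)
		return sv->sd == NULL; /* still unreadable: match only if it was before */
	if (!sv->sd)
		return 0; /* the file has appeared since the last update */
	return S_ISREG(st.st_mode) &&
	       st.st_size == sv->sd->size && st.st_mtime == sv->sd->mtime &&
	       st.st_ino == sv->sd->ino && st.st_dev == sv->sd->dev;
}

static void sv_update(struct sv_sketch *sv, int fd)
{
	struct stat st;

	assert(sv != NULL);
	if (fstat(fd, &st) < 0 || !S_ISREG(st.st_mode)) {
		sv_clear(sv);
		return;
	}
	if (!sv->sd && !(sv->sd = malloc(sizeof(*sv->sd))))
		abort(); /* out of memory */
	sv->sd->size = st.st_size;
	sv->sd->mtime = st.st_mtime;
	sv->sd->ino = st.st_ino;
	sv->sd->dev = st.st_dev;
}
```

Like any stat-based check, this cannot see a rewrite that keeps the same size within the same mtime second.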
|
|
|
|
|
2014-02-27 13:56:52 +01:00
|
|
|
int versioncmp(const char *s1, const char *s2);
|
|
|
|
|
2015-11-10 12:42:38 +01:00
|
|
|
/*
|
|
|
|
* Create a directory and (if share is nonzero) adjust its permissions
|
|
|
|
* according to the shared_repository setting. Only use this for
|
|
|
|
* directories under $GIT_DIR. Don't use it for working tree
|
|
|
|
* directories.
|
|
|
|
*/
|
|
|
|
void safe_create_dir(const char *dir, int share);
|
|
|
|
|
2017-12-03 22:27:39 +01:00
|
|
|
/*
|
|
|
|
* Should we print an ellipsis after an abbreviated SHA-1 value
|
|
|
|
* when doing diff-raw output or indicating a detached HEAD?
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int print_sha1_ellipsis(void);
|
2017-12-03 22:27:39 +01:00
|
|
|
|
2019-01-02 16:38:32 +01:00
|
|
|
/* Return 1 if the file is empty or does not exist, 0 otherwise. */
|
2019-04-29 10:28:14 +02:00
|
|
|
int is_empty_or_missing_file(const char *filename);
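The check above reduces to one stat() call. ENOENT counts as "missing", and a zero st_size counts as "empty". In this sketch, returning -1 for any other stat() failure is an assumption of the example. The real function may treat such errors differently, for example by dying. The name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* 1 if empty or missing, 0 if non-empty, -1 on any other stat() error. */
static int is_empty_or_missing_sketch(const char *filename)
{
	struct stat st;

	assert(filename != NULL);
	if (stat(filename, &st) < 0)
		return errno == ENOENT ? 1 : -1;
	return st.st_size == 0;
}
```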
|
2019-01-02 16:38:32 +01:00
|
|
|
|
2005-04-08 00:13:13 +02:00
|
|
|
#endif /* CACHE_H */
|