2005-04-08 00:13:13 +02:00
|
|
|
#ifndef CACHE_H
|
|
|
|
#define CACHE_H
|
|
|
|
|
2005-12-05 20:54:29 +01:00
|
|
|
#include "git-compat-util.h"
|
Rewrite convert_to_{git,working_tree} to use strbuf's.
* Now, those functions take an "out" strbuf argument, where they store their
result if any. In that case, it also returns 1, else it returns 0.
* those functions support "in place" editing, in the sense that it's OK to
call them this way:
convert_to_git(path, sb->buf, sb->len, sb);
When doable, conversions are done in place for real, else the strbuf
content is just replaced with the new one, transparentely for the caller.
If you want to create a new filter working this way, being the accumulation
of filter1, filter2, ... filtern, then your meta_filter would be:
int meta_filter(..., const char *src, size_t len, struct strbuf *sb)
{
int ret = 0;
ret |= filter1(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
ret |= filter2(...., src, len, sb);
if (ret) {
src = sb->buf;
len = sb->len;
}
....
return ret | filtern(..., src, len, sb);
}
That's why subfilters the convert_to_* functions called were also rewritten
to work this way.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-09-16 15:51:04 +02:00
|
|
|
#include "strbuf.h"
|
2013-11-14 20:20:58 +01:00
|
|
|
#include "hashmap.h"
|
2011-02-23 00:41:20 +01:00
|
|
|
#include "gettext.h"
|
2014-08-07 13:59:17 +02:00
|
|
|
#include "string-list.h"
|
2023-02-24 01:09:31 +01:00
|
|
|
#include "pathspec.h"
|
2023-02-24 01:09:30 +01:00
|
|
|
#include "object.h"
|
2023-02-24 01:09:31 +01:00
|
|
|
#include "statinfo.h"
|
2005-04-08 00:13:13 +02:00
|
|
|
|
2006-02-26 16:13:46 +01:00
|
|
|
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
|
2005-04-30 18:51:03 +02:00
|
|
|
#define DTYPE(de) ((de)->d_type)
|
|
|
|
#else
|
2006-01-20 22:33:20 +01:00
|
|
|
#undef DT_UNKNOWN
|
|
|
|
#undef DT_DIR
|
|
|
|
#undef DT_REG
|
|
|
|
#undef DT_LNK
|
2005-04-30 18:51:03 +02:00
|
|
|
#define DT_UNKNOWN 0
|
|
|
|
#define DT_DIR 1
|
|
|
|
#define DT_REG 2
|
2005-05-13 02:16:04 +02:00
|
|
|
#define DT_LNK 3
|
2005-04-30 18:51:03 +02:00
|
|
|
#define DTYPE(de) DT_UNKNOWN
|
|
|
|
#endif
|
|
|
|
|
tree-diff: rework diff_tree() to generate diffs for multiparent cases as well
Previously diff_tree(), which is now named ll_diff_tree_sha1(), was
generating diff_filepair(s) for two trees t1 and t2, and that was
usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes
a commit introduces.
In Git, however, we have fundamentally built flexibility in that a
commit can have many parents - 1 for a plain commit, 2 for a simple merge,
but also more than 2 for merging several heads at once.
For merges there is a so called combine-diff, which shows diff, a merge
introduces by itself, omitting changes done by any parent. That works
through first finding paths, that are different to all parents, and then
showing generalized diff, with separate columns for +/- for each parent.
The code lives in combine-diff.c .
There is an impedance mismatch, however, in that a commit could
generally have any number of parents, and that while diffing trees, we
divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there
is no special casing for multiple parents commits in e.g.
revision-walker .
That impedance mismatch *hurts* *performance* *badly* for generating
combined diffs - in "combine-diff: optimize combine_diff_path
sets intersection" I've already removed some slowness from it, but from
the timings provided there, it could be seen, that combined diffs still
cost more than an order of magnitude more cpu time, compared to diff for
usual commits, and that would only be an optimistic estimate, if we take
into account that for e.g. linux.git there is only one merge for several
dozens of plain commits.
That slowness comes from the fact that currently, while generating
combined diff, a lot of time is spent computing diff(commit,commit^2)
just to only then intersect that huge diff to almost small set of files
from diff(commit,commit^1).
That's because at present, to compute combine-diff, for first finding
paths, that "every parent touches", we use the following combine-diff
property/definition:
D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths)
where
D(A,P1...Pn) is combined diff between commit A, and parents Pi
and
D(A,Pi) is usual two-tree diff Pi..A
So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n
1-parent diffs and intersecting results will be slow.
And usually, for linux.git and other topic-based workflows, that
D(A,P2) is huge, because, if merge-base of A and P2, is several dozens
of merges (from A, via first parent) below, that D(A,P2) will be diffing
sum of merges from several subsystems to 1 subsystem.
The solution is to avoid computing n 1-parent diffs, and to find
changed-to-all-parents paths via scanning A's and all Pi's trees
simultaneously, at each step comparing their entries, and based on that
comparison, populate paths result, and deduce we could *skip*
*recursing* into subdirectories, if at least for 1 parent, sha1 of that
dir tree is the same as in A. That would save us from doing significant
amount of needless work.
Such approach is very similar to what diff_tree() does, only there we
deal with scanning only 2 trees simultaneously, and for n+1 tree, the
logic is a bit more complex:
D(T,P1...Pn) calculation scheme
-------------------------------
D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set)
D(T,Pj) - diff between T..Pj
D(T,P1...Pn) - combined diff from T to parents P1,...,Pn
We start from all trees, which are sorted, and compare their entries in
lock-step:
T P1 Pn
- - -
|t| |p1| |pn|
|-| |--| ... |--| imin = argmin(p1...pn)
| | | | | |
|-| |--| |--|
|.| |. | |. |
. . .
. . .
at any time there could be 3 cases:
1) t < p[imin];
2) t > p[imin];
3) t = p[imin].
Schematic deduction of what every case means, and what to do, follows:
1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓
2) t > p[imin]
2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓
2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓
3) t = p[imin]
3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate
3.2) pi = p[imin] -> investigate δ(t,pi)
|
|
v
3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø ->
⎧δ(t,pi) - if pi=p[imin]
-> D += ⎨
⎩"+t" - if pi>p[imin]
in any case t↓ ∀ pi=p[imin] pi↓
~
For comparison, here is how diff_tree() works:
D(A,B) calculation scheme
-------------------------
A B
- -
|a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓
|-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓
| | | | a = b -> investigate δ(a,b) a↓ b↓
|-| |-|
|.| |.|
. .
. .
~~~~~~~~
This patch generalizes diff tree-walker to work with arbitrary number of
parents as described above - i.e. now there is a resulting tree t, and
some parents trees tp[i] i=[0..nparent). The generalization builds on
the fact that usual diff
D(A,B)
is by definition the same as combined diff
D(A,[B]),
so if we could rework the code for common case and make it be not slower
for nparent=1 case, usual diff(t1,t2) generation will not be slower, and
multiparent diff tree-walker would greatly benefit generating
combine-diff.
What we do is as follows:
1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be
a paths generator (new name diff_tree_paths()), with each generated path
being `struct combine_diff_path` with info for path, new sha1,mode and for
every parent which sha1,mode it was in it.
2) From that info, we can still generate usual diff queue with
struct diff_filepairs, via "exporting" generated
combine_diff_path, if we know we run for nparent=1 case.
(see emit_diff() which is now named emit_diff_first_parent_only())
3) In order for diff_can_quit_early(), which checks
DIFF_OPT_TST(opt, HAS_CHANGES))
to work, that exporting have to be happening not in bulk, but
incrementally, one diff path at a time.
For such consumers, there is a new callback in diff_options
introduced:
->pathchange(opt, struct combine_diff_path *)
which, if set to !NULL, is called for every generated path.
(see new compat ll_diff_tree_sha1() wrapper around new paths
generator for setup)
4) The paths generation itself, is reworked from previous
ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation
scheme" provided above:
On the start we allocate [nparent] arrays in place what was
earlier just for one parent tree.
then we just generalize loops, and comparison according to the
algorithm.
Some notes(*):
1) alloca(), for small arrays, is used for "runs not slower for
nparent=1 case than before" goal - if we change it to xmalloc()/free()
the timings get ~1% worse. For alloca() we use just-introduced
xalloca/xalloca_free compatibility wrappers, so it should not be a
portability problem.
2) For every parent tree, we need to keep a tag, whether entry from that
parent equals to entry from minimal parent. For performance reasons I'm
keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ.
Not doing so, we'd need to alloca another [nparent] array, which hurts
performance.
3) For emitted paths, memory could be reused, if we know the path was
processed via callback and will not be needed later. We use efficient
hand-made realloc-style path_appendnew(), that saves us from ~1-1.5%
of potential additional slowdown.
4) goto(s) are used in several places, as the code executes a little bit
faster with lowered register pressure.
Also
- we should now check for FIND_COPIES_HARDER not only when two entries
names are the same, and their hashes are equal, but also for a case,
when a path was removed from some of all parents having it.
The reason is, if we don't, that path won't be emitted at all (see
"a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants
all paths - with diff or without - to be emitted, to be later analyzed
for being copies sources.
The new check is only necessary for nparent >1, as for nparent=1 case
xmin_eqtotal always =1 =nparent, and a path is always added to diff as
removal.
~~~~~~~~
Timings for
# without -c, i.e. testing only nparent=1 case
`git log --raw --no-abbrev --no-renames`
before and after the patch are as follows:
navy.git linux.git v3.10..v3.11
before 0.611s 1.889s
after 0.619s 1.907s
slowdown 1.3% 0.9%
This timings show we did no harm to usual diff(tree1,tree2) generation.
From the table we can see that we actually did ~1% slowdown, but I think
I've "earned" that 1% in the previous patch ("tree-diff: reuse base
str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case,
net timings stays approximately the same.
The output also stayed the same.
(*) If we revert 1)-4) to more usual techniques, for nparent=1 case,
we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as
"do no harm for nparent=1 case" rule.
For linux.git, combined diff will run an order of magnitude faster and
appropriate timings will be provided in the next commit, as we'll be
taking advantage of the new diff tree-walker for combined-diff
generation there.
P.S. and combined diff is not some exotic/for-play-only stuff - for
example for a program I write to represent Git archives as readonly
filesystem, there is initial scan with
`git log --reverse --raw --no-abbrev --no-renames -c`
to extract log of what was created/changed when, as a result building a
map
{} sha1 -> in which commit (and date) a content was added
that `-c` means also show combined diff for merges, and without them, if
a merge is non-trivial (merges changes from two parents with both having
separate changes to a file), or an evil one, the map will not be full,
i.e. some valid sha1 would be absent from it.
That case was my initial motivation for combined diffs speedup.
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-06 23:46:26 +02:00
|
|
|
/*
|
|
|
|
* Some mode bits are also used internally for computations.
|
|
|
|
*
|
|
|
|
* They *must* not overlap with any valid modes, and they *must* not be emitted
|
|
|
|
* to outside world - i.e. appear on disk or network. In other words, it's just
|
|
|
|
* temporary fields, which we internally use, but they have to stay in-house.
|
|
|
|
*
|
|
|
|
* ( such approach is valid, as standard S_IF* fits into 16 bits, and in Git
|
|
|
|
* codebase mode is `unsigned int` which is assumed to be at least 32 bits )
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* used internally in tree-diff */
|
|
|
|
#define S_DIFFTREE_IFXMIN_NEQ 0x80000000
|
|
|
|
|
|
|
|
|
2005-07-14 03:46:20 +02:00
|
|
|
/*
|
|
|
|
* Intensive research over the course of many years has shown that
|
|
|
|
* port 9418 is totally unused by anything else. Or
|
|
|
|
*
|
|
|
|
* Your search - "port 9418" - did not match any documents.
|
|
|
|
*
|
|
|
|
* as www.google.com puts it.
|
2005-09-12 20:23:00 +02:00
|
|
|
*
|
|
|
|
* This port has been properly assigned for git use by IANA:
|
|
|
|
* git (Assigned-9418) [I06-050728-0001].
|
|
|
|
*
|
|
|
|
* git 9418/tcp git pack transfer service
|
|
|
|
* git 9418/udp git pack transfer service
|
|
|
|
*
|
|
|
|
* with Linus Torvalds <torvalds@osdl.org> as the point of
|
|
|
|
* contact. September 2005.
|
|
|
|
*
|
|
|
|
* See http://www.iana.org/assignments/port-numbers
|
2005-07-14 03:46:20 +02:00
|
|
|
*/
|
|
|
|
#define DEFAULT_GIT_PORT 9418
|
|
|
|
|
2005-04-08 00:13:13 +02:00
|
|
|
/*
|
|
|
|
* Basic data structures for the directory cache
|
|
|
|
*/
|
|
|
|
|
|
|
|
#define CACHE_SIGNATURE 0x44495243 /* "DIRC" */
|
|
|
|
struct cache_header {
|
2013-08-18 21:41:51 +02:00
|
|
|
uint32_t hdr_signature;
|
|
|
|
uint32_t hdr_version;
|
|
|
|
uint32_t hdr_entries;
|
2005-04-08 00:13:13 +02:00
|
|
|
};
|
|
|
|
|
2012-04-04 18:12:43 +02:00
|
|
|
#define INDEX_FORMAT_LB 2
|
|
|
|
#define INDEX_FORMAT_UB 4
|
|
|
|
|
2005-04-08 00:13:13 +02:00
|
|
|
struct cache_entry {
|
2013-11-14 20:21:58 +01:00
|
|
|
struct hashmap_entry ent;
|
2013-06-20 10:37:50 +02:00
|
|
|
struct stat_data ce_stat_data;
|
2005-04-15 19:44:27 +02:00
|
|
|
unsigned int ce_mode;
|
2008-01-15 01:03:17 +01:00
|
|
|
unsigned int ce_flags;
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
unsigned int mem_pool_allocated;
|
2012-07-11 11:22:37 +02:00
|
|
|
unsigned int ce_namelen;
|
2014-06-13 14:19:36 +02:00
|
|
|
unsigned int index; /* for link extension */
|
2016-09-05 22:07:52 +02:00
|
|
|
struct object_id oid;
|
2006-01-07 10:33:54 +01:00
|
|
|
char name[FLEX_ARRAY]; /* more */
|
2005-04-08 00:13:13 +02:00
|
|
|
};
|
|
|
|
|
2005-04-16 07:51:44 +02:00
|
|
|
#define CE_STAGEMASK (0x3000)
|
2008-08-17 08:02:08 +02:00
|
|
|
#define CE_EXTENDED (0x4000)
|
2006-02-09 06:15:24 +01:00
|
|
|
#define CE_VALID (0x8000)
|
2005-04-16 17:33:23 +02:00
|
|
|
#define CE_STAGESHIFT 12
|
2005-04-16 07:51:44 +02:00
|
|
|
|
2008-10-01 06:04:01 +02:00
|
|
|
/*
|
2014-06-13 14:19:25 +02:00
|
|
|
* Range 0xFFFF0FFF in ce_flags is divided into
|
2008-10-01 06:04:01 +02:00
|
|
|
* two parts: in-memory flags and on-disk ones.
|
|
|
|
* Flags in CE_EXTENDED_FLAGS will get saved on-disk
|
|
|
|
* if you want to save a new flag, add it in
|
|
|
|
* CE_EXTENDED_FLAGS
|
|
|
|
*
|
|
|
|
* In-memory only flags
|
|
|
|
*/
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_UPDATE (1 << 16)
|
|
|
|
#define CE_REMOVE (1 << 17)
|
|
|
|
#define CE_UPTODATE (1 << 18)
|
|
|
|
#define CE_ADDED (1 << 19)
|
Fix name re-hashing semantics
We handled the case of removing and re-inserting cache entries badly,
which is something that merging commonly needs to do (removing the
different stages, and then re-inserting one of them as the merged
state).
We even had a rather ugly special case for this failure case, where
replace_index_entry() basically turned itself into a no-op if the new
and the old entries were the same, exactly because the hash routines
didn't handle it on their own.
So what this patch does is to not just have the UNHASHED bit, but a
HASHED bit too, and when you insert an entry into the name hash, that
involves:
- clear the UNHASHED bit, because now it's valid again for lookup
(which is really all that UNHASHED meant)
- if we're being lazy, we're done here (but we still want to clear the
UNHASHED bit regardless of lazy mode, since we can become unlazy
later, and so we need the UNHASHED bit to always be set correctly,
even if we never actually insert the entry into the hash list)
- if it was already hashed, we just leave it on the list
- otherwise mark it HASHED and insert it into the list
this all means that unhashing and rehashing a name all just works
automatically. Obviously, you cannot change the name of an entry (that
would be a serious bug), but nothing can validly do that anyway (you'd
have to allocate a new struct cache_entry anyway since the name length
could change), so that's not a new limitation.
The code actually gets simpler in many ways, although the lazy hashing
does mean that there are a few odd cases (ie something can be marked
unhashed even though it was never on the hash in the first place, and
isn't actually marked hashed!).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-23 05:37:40 +01:00
|
|
|
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_HASHED (1 << 20)
|
2017-09-22 18:35:40 +02:00
|
|
|
#define CE_FSMONITOR_VALID (1 << 21)
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_WT_REMOVE (1 << 22) /* remove in work directory */
|
|
|
|
#define CE_CONFLICTED (1 << 23)
|
2008-01-15 01:03:17 +01:00
|
|
|
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_UNPACKED (1 << 24)
|
unpack-trees: move all skip-worktree checks back to unpack_trees()
Earlier, the will_have_skip_worktree() checks are done in various
places, which makes it hard to traverse the index tree-alike, required
by excluded_from_list(). This patch moves all the checks into two
loops in unpack_trees().
Entries in index in this operation can be classified into two
groups: ones already in index before unpack_trees() is called and ones
added to index after traverse_trees() is called.
In both groups, before checking file status on worktree, the future
skip-worktree bit must be checked, so that if an entry will be outside
worktree, worktree should not be checked.
For the first group, the future skip-worktree bit is precomputed and
stored as CE_NEW_SKIP_WORKTREE in the first loop before
traverse_trees() is called so that *way_merge() function does not need
to compute it again.
For the second group, because we don't know what entries will be in
this group until traverse_trees() finishes, operations that need
future skip-worktree check is delayed until CE_NEW_SKIP_WORKTREE is
computed in the second loop. CE_ADDED is used to mark entries in the
second group.
CE_ADDED and CE_NEW_SKIP_WORKTREE are temporary flags used in
unpack_trees(). CE_ADDED is only used by add_to_index(), which should
not be called while unpack_trees() is running.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-27 07:24:04 +01:00
|
|
|
#define CE_NEW_SKIP_WORKTREE (1 << 25)
|
unpack-trees.c: prepare for looking ahead in the index
This prepares but does not yet implement a look-ahead in the index entries
when traverse-trees.c decides to give us tree entries in an order that
does not match what is in the index.
A case where a look-ahead in the index is necessary happens when merging
branch B into branch A while the index matches the current branch A, using
a tree O as their common ancestor, and these three trees looks like this:
O A B
t t
t-i t-i t-i
t-j t-j
t/1
t/2
The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and
B first, and notices that A may have a matching "t" behind "t-i" and "t-j"
(indeed it does), and tells A to give that entry instead. After unpacking
blob "t" from tree B (as it hasn't changed since O in B and A removed it,
it will result in its removal), it descends into directory "t/".
The side that walked index in parallel to the tree traversal used to be
implemented with one pointer, o->pos, that points at the next index entry
to be processed. When this happens, the pointer o->pos still points at
"t-i" that is the first entry. We should be able to skip "t-i" and "t-j"
and locate "t/1" from the index while the recursive invocation of
traverse_trees() walks and match entries found there, and later come back
to process "t-i".
While that look-ahead is not implemented yet, this adds a flag bit,
CE_UNPACKED, to mark the entries in the index that has already been
processed. o->pos pointer has been renamed to o->cache_bottom and it
points at the first entry that may still need to be processed.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 23:59:54 +01:00
|
|
|
|
checkout: avoid unnecessary match_pathspec calls
In checkout_paths() we do this
- for all updated items, call match_pathspec
- for all items, call match_pathspec (inside unmerge_cache)
- for all items, call match_pathspec (for showing "path .. is unmerged)
- for updated items, call match_pathspec and update paths
That's a lot of duplicate match_pathspec(s) and the function is not
exactly cheap to be called so many times, especially on large indexes.
This patch makes it call match_pathspec once per updated index entry,
save the result in ce_flags and reuse the results in the following
loops.
The changes in 0a1283b (checkout $tree $path: do not clobber local
changes in $path not in $tree - 2011-09-30) limit the affected paths
to ones we read from $tree. We do not do anything to other modified
entries in this case, so the "for all items" above could be modified
to "for all updated items". But..
The command's behavior now is modified slightly: unmerged entries that
match $path, but not updated by $tree, are now NOT touched. Although
this should be considered a bug fix, not a regression. A new test is
added for this change.
And while at there, free ps_matched after use.
The following command is tested on webkit, 215k entries. The pattern
is chosen mainly to make match_pathspec sweat:
git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*"
before after
real 0m3.493s 0m2.737s
user 0m2.239s 0m1.586s
sys 0m1.252s 0m1.151s
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 06:58:21 +01:00
|
|
|
/* used to temporarily mark paths matched by pathspecs */
|
|
|
|
#define CE_MATCHED (1 << 26)
|
|
|
|
|
2014-06-13 14:19:39 +02:00
|
|
|
#define CE_UPDATE_IN_BASE (1 << 27)
|
2014-06-13 14:19:43 +02:00
|
|
|
#define CE_STRIP_NAME (1 << 28)
|
2014-06-13 14:19:39 +02:00
|
|
|
|
2008-10-01 06:04:01 +02:00
|
|
|
/*
|
|
|
|
* Extended on-disk flags
|
|
|
|
*/
|
2010-11-27 07:22:16 +01:00
|
|
|
#define CE_INTENT_TO_ADD (1 << 29)
|
|
|
|
#define CE_SKIP_WORKTREE (1 << 30)
|
2008-10-01 06:04:01 +02:00
|
|
|
/* CE_EXTENDED2 is for future extension */
|
2015-12-29 07:35:46 +01:00
|
|
|
#define CE_EXTENDED2 (1U << 31)
|
2008-10-01 06:04:01 +02:00
|
|
|
|
2009-08-20 15:46:57 +02:00
|
|
|
#define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
|
2008-10-01 06:04:01 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Safeguard to avoid saving wrong flags:
|
|
|
|
* - CE_EXTENDED2 won't get saved until its semantic is known
|
|
|
|
* - Bits in 0x0000FFFF have been saved in ce_flags already
|
|
|
|
* - Bits in 0x003F0000 are currently in-memory flags
|
|
|
|
*/
|
|
|
|
#if CE_EXTENDED_FLAGS & 0x803FFFFF
|
|
|
|
#error "CE_EXTENDED_FLAGS out of range"
|
|
|
|
#endif
|
|
|
|
|
2016-02-16 23:34:44 +01:00
|
|
|
/* Forward structure decls */
|
2013-07-14 10:35:25 +02:00
|
|
|
struct pathspec;
|
2018-07-01 03:25:00 +02:00
|
|
|
struct tree;
|
2013-07-14 10:35:25 +02:00
|
|
|
|
2008-02-23 05:41:17 +01:00
|
|
|
/*
|
|
|
|
* Copy the sha1 and stat state of a cache entry from one to
|
|
|
|
* another. But we never change the name, or the hash state!
|
|
|
|
*/
|
2013-06-02 17:46:51 +02:00
|
|
|
static inline void copy_cache_entry(struct cache_entry *dst,
|
|
|
|
const struct cache_entry *src)
|
2008-02-23 05:41:17 +01:00
|
|
|
{
|
2013-11-14 20:22:27 +01:00
|
|
|
unsigned int state = dst->ce_flags & CE_HASHED;
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
int mem_pool_allocated = dst->mem_pool_allocated;
|
2008-02-23 05:41:17 +01:00
|
|
|
|
|
|
|
/* Don't copy hash chain and name */
|
2013-11-14 20:21:58 +01:00
|
|
|
memcpy(&dst->ce_stat_data, &src->ce_stat_data,
|
|
|
|
offsetof(struct cache_entry, name) -
|
|
|
|
offsetof(struct cache_entry, ce_stat_data));
|
2008-02-23 05:41:17 +01:00
|
|
|
|
|
|
|
/* Restore the hash state */
|
2013-11-14 20:22:27 +01:00
|
|
|
dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
|
|
|
|
/* Restore the mem_pool_allocated flag */
|
|
|
|
dst->mem_pool_allocated = mem_pool_allocated;
|
2008-02-23 05:41:17 +01:00
|
|
|
}
|
|
|
|
|
2012-07-11 11:22:37 +02:00
|
|
|
static inline unsigned create_ce_flags(unsigned stage)
|
2008-01-19 08:42:00 +01:00
|
|
|
{
|
2012-07-11 11:22:37 +02:00
|
|
|
return (stage << CE_STAGESHIFT);
|
2008-01-19 08:42:00 +01:00
|
|
|
}
|
|
|
|
|
2012-07-11 11:22:37 +02:00
|
|
|
#define ce_namelen(ce) ((ce)->ce_namelen)
|
2005-04-16 17:33:23 +02:00
|
|
|
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
|
2008-01-15 01:03:17 +01:00
|
|
|
#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
|
2008-01-19 08:45:24 +01:00
|
|
|
#define ce_uptodate(ce) ((ce)->ce_flags & CE_UPTODATE)
|
2009-08-20 15:46:57 +02:00
|
|
|
#define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
|
2008-01-19 08:45:24 +01:00
|
|
|
#define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
|
2015-08-22 03:08:05 +02:00
|
|
|
#define ce_intent_to_add(ce) ((ce)->ce_flags & CE_INTENT_TO_ADD)
|
2005-04-16 17:33:23 +02:00
|
|
|
|
2013-06-02 17:46:51 +02:00
|
|
|
static inline unsigned int ce_mode_from_stat(const struct cache_entry *ce,
|
|
|
|
unsigned int mode)
|
2007-02-17 07:43:48 +01:00
|
|
|
{
|
2007-03-02 22:11:30 +01:00
|
|
|
extern int trust_executable_bit, has_symlinks;
|
|
|
|
if (!has_symlinks && S_ISREG(mode) &&
|
2008-01-15 01:03:17 +01:00
|
|
|
ce && S_ISLNK(ce->ce_mode))
|
2007-03-02 22:11:30 +01:00
|
|
|
return ce->ce_mode;
|
2007-02-17 07:43:48 +01:00
|
|
|
if (!trust_executable_bit && S_ISREG(mode)) {
|
2008-01-15 01:03:17 +01:00
|
|
|
if (ce && S_ISREG(ce->ce_mode))
|
2007-02-17 07:43:48 +01:00
|
|
|
return ce->ce_mode;
|
|
|
|
return create_ce_mode(0666);
|
|
|
|
}
|
|
|
|
return create_ce_mode(mode);
|
|
|
|
}
|
2008-01-31 10:17:48 +01:00
|
|
|
static inline int ce_to_dtype(const struct cache_entry *ce)
|
|
|
|
{
|
|
|
|
unsigned ce_mode = ntohl(ce->ce_mode);
|
|
|
|
if (S_ISREG(ce_mode))
|
|
|
|
return DT_REG;
|
|
|
|
else if (S_ISDIR(ce_mode) || S_ISGITLINK(ce_mode))
|
|
|
|
return DT_DIR;
|
|
|
|
else if (S_ISLNK(ce_mode))
|
|
|
|
return DT_LNK;
|
|
|
|
else
|
|
|
|
return DT_UNKNOWN;
|
|
|
|
}
|
2005-04-17 07:26:31 +02:00
|
|
|
|
2023-02-24 01:09:31 +01:00
|
|
|
static inline int ce_path_match(struct index_state *istate,
|
|
|
|
const struct cache_entry *ce,
|
|
|
|
const struct pathspec *pathspec,
|
|
|
|
char *seen)
|
|
|
|
{
|
|
|
|
return match_pathspec(istate, pathspec, ce->name, ce_namelen(ce), 0, seen,
|
|
|
|
S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode));
|
|
|
|
}
|
|
|
|
|
2011-10-25 20:00:04 +02:00
|
|
|
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
|
2005-04-16 06:45:38 +02:00
|
|
|
|
2014-06-13 14:19:27 +02:00
|
|
|
#define SOMETHING_CHANGED (1 << 0) /* unclassified changes go here */
|
|
|
|
#define CE_ENTRY_CHANGED (1 << 1)
|
|
|
|
#define CE_ENTRY_REMOVED (1 << 2)
|
|
|
|
#define CE_ENTRY_ADDED (1 << 3)
|
2014-06-13 14:19:29 +02:00
|
|
|
#define RESOLVE_UNDO_CHANGED (1 << 4)
|
2014-06-13 14:19:31 +02:00
|
|
|
#define CACHE_TREE_CHANGED (1 << 5)
|
2014-06-13 14:19:44 +02:00
|
|
|
#define SPLIT_INDEX_ORDERED (1 << 6)
|
2015-03-08 11:12:39 +01:00
|
|
|
#define UNTRACKED_CHANGED (1 << 7)
|
2017-09-22 18:35:40 +02:00
|
|
|
#define FSMONITOR_CHANGED (1 << 8)
|
2014-06-13 14:19:27 +02:00
|
|
|
|
2014-06-13 14:19:36 +02:00
|
|
|
struct split_index;
|
2015-03-08 11:12:33 +01:00
|
|
|
struct untracked_cache;
|
2019-11-21 23:04:44 +01:00
|
|
|
struct progress;
|
2021-03-30 15:10:53 +02:00
|
|
|
struct pattern_list;
|
2015-03-08 11:12:33 +01:00
|
|
|
|
2022-05-23 15:48:40 +02:00
|
|
|
enum sparse_index_mode {
|
|
|
|
/*
|
|
|
|
* There are no sparse directories in the index at all.
|
|
|
|
*
|
|
|
|
* Repositories that don't use cone-mode sparse-checkout will
|
|
|
|
* always have their indexes in this mode.
|
|
|
|
*/
|
|
|
|
INDEX_EXPANDED = 0,
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The index has already been collapsed to sparse directories
|
|
|
|
* whereever possible.
|
|
|
|
*/
|
|
|
|
INDEX_COLLAPSED,
|
|
|
|
|
|
|
|
/*
|
|
|
|
* The sparse directories that exist are outside the
|
|
|
|
* sparse-checkout boundary, but it is possible that some file
|
|
|
|
* entries could collapse to sparse directory entries.
|
|
|
|
*/
|
|
|
|
INDEX_PARTIALLY_SPARSE,
|
|
|
|
};
|
|
|
|
|
2007-04-02 03:14:06 +02:00
|
|
|
struct index_state {
|
|
|
|
struct cache_entry **cache;
|
2012-04-04 18:12:43 +02:00
|
|
|
unsigned int version;
|
2007-04-02 03:14:06 +02:00
|
|
|
unsigned int cache_nr, cache_alloc, cache_changed;
|
2009-12-25 09:30:51 +01:00
|
|
|
struct string_list *resolve_undo;
|
2007-04-02 03:14:06 +02:00
|
|
|
struct cache_tree *cache_tree;
|
2014-06-13 14:19:36 +02:00
|
|
|
struct split_index *split_index;
|
make USE_NSEC work as expected
Since the filesystem ext4 is now defined as stable in Linux v2.6.28,
and ext4 supports nanonsecond resolution timestamps natively, it is
time to make USE_NSEC work as expected.
This will make racy git situations less likely to happen. For 'git
checkout' this means it will be less likely that we have to open, read
the contents of the file into RAM, and check if file is really
modified or not. The result sould be a litle less used CPU time, less
pagefaults and a litle faster program, at least for 'git checkout'.
Since the number of possible racy git situations would increase when
disks gets faster, this patch would be more and more helpfull as times
go by. For a fast Solid State Disk, this patch should be helpfull.
Note that, when file operations starts to take less than 1 nanosecond,
one would again start to get more racy git situations.
For more info on racy git, see Documentation/technical/racy-git.txt
For more info on ext4, see http://kernelnewbies.org/Ext4
Signed-off-by: Kjetil Barvik <barvik@broadpark.no>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-19 21:08:29 +01:00
|
|
|
struct cache_time timestamp;
|
unpack_trees(): protect the handcrafted in-core index from read_cache()
unpack_trees() rebuilds the in-core index from scratch by allocating a new
structure and finishing it off by copying the built one to the final
index.
The resulting in-core index is Ok for most use, but read_cache() does not
recognize it as such. The function is meant to be no-op if you already
have loaded the index, until you call discard_cache().
This change the way read_cache() detects an already initialized in-core
index, by introducing an extra bit, and marks the handcrafted in-core
index as initialized, to avoid this problem.
A better fix in the longer term would be to change the read_cache() API so
that it will always discard and re-read from the on-disk index to avoid
confusion. But there are higher level API that have relied on the current
semantics, and they and their users all need to get converted, which is
outside the scope of 'maint' track.
An example of such a higher level API is write_cache_as_tree(), which is
used by git-write-tree as well as later Porcelains like git-merge, revert
and cherry-pick. In the longer term, we should remove read_cache() from
there and add one to cmd_write_tree(); other callers expect that the
in-core index they prepared is what gets written as a tree so no other
change is necessary for this particular codepath.
The original version of this patch marked the index by pointing an
otherwise wasted malloc'ed memory with o->result.alloc, but this version
uses Linus's idea to use a new "initialized" bit, which is conceptually
much cleaner.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 21:57:30 +02:00
|
|
|
unsigned name_hash_initialized : 1,
|
2018-01-07 23:30:14 +01:00
|
|
|
initialized : 1,
|
2019-02-15 18:59:21 +01:00
|
|
|
drop_cache_tree : 1,
|
|
|
|
updated_workdir : 1,
|
2019-05-19 09:45:33 +02:00
|
|
|
updated_skipworktree : 1,
|
2022-05-23 15:48:40 +02:00
|
|
|
fsmonitor_has_run_once : 1;
|
|
|
|
enum sparse_index_mode sparse_index;
|
2013-11-14 20:21:58 +01:00
|
|
|
struct hashmap name_hash;
|
2013-11-14 20:20:58 +01:00
|
|
|
struct hashmap dir_hash;
|
2018-05-02 02:25:44 +02:00
|
|
|
struct object_id oid;
|
2015-03-08 11:12:33 +01:00
|
|
|
struct untracked_cache *untracked;
|
2020-01-07 20:04:28 +01:00
|
|
|
char *fsmonitor_last_update;
|
2017-10-28 01:26:37 +02:00
|
|
|
struct ewah_bitmap *fsmonitor_dirty;
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
struct mem_pool *ce_mem_pool;
|
2019-11-21 23:04:44 +01:00
|
|
|
struct progress *progress;
|
2021-01-23 20:58:15 +01:00
|
|
|
struct repository *repo;
|
2021-03-30 15:10:53 +02:00
|
|
|
struct pattern_list *sparse_checkout_patterns;
|
2007-04-02 03:14:06 +02:00
|
|
|
};
|
|
|
|
|
2023-01-12 13:55:27 +01:00
|
|
|
/**
|
|
|
|
* A "struct index_state istate" must be initialized with
|
|
|
|
* INDEX_STATE_INIT or the corresponding index_state_init().
|
|
|
|
*
|
|
|
|
* If the variable won't be used again, use release_index() to free()
|
|
|
|
* its resources. If it needs to be used again use discard_index(),
|
|
|
|
* which does the same thing, but will use use index_state_init() at
|
treewide: always have a valid "index_state.repo" member
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.
Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".
This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".
We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.
Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().
For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).
This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".
I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.
1. 1fd9ae517c4 (repository: add repo reference to index_state,
2021-01-23)
2. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function,
add release_index(), 2023-01-12)
4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific,
2022-03-25)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 14:57:00 +01:00
|
|
|
* the end. The discard_index() will use its own "istate->repo" as the
|
|
|
|
* "r" argument to index_state_init() in that case.
|
2023-01-12 13:55:27 +01:00
|
|
|
*/
|
treewide: always have a valid "index_state.repo" member
When the "repo" member was added to "the_index" in [1] the
repo_read_index() was made to populate it, but the unpopulated
"the_index" variable didn't get the same treatment.
Let's do that in initialize_the_repository() when we set it up, and
likewise for all of the current callers initialized an empty "struct
index_state".
This simplifies code that needs to deal with "the_index" or a custom
"struct index_state", we no longer need to second-guess this part of
the "index_state" deep in the stack. A recent example of such
second-guessing is the "istate->repo ? istate->repo : the_repository"
code in [2]. We can now simply use "istate->repo".
We're doing this by making use of the INDEX_STATE_INIT() macro (and
corresponding function) added in [3], which now have mandatory "repo"
arguments.
Because we now call index_state_init() in repository.c's
initialize_the_repository() we don't need to handle the case where we
have a "repo->index" whose "repo" member doesn't match the "repo"
we're setting up, i.e. the "Complete the double-reference" code in
repo_read_index() being altered here. That logic was originally added
in [1], and was working around the lack of what we now have in
initialize_the_repository().
For "fsmonitor-settings.c" we can remove the initialization of a NULL
"r" argument to "the_repository". This was added back in [4], and was
needed at the time for callers that would pass us the "r" from an
"istate->repo". Before this change such a change to
"fsmonitor-settings.c" would segfault all over the test suite (e.g. in
t0002-gitfile.sh).
This change has wider eventual implications for
"fsmonitor-settings.c". The reason the other lazy loading behavior in
it is required (starting with "if (!r->settings.fsmonitor) ..." is
because of the previously passed "r" being "NULL".
I have other local changes on top of this which move its configuration
reading to "prepare_repo_settings()" in "repo-settings.c", as we could
now start to rely on it being called for our "r". But let's leave all
of that for now, and narrowly remove this particular part of the
lazy-loading.
1. 1fd9ae517c4 (repository: add repo reference to index_state,
2021-01-23)
2. ee1f0c242ef (read-cache: add index.skipHash config option,
2023-01-06)
3. 2f6b1eb794e (cache API: add a "INDEX_STATE_INIT" macro/function,
add release_index(), 2023-01-12)
4. 1e0ea5c4316 (fsmonitor: config settings are repository-specific,
2022-03-25)
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-17 14:57:00 +01:00
|
|
|
#define INDEX_STATE_INIT(r) { \
|
|
|
|
.repo = (r), \
|
|
|
|
}
|
|
|
|
void index_state_init(struct index_state *istate, struct repository *r);
|
2023-01-12 13:55:27 +01:00
|
|
|
void release_index(struct index_state *istate);
|
|
|
|
|
2008-03-21 21:16:24 +01:00
|
|
|
/* Name hashing */
|
2019-04-29 10:28:14 +02:00
|
|
|
int test_lazy_init_name_hash(struct index_state *istate, int try_threaded);
|
|
|
|
void add_name_hash(struct index_state *istate, struct cache_entry *ce);
|
|
|
|
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
|
|
|
|
void free_name_hash(struct index_state *istate);
|
2008-03-21 21:16:24 +01:00
|
|
|
|
2018-07-02 21:49:31 +02:00
|
|
|
/* Cache entry creation and cleanup */
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Create cache_entry intended for use in the specified index. Caller
|
|
|
|
* is responsible for discarding the cache_entry with
|
|
|
|
* `discard_cache_entry`.
|
|
|
|
*/
|
|
|
|
struct cache_entry *make_cache_entry(struct index_state *istate,
|
|
|
|
unsigned int mode,
|
|
|
|
const struct object_id *oid,
|
|
|
|
const char *path,
|
|
|
|
int stage,
|
|
|
|
unsigned int refresh_options);
|
|
|
|
|
|
|
|
struct cache_entry *make_empty_cache_entry(struct index_state *istate,
|
|
|
|
size_t name_len);
|
|
|
|
|
|
|
|
/*
|
2021-05-04 18:27:28 +02:00
|
|
|
* Create a cache_entry that is not intended to be added to an index. If
|
|
|
|
* `ce_mem_pool` is not NULL, the entry is allocated within the given memory
|
|
|
|
* pool. Caller is responsible for discarding "loose" entries with
|
|
|
|
* `discard_cache_entry()` and the memory pool with
|
|
|
|
* `mem_pool_discard(ce_mem_pool, should_validate_cache_entries())`.
|
2018-07-02 21:49:31 +02:00
|
|
|
*/
|
|
|
|
struct cache_entry *make_transient_cache_entry(unsigned int mode,
|
|
|
|
const struct object_id *oid,
|
|
|
|
const char *path,
|
2021-05-04 18:27:28 +02:00
|
|
|
int stage,
|
|
|
|
struct mem_pool *ce_mem_pool);
|
2018-07-02 21:49:31 +02:00
|
|
|
|
2021-05-04 18:27:28 +02:00
|
|
|
struct cache_entry *make_empty_transient_cache_entry(size_t len,
|
|
|
|
struct mem_pool *ce_mem_pool);
|
2018-07-02 21:49:31 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Discard cache entry.
|
|
|
|
*/
|
|
|
|
void discard_cache_entry(struct cache_entry *ce);
|
|
|
|
|
2018-07-02 21:49:39 +02:00
|
|
|
/*
|
|
|
|
* Check configuration if we should perform extra validation on cache
|
|
|
|
* entries.
|
|
|
|
*/
|
|
|
|
int should_validate_cache_entries(void);
|
|
|
|
|
block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is
dominated in malloc() calls. This can be mitigated by allocating a
large block of memory and manage it ourselves via memory pools.
This change moves the cache entry allocation to be on top of memory
pools.
Design:
The index_state struct will gain a notion of an associated memory_pool
from which cache_entries will be allocated from. When reading in the
index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool will be discarded as well - so the lifetime of
a cache_entry is tied to the lifetime of the index_state that it was
allocated for.
In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:
Terminology:
- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries. Read from the
split index file.
'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is. This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
Managing transient cache_entry structs:
Cache entries are usually allocated for an index, but this is not always
the case. Cache entries are sometimes allocated because this is the
type that the existing checkout_entry function works with. Because of
this, the existing code needs to handle cache entries associated with an
index / memory pool, and those that only exist transiently. Several
strategies were contemplated around how to handle this:
Chosen approach:
An extra field was added to the cache_entry type to track whether the
cache_entry was allocated from a memory pool or not. This is currently
an int field, as there are no more available bits in the existing
ce_flags bit field. If / when more bits are needed, this new field can
be turned into a proper bit field.
Alternatives:
1) Do not include any information about how the cache_entry was
allocated. Calling code would be responsible for tracking whether the
cache_entry needed to be freed or not.
Pro: No extra memory overhead to track this state
Con: Extra complexity in callers to handle this correctly.
The extra complexity and burden to not regress this behavior in the
future was more than we wanted.
2) cache_entry would gain knowledge about which mem_pool allocated it
Pro: Could (potentially) do extra logic to know when a mem_pool no
longer had references to any cache_entry
Con: cache_entry would grow heavier by a pointer, instead of int
We didn't see a tangible benefit to this approach
3) Do not add any extra information to a cache_entry, but when freeing a
cache entry, check if the memory exists in a region managed by existing
mem_pools.
Pro: No extra memory overhead to track state
Con: Extra computation is performed when freeing cache entries
We decided tracking and iterating over known memory pool regions was
less desirable than adding an extra field to track this stae.
Signed-off-by: Jameson Miller <jamill@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
|
|
|
/*
|
|
|
|
* Duplicate a cache_entry. Allocate memory for the new entry from a
|
|
|
|
* memory_pool. Takes into account cache_entry fields that are meant
|
|
|
|
* for managing the underlying memory allocation of the cache_entry.
|
|
|
|
*/
|
|
|
|
struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Validate the cache entries in the index. This is an internal
|
|
|
|
* consistency check that the cache_entry structs are allocated from
|
|
|
|
* the expected memory pool.
|
|
|
|
*/
|
|
|
|
void validate_cache_entries(const struct index_state *istate);
|
|
|
|
|
2021-07-23 20:52:22 +02:00
|
|
|
/*
|
|
|
|
* Bulk prefetch all missing cache entries that are not GITLINKs and that match
|
|
|
|
* the given predicate. This function should only be called if
|
2023-03-28 15:58:57 +02:00
|
|
|
* repo_has_promisor_remote() returns true.
|
2021-07-23 20:52:22 +02:00
|
|
|
*/
|
|
|
|
typedef int (*must_prefetch_predicate)(const struct cache_entry *);
|
|
|
|
void prefetch_cache_entries(const struct index_state *istate,
|
|
|
|
must_prefetch_predicate must_prefetch);
|
|
|
|
|
2023-02-10 11:28:39 +01:00
|
|
|
#ifdef USE_THE_INDEX_VARIABLE
|
2019-01-24 09:29:12 +01:00
|
|
|
extern struct index_state the_index;
|
2022-11-19 14:07:36 +01:00
|
|
|
#endif
|
2005-04-08 00:13:13 +02:00
|
|
|
|
2008-04-27 19:39:27 +02:00
|
|
|
#define INIT_DB_QUIET 0x0001
|
2016-09-25 05:14:37 +02:00
|
|
|
#define INIT_DB_EXIST_OK 0x0002
|
2008-04-27 19:39:27 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int init_db(const char *git_dir, const char *real_git_dir,
|
2020-02-22 21:17:38 +01:00
|
|
|
const char *template_dir, int hash_algo,
|
2020-06-24 16:46:32 +02:00
|
|
|
const char *initial_branch, unsigned int flags);
|
builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256". This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).
This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using. We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.
We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be. And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.
Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1. This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.
Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-21 00:35:41 +02:00
|
|
|
void initialize_repository_version(int hash_algo, int reinit);
|
2008-04-27 19:39:27 +02:00
|
|
|
|
2005-04-09 18:48:20 +02:00
|
|
|
/* Initialize and use the cache information */
|
2014-06-13 14:19:23 +02:00
|
|
|
struct lock_file;
|
2019-04-29 10:28:14 +02:00
|
|
|
void preload_index(struct index_state *index,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct pathspec *pathspec,
|
|
|
|
unsigned int refresh_flags);
|
2019-04-29 10:28:14 +02:00
|
|
|
int do_read_index(struct index_state *istate, const char *path,
|
2019-04-29 10:28:23 +02:00
|
|
|
int must_exist); /* for testting only! */
|
2019-04-29 10:28:14 +02:00
|
|
|
int read_index_from(struct index_state *, const char *path,
|
2019-04-29 10:28:23 +02:00
|
|
|
const char *gitdir);
|
2019-04-29 10:28:14 +02:00
|
|
|
int is_index_unborn(struct index_state *);
|
2017-10-05 22:32:11 +02:00
|
|
|
|
2021-03-30 15:10:48 +02:00
|
|
|
void ensure_full_index(struct index_state *istate);
|
|
|
|
|
2017-10-05 22:32:11 +02:00
|
|
|
/* For use with `write_locked_index()`. */
|
2014-06-13 14:19:23 +02:00
|
|
|
#define COMMIT_LOCK (1 << 0)
|
2018-03-01 21:40:20 +01:00
|
|
|
#define SKIP_IF_UNCHANGED (1 << 1)
|
2017-10-05 22:32:11 +02:00
|
|
|
|
|
|
|
/*
|
read-cache: drop explicit `CLOSE_LOCK`-flag
`write_locked_index()` takes two flags: `COMMIT_LOCK` and `CLOSE_LOCK`.
At most one is allowed. But it is also possible to use no flag, i.e.,
`0`. But when `write_locked_index()` calls `do_write_index()`, the
temporary file, a.k.a. the lockfile, will be closed. So passing `0` is
effectively the same as `CLOSE_LOCK`, which seems like a bug.
We might feel tempted to restructure the code in order to close the file
later, or conditionally. It also feels a bit unfortunate that we simply
"happen" to close the lock by way of an implementation detail of
lockfiles. But note that we need to close the temporary file before
`stat`-ing it, at least on Windows. See 9f41c7a6b (read-cache: close
index.lock in do_write_index, 2017-04-26).
Drop `CLOSE_LOCK` and make it explicit that `write_locked_index()`
always closes the lock. Whether it is also committed is governed by the
remaining flag, `COMMIT_LOCK`.
This means we neither have nor suggest that we have a mode to write the
index and leave the file open. Whatever extra contents we might
eventually want to write, we should probably write it from within
`write_locked_index()` itself anyway.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:12:12 +02:00
|
|
|
* Write the index while holding an already-taken lock. Close the lock,
|
|
|
|
* and if `COMMIT_LOCK` is given, commit it.
|
2017-10-05 22:32:11 +02:00
|
|
|
*
|
|
|
|
* Unless a split index is in use, write the index into the lockfile.
|
|
|
|
*
|
|
|
|
* With a split index, write the shared index to a temporary file,
|
|
|
|
* adjust its permissions and rename it into place, then write the
|
|
|
|
* split index to the lockfile. If the temporary file for the shared
|
|
|
|
* index cannot be created, fall back to the behavior described in
|
|
|
|
* the previous paragraph.
|
read-cache: leave lock in right state in `write_locked_index()`
If the original version of `write_locked_index()` returned with an
error, it didn't roll back the lockfile unless the error occured at the
very end, during closing/committing. See commit 03b866477 (read-cache:
new API write_locked_index instead of write_index/write_cache,
2014-06-13).
In commit 9f41c7a6b (read-cache: close index.lock in do_write_index,
2017-04-26), we learned to close the lock slightly earlier in the
callstack. That was mostly a side-effect of lockfiles being implemented
using temporary files, but didn't cause any real harm.
Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05) introduced a subtle bug. If the temporary file is deleted
(i.e., the lockfile is rolled back), the tempfile-pointer in the `struct
lock_file` will be left dangling. Thus, an attempt to reuse the
lockfile, or even just to roll it back, will induce undefined behavior
-- most likely a crash.
Besides not crashing, we clearly want to make things consistent. The
guarantees which the lockfile-machinery itself provides is A) if we ask
to commit and it fails, roll back, and B) if we ask to close and it
fails, do _not_ roll back. Let's do the same for consistency.
Do not delete the temporary file in `do_write_index()`. One of its
callers, `write_locked_index()` will thereby avoid rolling back the
lock. The other caller, `write_shared_index()`, will delete its
temporary file anyway. Both of these callers will avoid undefined
behavior (crashing).
Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock
before returning. If we have already succeeded and committed, it will be
a noop. Simplify the existing callers where we now have a superfluous
call to `rollback_lockfile()`. That should keep future readers from
wondering why the callers are inconsistent.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:12:13 +02:00
|
|
|
*
|
|
|
|
* With `COMMIT_LOCK`, the lock is always committed or rolled back.
|
|
|
|
* Without it, the lock is closed, but neither committed nor rolled
|
|
|
|
* back.
|
2018-03-01 21:40:20 +01:00
|
|
|
*
|
|
|
|
* If `SKIP_IF_UNCHANGED` is given and the index is unchanged, nothing
|
|
|
|
* is written (and the lock is rolled back if `COMMIT_LOCK` is given).
|
2017-10-05 22:32:11 +02:00
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
|
2017-10-05 22:32:11 +02:00
|
|
|
|
2022-11-19 14:07:31 +01:00
|
|
|
void discard_index(struct index_state *);
|
2019-04-29 10:28:14 +02:00
|
|
|
void move_index_extensions(struct index_state *dst, struct index_state *src);
|
|
|
|
int unmerged_index(const struct index_state *);
|
2017-12-21 20:19:06 +01:00
|
|
|
|
|
|
|
/**
|
2018-07-01 03:25:00 +02:00
|
|
|
* Returns 1 if istate differs from tree, 0 otherwise. If tree is NULL,
|
|
|
|
* compares istate to HEAD. If tree is NULL and on an unborn branch,
|
|
|
|
* returns 1 if there are entries in istate, 0 otherwise. If an strbuf is
|
|
|
|
* provided, the space-separated list of files that differ will be appended
|
|
|
|
* to it.
|
2017-12-21 20:19:06 +01:00
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int repo_index_has_changes(struct repository *repo,
|
2019-04-29 10:28:23 +02:00
|
|
|
struct tree *tree,
|
|
|
|
struct strbuf *sb);
|
2017-12-21 20:19:06 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int verify_path(const char *path, unsigned mode);
|
|
|
|
int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
|
|
|
|
int index_dir_exists(struct index_state *istate, const char *name, int namelen);
|
|
|
|
void adjust_dirname_case(struct index_state *istate, char *name);
|
|
|
|
struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
|
2017-01-19 04:18:51 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Searches for an entry defined by name and namelen in the given index.
|
|
|
|
* If the return value is positive (including 0) it is the position of an
|
|
|
|
* exact match. If the return value is negative, the negated value minus 1
|
|
|
|
* is the position where the entry would be inserted.
|
|
|
|
* Example: The current index consists of these files and its stages:
|
|
|
|
*
|
|
|
|
* b#0, d#0, f#1, f#3
|
|
|
|
*
|
|
|
|
* index_name_pos(&index, "a", 1) -> -1
|
|
|
|
* index_name_pos(&index, "b", 1) -> 0
|
|
|
|
* index_name_pos(&index, "c", 1) -> -2
|
|
|
|
* index_name_pos(&index, "d", 1) -> 1
|
|
|
|
* index_name_pos(&index, "e", 1) -> -3
|
|
|
|
* index_name_pos(&index, "f", 1) -> -3
|
|
|
|
* index_name_pos(&index, "g", 1) -> -5
|
|
|
|
*/
|
2021-04-01 03:49:39 +02:00
|
|
|
int index_name_pos(struct index_state *, const char *name, int namelen);
|
2017-01-19 04:18:51 +01:00
|
|
|
|
2022-08-08 21:07:51 +02:00
|
|
|
/*
|
|
|
|
* Like index_name_pos, returns the position of an entry of the given name in
|
|
|
|
* the index if one exists, otherwise returns a negative value where the negated
|
|
|
|
* value minus 1 is the position where the index entry would be inserted. Unlike
|
|
|
|
* index_name_pos, however, a sparse index is not expanded to find an entry
|
|
|
|
* inside a sparse directory.
|
|
|
|
*/
|
|
|
|
int index_name_pos_sparse(struct index_state *, const char *name, int namelen);
|
|
|
|
|
2021-11-29 16:52:41 +01:00
|
|
|
/*
|
|
|
|
* Determines whether an entry with the given name exists within the
|
|
|
|
* given index. The return value is 1 if an exact match is found, otherwise
|
|
|
|
* it is 0. Note that, unlike index_name_pos, this function does not expand
|
|
|
|
* the index if it is sparse. If an item exists within the full index but it
|
|
|
|
* is contained within a sparse directory (and not in the sparse index), 0 is
|
|
|
|
* returned.
|
|
|
|
*/
|
|
|
|
int index_entry_exists(struct index_state *, const char *name, int namelen);
|
|
|
|
|
msvc: avoid using minus operator on unsigned types
MSVC complains about this with `-Wall`, which can be taken as a sign
that this is indeed a real bug. The symptom is:
C4146: unary minus operator applied to unsigned type, result
still unsigned
Let's avoid this warning in the minimal way, e.g. writing `-1 -
<unsigned value>` instead of `-<unsigned value> - 1`.
Note that the change in the `estimate_cache_size()` function is
needed because MSVC considers the "return type" of the `sizeof()`
operator to be `size_t`, i.e. unsigned, and therefore it cannot be
negated using the unary minus operator.
Even worse, that arithmetic is doing extra work, in vain. We want to
calculate the entry extra cache size as the difference between the
size of the `cache_entry` structure minus the size of the
`ondisk_cache_entry` structure, padded to the appropriate alignment
boundary.
To that end, we start by assigning that difference to the `per_entry`
variable, and then abuse the `len` parameter of the
`align_padding_size()` macro to take the negative size of the ondisk
entry size. Essentially, we try to avoid passing the already calculated
difference to that macro by passing the operands of that difference
instead, when the macro expects operands of an addition:
#define align_padding_size(size, len) \
((size + (len) + 8) & ~7) - (size + len)
Currently, we pass A and -B to that macro instead of passing A - B and
0, where A - B is already stored in the `per_entry` variable, ready to
be used.
This is neither necessary, nor intuitive. Let's fix this, and have code
that is both easier to read and that also does not trigger MSVC's
warning.
While at it, we take care of reporting overflows (which are unlikely,
but hey, defensive programming is good!).
We _also_ take pains of casting the unsigned value to signed: otherwise,
the signed operand (i.e. the `-1`) would be cast to unsigned before
doing the arithmetic.
Helped-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-04 17:09:26 +02:00
|
|
|
/*
|
|
|
|
* Some functions return the negative complement of an insert position when a
|
|
|
|
* precise match was not found but a position was found where the entry would
|
|
|
|
* need to be inserted. This helper protects that logic from any integer
|
|
|
|
* underflow.
|
|
|
|
*/
|
|
|
|
static inline int index_pos_to_insert_pos(uintmax_t pos)
|
|
|
|
{
|
|
|
|
if (pos > INT_MAX)
|
|
|
|
die("overflow: -1 - %"PRIuMAX, pos);
|
|
|
|
return -1 - (int)pos;
|
|
|
|
}
|
|
|
|
|
2005-05-08 06:55:21 +02:00
|
|
|
#define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */
|
|
|
|
#define ADD_CACHE_OK_TO_REPLACE 2 /* Ok to replace file/directory */
|
2005-06-25 11:25:29 +02:00
|
|
|
#define ADD_CACHE_SKIP_DFCHECK 4 /* Ok to skip DF conflict checks */
|
2021-03-20 23:37:46 +01:00
|
|
|
#define ADD_CACHE_JUST_APPEND 8 /* Append only */
|
2008-08-21 10:44:53 +02:00
|
|
|
#define ADD_CACHE_NEW_ONLY 16 /* Do not replace existing ones */
|
2014-06-13 14:19:42 +02:00
|
|
|
#define ADD_CACHE_KEEP_CACHE_TREE 32 /* Do not invalidate cache-tree */
|
2019-01-17 17:27:11 +01:00
|
|
|
#define ADD_CACHE_RENORMALIZE 64 /* Pass along HASH_RENORMALIZE */
|
2019-04-29 10:28:14 +02:00
|
|
|
int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
|
|
|
|
void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
|
2017-01-19 04:18:52 +01:00
|
|
|
|
|
|
|
/* Remove entry, return true if there are more entries to go. */
|
2019-04-29 10:28:14 +02:00
|
|
|
int remove_index_entry_at(struct index_state *, int pos);
|
2017-01-19 04:18:52 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
void remove_marked_cache_entries(struct index_state *istate, int invalidate);
|
|
|
|
int remove_file_from_index(struct index_state *, const char *path);
|
2008-05-21 21:04:34 +02:00
|
|
|
#define ADD_CACHE_VERBOSE 1
|
|
|
|
#define ADD_CACHE_PRETEND 2
|
2008-05-25 23:03:50 +02:00
|
|
|
#define ADD_CACHE_IGNORE_ERRORS 4
|
2008-07-21 10:24:17 +02:00
|
|
|
#define ADD_CACHE_IGNORE_REMOVAL 8
|
2008-08-21 10:44:53 +02:00
|
|
|
#define ADD_CACHE_INTENT 16
|
2017-01-19 04:18:53 +01:00
|
|
|
/*
|
|
|
|
* These two are used to add the contents of the file at path
|
|
|
|
* to the index, marking the working tree up-to-date by storing
|
|
|
|
* the cached stat info in the resulting cache entry. A caller
|
|
|
|
* that has already run lstat(2) on the path can call
|
|
|
|
* add_to_index(), and all others can call add_file_to_index();
|
|
|
|
* the latter will do necessary lstat(2) internally before
|
|
|
|
* calling the former.
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int add_to_index(struct index_state *, const char *path, struct stat *, int flags);
|
|
|
|
int add_file_to_index(struct index_state *, const char *path, int flags);
|
2017-01-19 04:18:53 +01:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
|
|
|
|
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
|
|
|
|
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
|
2021-04-01 03:49:39 +02:00
|
|
|
int index_name_is_other(struct index_state *, const char *, int);
|
|
|
|
void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *);
|
2007-11-10 09:15:03 +01:00
|
|
|
|
|
|
|
/* do stat comparison even if CE_VALID is true */
|
|
|
|
#define CE_MATCH_IGNORE_VALID 01
|
|
|
|
/* do not check the contents but report dirty on racily-clean entries */
|
2009-12-14 12:43:58 +01:00
|
|
|
#define CE_MATCH_RACY_IS_DIRTY 02
|
|
|
|
/* do stat comparison even if CE_SKIP_WORKTREE is true */
|
|
|
|
#define CE_MATCH_IGNORE_SKIP_WORKTREE 04
|
2014-01-27 15:45:07 +01:00
|
|
|
/* ignore non-existent files during stat update */
|
|
|
|
#define CE_MATCH_IGNORE_MISSING 0x08
|
2014-01-27 15:45:08 +01:00
|
|
|
/* enable stat refresh */
|
|
|
|
#define CE_MATCH_REFRESH 0x10
|
2017-09-22 18:35:40 +02:00
|
|
|
/* don't refresh_fsmonitor state or do stat comparison even if CE_FSMONITOR_VALID is true */
|
|
|
|
#define CE_MATCH_IGNORE_FSMONITOR 0X20
|
2019-04-29 10:28:14 +02:00
|
|
|
int is_racy_timestamp(const struct index_state *istate,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct cache_entry *ce);
|
2022-01-07 12:17:31 +01:00
|
|
|
int has_racy_timestamp(struct index_state *istate);
|
2019-04-29 10:28:14 +02:00
|
|
|
int ie_match_stat(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
|
|
|
|
int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
|
2007-11-10 09:15:03 +01:00
|
|
|
|
2013-06-20 10:37:50 +02:00
|
|
|
/*
|
|
|
|
* Record to sd the data from st that we use to check whether a file
|
|
|
|
* might have changed.
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
void fill_stat_data(struct stat_data *sd, struct stat *st);
|
2013-06-20 10:37:50 +02:00
|
|
|
|
|
|
|
/*
|
|
|
|
* Return 0 if st is consistent with a file not having been changed
|
|
|
|
* since sd was filled. If there are differences, return a
|
|
|
|
* combination of MTIME_CHANGED, CTIME_CHANGED, OWNER_CHANGED,
|
|
|
|
* INODE_CHANGED, and DATA_CHANGED.
|
|
|
|
*/
|
2019-04-29 10:28:14 +02:00
|
|
|
int match_stat_data(const struct stat_data *sd, struct stat *st);
|
|
|
|
int match_stat_data_racy(const struct index_state *istate,
|
2019-04-29 10:28:23 +02:00
|
|
|
const struct stat_data *sd, struct stat *st);
|
2013-06-20 10:37:50 +02:00
|
|
|
|
2019-05-24 14:23:47 +02:00
|
|
|
void fill_stat_cache_info(struct index_state *istate, struct cache_entry *ce, struct stat *st);
|
2005-05-15 23:23:12 +02:00
|
|
|
|
2021-04-08 22:41:26 +02:00
|
|
|
#define REFRESH_REALLY (1 << 0) /* ignore_valid */
|
|
|
|
#define REFRESH_UNMERGED (1 << 1) /* allow unmerged */
|
|
|
|
#define REFRESH_QUIET (1 << 2) /* be quiet about it */
|
|
|
|
#define REFRESH_IGNORE_MISSING (1 << 3) /* ignore non-existent */
|
|
|
|
#define REFRESH_IGNORE_SUBMODULES (1 << 4) /* ignore submodules */
|
|
|
|
#define REFRESH_IN_PORCELAIN (1 << 5) /* user friendly output, not "needs update" */
|
|
|
|
#define REFRESH_PROGRESS (1 << 6) /* show progress bar if stderr is tty */
|
|
|
|
#define REFRESH_IGNORE_SKIP_WORKTREE (1 << 7) /* ignore skip_worktree entries */
|
2019-04-29 10:28:14 +02:00
|
|
|
int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg);
|
2019-09-11 20:20:25 +02:00
|
|
|
/*
|
|
|
|
* Refresh the index and write it to disk.
|
|
|
|
*
|
|
|
|
* 'refresh_flags' is passed directly to 'refresh_index()', while
|
|
|
|
* 'COMMIT_LOCK | write_flags' is passed to 'write_locked_index()', so
|
|
|
|
* the lockfile is always either committed or rolled back.
|
|
|
|
*
|
|
|
|
* If 'gentle' is passed, errors locking the index are ignored.
|
|
|
|
*
|
|
|
|
* Return 1 if refreshing the index returns an error, -1 if writing
|
|
|
|
* the index to disk fails, 0 on success.
|
|
|
|
*
|
|
|
|
* Note that if refreshing the index returns an error, we still write
|
|
|
|
* out the index (unless locking fails).
|
|
|
|
*/
|
|
|
|
int repo_refresh_and_write_index(struct repository*, unsigned int refresh_flags, unsigned int write_flags, int gentle, const struct pathspec *, char *seen, const char *header_msg);
|
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int);
|
2006-05-19 18:56:35 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
void set_alternate_index_output(const char *);
|
2014-10-01 12:28:42 +02:00
|
|
|
|
2017-04-14 22:32:21 +02:00
|
|
|
extern int verify_index_checksum;
|
2017-10-18 16:27:25 +02:00
|
|
|
extern int verify_ce_order;
|
2017-04-14 22:32:21 +02:00
|
|
|
|
2005-04-09 18:48:20 +02:00
|
|
|
#define MTIME_CHANGED 0x0001
|
|
|
|
#define CTIME_CHANGED 0x0002
|
|
|
|
#define OWNER_CHANGED 0x0004
|
|
|
|
#define MODE_CHANGED 0x0008
|
|
|
|
#define INODE_CHANGED 0x0010
|
|
|
|
#define DATA_CHANGED 0x0020
|
2005-05-05 14:38:25 +02:00
|
|
|
#define TYPE_CHANGED 0x0040
|
2005-04-08 00:13:13 +02:00
|
|
|
|
2023-02-05 11:36:28 +01:00
|
|
|
int base_name_compare(const char *name1, size_t len1, int mode1,
|
|
|
|
const char *name2, size_t len2, int mode2);
|
|
|
|
int df_name_compare(const char *name1, size_t len1, int mode1,
|
|
|
|
const char *name2, size_t len2, int mode2);
|
2019-04-29 10:28:14 +02:00
|
|
|
int name_compare(const char *name1, size_t len1, const char *name2, size_t len2);
|
|
|
|
int cache_name_stage_compare(const char *name1, int len1, int stage1, const char *name2, int len2, int stage2);
|
2005-04-08 00:13:13 +02:00
|
|
|
|
2009-07-09 22:35:31 +02:00
|
|
|
struct cache_def {
|
2014-07-05 00:41:46 +02:00
|
|
|
struct strbuf path;
|
2009-07-09 22:35:31 +02:00
|
|
|
int flags;
|
|
|
|
int track_flags;
|
|
|
|
int prefix_len_stat_func;
|
|
|
|
};
|
2021-09-27 14:54:27 +02:00
|
|
|
#define CACHE_DEF_INIT { \
|
|
|
|
.path = STRBUF_INIT, \
|
|
|
|
}
|
2014-07-12 01:02:34 +02:00
|
|
|
static inline void cache_def_clear(struct cache_def *cache)
|
2014-07-05 00:41:46 +02:00
|
|
|
{
|
|
|
|
strbuf_release(&cache->path);
|
|
|
|
}
|
2009-07-09 22:35:31 +02:00
|
|
|
|
2019-04-29 10:28:14 +02:00
|
|
|
int has_symlink_leading_path(const char *name, int len);
|
|
|
|
int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
|
checkout: don't follow symlinks when removing entries
At 1d718a5108 ("do not overwrite untracked symlinks", 2011-02-20),
symlink.c:check_leading_path() started returning different codes for
FL_ENOENT and FL_SYMLINK. But one of its callers, unlink_entry(), was
not adjusted for this change, so it started to follow symlinks on the
leading path of to-be-removed entries. Fix that and add a regression
test.
Note that since 1d718a5108 check_leading_path() no longer differentiates
the case where it found a symlink in the path's leading components from
the cases where it found a regular file or failed to lstat() the
component. So, a side effect of this current patch is that
unlink_entry() now returns early in all of these three cases. And
because we no longer try to unlink such paths, we also don't get the
warning from remove_or_warn().
For the regular file and symlink cases, it's questionable whether the
warning was useful in the first place: unlink_entry() removes tracked
paths that should no longer be present in the state we are checking out
to. If the path had its leading dir replaced by another file, it means
that the basename already doesn't exist, so there is no need for a
warning. Sure, we are leaving a regular file or symlink behind at the
path's dirname, but this file is either untracked now (so again, no
need to warn), or it will be replaced by a tracked file during the next
phase of this checkout operation.
As for failing to lstat() one of the leading components, the basename
might still exist only we cannot unlink it (e.g. due to the lack of the
required permissions). Since the user expect it to be removed
(especially with checkout's --no-overlay option), add back the warning
in this more relevant case.
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-18 19:43:47 +01:00
|
|
|
int check_leading_path(const char *name, int len, int warn_on_lstat_err);
|
2019-04-29 10:28:14 +02:00
|
|
|
int has_dirs_only_path(const char *name, int len, int prefix_len);
|
2021-02-12 15:49:41 +01:00
|
|
|
void invalidate_lstat_cache(void);
|
2019-04-29 10:28:14 +02:00
|
|
|
void schedule_dir_for_removal(const char *name, int len);
|
|
|
|
void remove_scheduled_dirs(void);
|
2005-06-06 06:59:54 +02:00
|
|
|
|
2006-12-23 08:33:44 +01:00
|
|
|
struct pack_window {
|
|
|
|
struct pack_window *next;
|
|
|
|
unsigned char *base;
|
|
|
|
off_t offset;
|
|
|
|
size_t len;
|
|
|
|
unsigned int last_used;
|
|
|
|
unsigned int inuse_cnt;
|
|
|
|
};
|
|
|
|
|
2005-07-01 02:15:39 +02:00
|
|
|
struct pack_entry {
|
2007-03-07 02:44:30 +01:00
|
|
|
off_t offset;
|
2005-07-01 02:15:39 +02:00
|
|
|
struct packed_git *p;
|
|
|
|
};
|
|
|
|
|
[PATCH] Add update-server-info.
The git-update-server-info command prepares informational files
to help clients discover the contents of a repository, and pull
from it via a dumb transport protocols. Currently, the
following files are produced.
- The $repo/info/refs file lists the name of heads and tags
available in the $repo/refs/ directory, along with their
SHA1. This can be used by git-ls-remote command running on
the client side.
- The $repo/info/rev-cache file describes the commit ancestry
reachable from references in the $repo/refs/ directory. This
file is in an append-only binary format to make the server
side friendly to rsync mirroring scheme, and can be read by
git-show-rev-cache command.
- The $repo/objects/info/pack file lists the name of the packs
available, the interdependencies among them, and the head
commits and tags contained in them. Along with the other two
files, this is designed to help clients to make smart pull
decisions.
The git-receive-pack command is changed to invoke it at the end,
so just after a push to a public repository finishes via "git
push", the server info is automatically updated.
In addition, building of the rev-cache file can be done by a
standalone git-build-rev-cache command separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-24 02:54:41 +02:00
|
|
|
/* Dumb servers support */
|
2019-04-29 10:28:14 +02:00
|
|
|
int update_server_info(int);
|
[PATCH] Add update-server-info.
The git-update-server-info command prepares informational files
to help clients discover the contents of a repository, and pull
from it via a dumb transport protocols. Currently, the
following files are produced.
- The $repo/info/refs file lists the name of heads and tags
available in the $repo/refs/ directory, along with their
SHA1. This can be used by git-ls-remote command running on
the client side.
- The $repo/info/rev-cache file describes the commit ancestry
reachable from references in the $repo/refs/ directory. This
file is in an append-only binary format to make the server
side friendly to rsync mirroring scheme, and can be read by
git-show-rev-cache command.
- The $repo/objects/info/pack file lists the name of the packs
available, the interdependencies among them, and the head
commits and tags contained in them. Along with the other two
files, this is designed to help clients to make smart pull
decisions.
The git-receive-pack command is changed to invoke it at the end,
so just after a push to a public repository finishes via "git
push", the server info is automatically updated.
In addition, building of the rev-cache file can be done by a
standalone git-build-rev-cache command separately.
Signed-off-by: Junio C Hamano <junkio@cox.net>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-07-24 02:54:41 +02:00
|
|
|
|
2015-05-19 19:55:16 +02:00
|
|
|
#define COPY_READ_ERROR (-2)
|
|
|
|
#define COPY_WRITE_ERROR (-3)
|
2019-04-29 10:28:14 +02:00
|
|
|
int copy_fd(int ifd, int ofd);
|
|
|
|
int copy_file(const char *dst, const char *src, int mode);
|
|
|
|
int copy_file_with_time(const char *dst, const char *src, int mode);
|
2015-05-19 19:55:16 +02:00
|
|
|
|
binary patch.
This adds "binary patch" to the diff output and teaches apply
what to do with them.
On the diff generation side, traditionally, we said "Binary
files differ\n" without giving anything other than the preimage
and postimage object name on the index line. This was good
enough for applying a patch generated from your own repository
(very useful while rebasing), because the postimage would be
available in such a case. However, this was not useful when the
recipient of such a patch via e-mail were to apply it, even if
the preimage was available.
This patch allows the diff to generate "binary" patch when
operating under --full-index option. The binary patch follows
the usual extended git diff headers, and looks like this:
"GIT binary patch\n"
<length byte><data>"\n"
...
"\n"
Each line is prefixed with a "length-byte", whose value is upper
or lowercase alphabet that encodes number of bytes that the data
on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ...,
'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of
5-byte sequence, each of which encodes up to 4 bytes in base85
encoding. Because 52 / 4 * 5 = 65 and we have the length byte,
an output line is capped to 66 characters. The payload is the
same diff-delta as we use in the packfiles.
On the consumption side, git-apply now can decode and apply the
binary patch when --allow-binary-replacement is given, the diff
was generated with --full-index, and the receiving repository
has the preimage blob, which is the same condition as it always
required when accepting an "Binary files differ\n" patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 01:51:44 +02:00
|
|
|
/* base85 */
|
2007-04-10 00:56:33 +02:00
|
|
|
int decode_85(char *dst, const char *line, int linelen);
|
|
|
|
void encode_85(char *buf, const unsigned char *data, int bytes);
|
binary patch.
This adds "binary patch" to the diff output and teaches apply
what to do with them.
On the diff generation side, traditionally, we said "Binary
files differ\n" without giving anything other than the preimage
and postimage object name on the index line. This was good
enough for applying a patch generated from your own repository
(very useful while rebasing), because the postimage would be
available in such a case. However, this was not useful when the
recipient of such a patch via e-mail were to apply it, even if
the preimage was available.
This patch allows the diff to generate "binary" patch when
operating under --full-index option. The binary patch follows
the usual extended git diff headers, and looks like this:
"GIT binary patch\n"
<length byte><data>"\n"
...
"\n"
Each line is prefixed with a "length-byte", whose value is upper
or lowercase alphabet that encodes number of bytes that the data
on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ...,
'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of
5-byte sequence, each of which encodes up to 4 bytes in base85
encoding. Because 52 / 4 * 5 = 65 and we have the length byte,
an output line is capped to 66 characters. The payload is the
same diff-delta as we use in the packfiles.
On the consumption side, git-apply now can decode and apply the
binary patch when --allow-binary-replacement is given, the diff
was generated with --full-index, and the receiving repository
has the preimage blob, which is the same condition as it always
required when accepting an "Binary files differ\n" patch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 01:51:44 +02:00
|
|
|
|
2014-06-11 09:56:49 +02:00
|
|
|
/* pkt-line.c */
|
2011-02-24 15:30:19 +01:00
|
|
|
void packet_trace_identity(const char *prog);
|
2006-09-02 18:23:48 +02:00
|
|
|
|
2007-11-18 10:12:04 +01:00
|
|
|
/* add */
|
2008-05-12 19:58:10 +02:00
|
|
|
/*
|
|
|
|
* return 0 if success, 1 - if addition of a file failed and
|
|
|
|
* ADD_FILES_IGNORE_ERRORS was specified in flags
|
|
|
|
*/
|
2016-09-14 23:07:47 +02:00
|
|
|
int add_files_to_cache(const char *prefix, const struct pathspec *pathspec, int flags);
|
2007-11-18 10:12:04 +01:00
|
|
|
|
2007-08-31 22:13:42 +02:00
|
|
|
/* diff.c */
|
|
|
|
extern int diff_auto_refresh_index;
|
|
|
|
|
2007-02-16 01:32:45 +01:00
|
|
|
/* match-trees.c */
|
2019-06-27 11:28:51 +02:00
|
|
|
void shift_tree(struct repository *, const struct object_id *, const struct object_id *, struct object_id *, int);
|
|
|
|
void shift_tree_by(struct repository *, const struct object_id *, const struct object_id *, struct object_id *, const char *);
|
2007-02-16 01:32:45 +01:00
|
|
|
|
2007-11-02 08:24:27 +01:00
|
|
|
/*
|
|
|
|
* whitespace rules.
|
|
|
|
* used by both diff and apply
|
2010-11-30 09:29:11 +01:00
|
|
|
* last two digits are tab width
|
2007-11-02 08:24:27 +01:00
|
|
|
*/
|
2010-11-30 09:29:11 +01:00
|
|
|
#define WS_BLANK_AT_EOL 0100
|
|
|
|
#define WS_SPACE_BEFORE_TAB 0200
|
|
|
|
#define WS_INDENT_WITH_NON_TAB 0400
|
|
|
|
#define WS_CR_AT_EOL 01000
|
|
|
|
#define WS_BLANK_AT_EOF 02000
|
|
|
|
#define WS_TAB_IN_INDENT 04000
|
2009-09-06 07:21:17 +02:00
|
|
|
#define WS_TRAILING_SPACE (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
|
2010-11-30 09:29:11 +01:00
|
|
|
#define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
|
|
|
|
#define WS_TAB_WIDTH_MASK 077
|
2017-06-30 02:06:53 +02:00
|
|
|
/* All WS_* -- when extended, adapt diff.c emit_symbol */
|
|
|
|
#define WS_RULE_MASK 07777
|
2007-12-06 09:14:14 +01:00
|
|
|
extern unsigned whitespace_rule_cfg;
|
2019-04-29 10:28:14 +02:00
|
|
|
unsigned whitespace_rule(struct index_state *, const char *);
|
|
|
|
unsigned parse_whitespace_rule(const char *);
|
|
|
|
unsigned ws_check(const char *line, int len, unsigned ws_rule);
|
|
|
|
void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE *stream, const char *set, const char *reset, const char *ws);
|
|
|
|
char *whitespace_error_string(unsigned ws);
|
|
|
|
void ws_fix_copy(struct strbuf *, const char *, int, unsigned, int *);
|
2022-12-13 12:12:58 +01:00
|
|
|
int ws_blank_line(const char *line, int len);
|
2010-11-30 09:29:11 +01:00
|
|
|
#define ws_tab_width(rule) ((rule) & WS_TAB_WIDTH_MASK)
|
2007-11-02 08:24:27 +01:00
|
|
|
|
2007-11-18 10:13:32 +01:00
|
|
|
/* ls-files */
|
2017-06-13 00:13:58 +02:00
|
|
|
void overlay_tree_on_index(struct index_state *istate,
|
|
|
|
const char *tree_name, const char *prefix);
|
2007-11-18 10:13:32 +01:00
|
|
|
|
2012-10-26 17:53:49 +02:00
|
|
|
/* merge.c */
|
|
|
|
struct commit_list;
|
2018-09-21 17:57:29 +02:00
|
|
|
int try_merge_command(struct repository *r,
|
|
|
|
const char *strategy, size_t xopts_nr,
|
2012-10-26 17:53:49 +02:00
|
|
|
const char **xopts, struct commit_list *common,
|
|
|
|
const char *head_arg, struct commit_list *remotes);
|
2018-09-21 17:57:29 +02:00
|
|
|
int checkout_fast_forward(struct repository *r,
|
|
|
|
const struct object_id *from,
|
2017-05-07 00:10:33 +02:00
|
|
|
const struct object_id *to,
|
2012-10-26 17:53:49 +02:00
|
|
|
int overwrite_ignore);
|
|
|
|
|
2010-03-06 21:34:41 +01:00
|
|
|
|
2012-03-30 09:52:18 +02:00
|
|
|
int sane_execvp(const char *file, char *const argv[]);
|
|
|
|
|
2013-06-20 10:37:51 +02:00
|
|
|
/*
|
|
|
|
* A struct to encapsulate the concept of whether a file has changed
|
|
|
|
* since we last checked it. This uses criteria similar to those used
|
|
|
|
* for the index.
|
|
|
|
*/
|
|
|
|
struct stat_validity {
|
|
|
|
struct stat_data *sd;
|
|
|
|
};
|
|
|
|
|
|
|
|
void stat_validity_clear(struct stat_validity *sv);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Returns 1 if the path is a regular file (or a symlink to a regular
|
|
|
|
* file) and matches the saved stat_validity, 0 otherwise. A missing
|
|
|
|
* or inaccessible file is considered a match if the struct was just
|
|
|
|
* initialized, or if the previous update found an inaccessible file.
|
|
|
|
*/
|
|
|
|
int stat_validity_check(struct stat_validity *sv, const char *path);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Update the stat_validity from a file opened at descriptor fd. If
|
|
|
|
* the file is missing, inaccessible, or not a regular file, then
|
|
|
|
* future calls to stat_validity_check will match iff one of those
|
|
|
|
* conditions continues to be true.
|
|
|
|
*/
|
|
|
|
void stat_validity_update(struct stat_validity *sv, int fd);
|
|
|
|
|
2014-02-27 13:56:52 +01:00
|
|
|
int versioncmp(const char *s1, const char *s2);
|
|
|
|
|
2005-04-08 00:13:13 +02:00
|
|
|
#endif /* CACHE_H */
|