git-commit-vandalism/cache.h

1890 lines
65 KiB
C
Raw Normal View History

#ifndef CACHE_H
#define CACHE_H
#include "git-compat-util.h"
#include "strbuf.h"
#include "hashmap.h"
#include "list.h"
#include "advice.h"
#include "gettext.h"
#include "convert.h"
#include "trace.h"
#include "trace2.h"
#include "string-list.h"
pack-revindex: drop hash table The main entry point to the pack-revindex code is find_pack_revindex(). This calls revindex_for_pack(), which lazily computes and caches the revindex for the pack. We store the cache in a very simple hash table. It's created by init_pack_revindex(), which inserts an entry for every packfile we know about, and we never grow or shrink the hash. If we ever need the revindex for a pack that isn't in the hash, we die() with an internal error. This can lead to a race, because we may load more packs after having called init_pack_revindex(). For example, imagine we have one process which needs to look at the revindex for a variety of objects (e.g., cat-file's "%(objectsize:disk)" format). Simultaneously, git-gc is running, which is doing a `git repack -ad`. We might hit a sequence like: 1. We need the revidx for some packed object. We call find_pack_revindex() and end up in init_pack_revindex() to create the hash table for all packs we know about. 2. We look up another object and can't find it, because the repack has removed the pack it's in. We re-scan the pack directory and find a new pack containing the object. It gets added to our packed_git list. 3. We call find_pack_revindex() for the new object, which hits revindex_for_pack() for our new pack. It can't find the packed_git in the revindex hash, and dies. You could also replace the `repack` above with a push or fetch to create a new pack, though these are less likely (you would have to somehow learn about the new objects to look them up). Prior to 1a6d8b9 (do not discard revindex when re-preparing packfiles, 2014-01-15), this was safe, as we threw away the revindex whenever we re-scanned the pack directory (and thus re-created the revindex hash on the fly). However, we don't want to simply revert that commit, as it was solving a different race. So we have a few options: - We can fix the race in 1a6d8b9 differently, by having the bitmap code look in the revindex hash instead of caching the pointer. But this would introduce a lot of extra hash lookups for common bitmap operations. - We could teach the revindex to dynamically add new packs to the hash table. This would perform the same, but would mean adding extra code to the revindex hash (which currently cannot be resized at all). - We can get rid of the hash table entirely. There is exactly one revindex per pack, so we can just store it in the packed_git struct. Since it's initialized lazily, it does not add to the startup cost. This is the best of both worlds: less code and fewer hash table lookups. The original code likely avoided this in the name of encapsulation. But the packed_git and reverse_index code are fairly intimate already, so it's not much of a loss. This patch implements the final option. It's a minimal conversion that retains the pack_revindex struct. No callers need to change, and we can do further cleanup in a follow-on patch. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-21 07:19:49 +01:00
#include "pack-revindex.h"
#include "hash.h"
#include "path.h"
#include "sha1-array.h"
#include "repository.h"
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
#include "mem-pool.h"
fix openssl headers conflicting with custom SHA1 implementations On ARM I have the following compilation errors: CC fast-import.o In file included from cache.h:8, from builtin.h:6, from fast-import.c:142: arm/sha1.h:14: error: conflicting types for 'SHA_CTX' /usr/include/openssl/sha.h:105: error: previous declaration of 'SHA_CTX' was here arm/sha1.h:16: error: conflicting types for 'SHA1_Init' /usr/include/openssl/sha.h:115: error: previous declaration of 'SHA1_Init' was here arm/sha1.h:17: error: conflicting types for 'SHA1_Update' /usr/include/openssl/sha.h:116: error: previous declaration of 'SHA1_Update' was here arm/sha1.h:18: error: conflicting types for 'SHA1_Final' /usr/include/openssl/sha.h:117: error: previous declaration of 'SHA1_Final' was here make: *** [fast-import.o] Error 1 This is because openssl header files are always included in git-compat-util.h since commit 684ec6c63c whenever NO_OPENSSL is not set, which somehow brings in <openssl/sha1.h> clashing with the custom ARM version. Compilation of git is probably broken on PPC too for the same reason. Turns out that the only file requiring openssl/ssl.h and openssl/err.h is imap-send.c. But only moving those problematic includes there doesn't solve the issue as it also includes cache.h which brings in the conflicting local SHA1 header file. As suggested by Jeff King, the best solution is to rename our references to SHA1 functions and structure to something git specific, and define those according to the implementation used. Signed-off-by: Nicolas Pitre <nico@cam.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2008-10-01 20:05:20 +02:00
#include <zlib.h>
2011-06-10 20:52:15 +02:00
typedef struct git_zstream {
z_stream z;
unsigned long avail_in;
unsigned long avail_out;
unsigned long total_in;
unsigned long total_out;
unsigned char *next_in;
unsigned char *next_out;
} git_zstream;
void git_inflate_init(git_zstream *);
void git_inflate_init_gzip_only(git_zstream *);
void git_inflate_end(git_zstream *);
int git_inflate(git_zstream *, int flush);
void git_deflate_init(git_zstream *, int level);
void git_deflate_init_gzip(git_zstream *, int level);
void git_deflate_init_raw(git_zstream *, int level);
2011-06-10 20:52:15 +02:00
void git_deflate_end(git_zstream *);
int git_deflate_abort(git_zstream *);
2011-06-10 20:52:15 +02:00
int git_deflate_end_gently(git_zstream *);
int git_deflate(git_zstream *, int flush);
unsigned long git_deflate_bound(git_zstream *, unsigned long);
/* The length in bytes and in hex digits of an object name (SHA-1 value). */
#define GIT_SHA1_RAWSZ 20
#define GIT_SHA1_HEXSZ (2 * GIT_SHA1_RAWSZ)
/* The block size of SHA-1. */
#define GIT_SHA1_BLKSZ 64
/* The length in bytes and in hex digits of an object name (SHA-256 value). */
#define GIT_SHA256_RAWSZ 32
#define GIT_SHA256_HEXSZ (2 * GIT_SHA256_RAWSZ)
/* The block size of SHA-256. */
#define GIT_SHA256_BLKSZ 64
/* The length in byte and in hex digits of the largest possible hash value. */
#define GIT_MAX_RAWSZ GIT_SHA256_RAWSZ
#define GIT_MAX_HEXSZ GIT_SHA256_HEXSZ
/* The largest possible block size for any supported hash. */
#define GIT_MAX_BLKSZ GIT_SHA256_BLKSZ
struct object_id {
unsigned char hash[GIT_MAX_RAWSZ];
};
#define the_hash_algo the_repository->hash_algo
#if defined(DT_UNKNOWN) && !defined(NO_D_TYPE_IN_DIRENT)
#define DTYPE(de) ((de)->d_type)
#else
#undef DT_UNKNOWN
#undef DT_DIR
#undef DT_REG
#undef DT_LNK
#define DT_UNKNOWN 0
#define DT_DIR 1
#define DT_REG 2
#define DT_LNK 3
#define DTYPE(de) DT_UNKNOWN
#endif
/* unknown mode (impossible combination S_IFIFO|S_IFCHR) */
#define S_IFINVALID 0030000
/*
* A "directory link" is a link to another git directory.
*
* The value 0160000 is not normally a valid mode, and
* also just happens to be S_IFDIR + S_IFLNK
*/
#define S_IFGITLINK 0160000
#define S_ISGITLINK(m) (((m) & S_IFMT) == S_IFGITLINK)
tree-diff: rework diff_tree() to generate diffs for multiparent cases as well Previously diff_tree(), which is now named ll_diff_tree_sha1(), was generating diff_filepair(s) for two trees t1 and t2, and that was usually used for a commit as t1=HEAD~, and t2=HEAD - i.e. to see changes a commit introduces. In Git, however, we have fundamentally built flexibility in that a commit can have many parents - 1 for a plain commit, 2 for a simple merge, but also more than 2 for merging several heads at once. For merges there is a so called combine-diff, which shows diff, a merge introduces by itself, omitting changes done by any parent. That works through first finding paths, that are different to all parents, and then showing generalized diff, with separate columns for +/- for each parent. The code lives in combine-diff.c . There is an impedance mismatch, however, in that a commit could generally have any number of parents, and that while diffing trees, we divide cases for 2-tree diffs and more-than-2-tree diffs. I mean there is no special casing for multiple parents commits in e.g. revision-walker . That impedance mismatch *hurts* *performance* *badly* for generating combined diffs - in "combine-diff: optimize combine_diff_path sets intersection" I've already removed some slowness from it, but from the timings provided there, it could be seen, that combined diffs still cost more than an order of magnitude more cpu time, compared to diff for usual commits, and that would only be an optimistic estimate, if we take into account that for e.g. linux.git there is only one merge for several dozens of plain commits. That slowness comes from the fact that currently, while generating combined diff, a lot of time is spent computing diff(commit,commit^2) just to only then intersect that huge diff to almost small set of files from diff(commit,commit^1). That's because at present, to compute combine-diff, for first finding paths, that "every parent touches", we use the following combine-diff property/definition: D(A,P1...Pn) = D(A,P1) ^ ... ^ D(A,Pn) (w.r.t. paths) where D(A,P1...Pn) is combined diff between commit A, and parents Pi and D(A,Pi) is usual two-tree diff Pi..A So if any of that D(A,Pi) is huge, tracting 1 n-parent combine-diff as n 1-parent diffs and intersecting results will be slow. And usually, for linux.git and other topic-based workflows, that D(A,P2) is huge, because, if merge-base of A and P2, is several dozens of merges (from A, via first parent) below, that D(A,P2) will be diffing sum of merges from several subsystems to 1 subsystem. The solution is to avoid computing n 1-parent diffs, and to find changed-to-all-parents paths via scanning A's and all Pi's trees simultaneously, at each step comparing their entries, and based on that comparison, populate paths result, and deduce we could *skip* *recursing* into subdirectories, if at least for 1 parent, sha1 of that dir tree is the same as in A. That would save us from doing significant amount of needless work. Such approach is very similar to what diff_tree() does, only there we deal with scanning only 2 trees simultaneously, and for n+1 tree, the logic is a bit more complex: D(T,P1...Pn) calculation scheme ------------------------------- D(T,P1...Pn) = D(T,P1) ^ ... ^ D(T,Pn) (regarding resulting paths set) D(T,Pj) - diff between T..Pj D(T,P1...Pn) - combined diff from T to parents P1,...,Pn We start from all trees, which are sorted, and compare their entries in lock-step: T P1 Pn - - - |t| |p1| |pn| |-| |--| ... |--| imin = argmin(p1...pn) | | | | | | |-| |--| |--| |.| |. | |. | . . . . . . at any time there could be 3 cases: 1) t < p[imin]; 2) t > p[imin]; 3) t = p[imin]. Schematic deduction of what every case means, and what to do, follows: 1) t < p[imin] -> ∀j t ∉ Pj -> "+t" ∈ D(T,Pj) -> D += "+t"; t↓ 2) t > p[imin] 2.1) ∃j: pj > p[imin] -> "-p[imin]" ∉ D(T,Pj) -> D += ø; ∀ pi=p[imin] pi↓ 2.2) ∀i pi = p[imin] -> pi ∉ T -> "-pi" ∈ D(T,Pi) -> D += "-p[imin]"; ∀i pi↓ 3) t = p[imin] 3.1) ∃j: pj > p[imin] -> "+t" ∈ D(T,Pj) -> only pi=p[imin] remains to investigate 3.2) pi = p[imin] -> investigate δ(t,pi) | | v 3.1+3.2) looking at δ(t,pi) ∀i: pi=p[imin] - if all != ø -> ⎧δ(t,pi) - if pi=p[imin] -> D += ⎨ ⎩"+t" - if pi>p[imin] in any case t↓ ∀ pi=p[imin] pi↓ ~ For comparison, here is how diff_tree() works: D(A,B) calculation scheme ------------------------- A B - - |a| |b| a < b -> a ∉ B -> D(A,B) += +a a↓ |-| |-| a > b -> b ∉ A -> D(A,B) += -b b↓ | | | | a = b -> investigate δ(a,b) a↓ b↓ |-| |-| |.| |.| . . . . ~~~~~~~~ This patch generalizes diff tree-walker to work with arbitrary number of parents as described above - i.e. now there is a resulting tree t, and some parents trees tp[i] i=[0..nparent). The generalization builds on the fact that usual diff D(A,B) is by definition the same as combined diff D(A,[B]), so if we could rework the code for common case and make it be not slower for nparent=1 case, usual diff(t1,t2) generation will not be slower, and multiparent diff tree-walker would greatly benefit generating combine-diff. What we do is as follows: 1) diff tree-walker ll_diff_tree_sha1() is internally reworked to be a paths generator (new name diff_tree_paths()), with each generated path being `struct combine_diff_path` with info for path, new sha1,mode and for every parent which sha1,mode it was in it. 2) From that info, we can still generate usual diff queue with struct diff_filepairs, via "exporting" generated combine_diff_path, if we know we run for nparent=1 case. (see emit_diff() which is now named emit_diff_first_parent_only()) 3) In order for diff_can_quit_early(), which checks DIFF_OPT_TST(opt, HAS_CHANGES)) to work, that exporting have to be happening not in bulk, but incrementally, one diff path at a time. For such consumers, there is a new callback in diff_options introduced: ->pathchange(opt, struct combine_diff_path *) which, if set to !NULL, is called for every generated path. (see new compat ll_diff_tree_sha1() wrapper around new paths generator for setup) 4) The paths generation itself, is reworked from previous ll_diff_tree_sha1() code according to "D(A,P1...Pn) calculation scheme" provided above: On the start we allocate [nparent] arrays in place what was earlier just for one parent tree. then we just generalize loops, and comparison according to the algorithm. Some notes(*): 1) alloca(), for small arrays, is used for "runs not slower for nparent=1 case than before" goal - if we change it to xmalloc()/free() the timings get ~1% worse. For alloca() we use just-introduced xalloca/xalloca_free compatibility wrappers, so it should not be a portability problem. 2) For every parent tree, we need to keep a tag, whether entry from that parent equals to entry from minimal parent. For performance reasons I'm keeping that tag in entry's mode field in unused bit - see S_IFXMIN_NEQ. Not doing so, we'd need to alloca another [nparent] array, which hurts performance. 3) For emitted paths, memory could be reused, if we know the path was processed via callback and will not be needed later. We use efficient hand-made realloc-style path_appendnew(), that saves us from ~1-1.5% of potential additional slowdown. 4) goto(s) are used in several places, as the code executes a little bit faster with lowered register pressure. Also - we should now check for FIND_COPIES_HARDER not only when two entries names are the same, and their hashes are equal, but also for a case, when a path was removed from some of all parents having it. The reason is, if we don't, that path won't be emitted at all (see "a > xi" case), and we'll just skip it, and FIND_COPIES_HARDER wants all paths - with diff or without - to be emitted, to be later analyzed for being copies sources. The new check is only necessary for nparent >1, as for nparent=1 case xmin_eqtotal always =1 =nparent, and a path is always added to diff as removal. ~~~~~~~~ Timings for # without -c, i.e. testing only nparent=1 case `git log --raw --no-abbrev --no-renames` before and after the patch are as follows: navy.git linux.git v3.10..v3.11 before 0.611s 1.889s after 0.619s 1.907s slowdown 1.3% 0.9% This timings show we did no harm to usual diff(tree1,tree2) generation. From the table we can see that we actually did ~1% slowdown, but I think I've "earned" that 1% in the previous patch ("tree-diff: reuse base str(buf) memory on sub-tree recursion", HEAD~~) so for nparent=1 case, net timings stays approximately the same. The output also stayed the same. (*) If we revert 1)-4) to more usual techniques, for nparent=1 case, we'll get ~2-2.5% of additional slowdown, which I've tried to avoid, as "do no harm for nparent=1 case" rule. For linux.git, combined diff will run an order of magnitude faster and appropriate timings will be provided in the next commit, as we'll be taking advantage of the new diff tree-walker for combined-diff generation there. P.S. and combined diff is not some exotic/for-play-only stuff - for example for a program I write to represent Git archives as readonly filesystem, there is initial scan with `git log --reverse --raw --no-abbrev --no-renames -c` to extract log of what was created/changed when, as a result building a map {} sha1 -> in which commit (and date) a content was added that `-c` means also show combined diff for merges, and without them, if a merge is non-trivial (merges changes from two parents with both having separate changes to a file), or an evil one, the map will not be full, i.e. some valid sha1 would be absent from it. That case was my initial motivation for combined diffs speedup. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-04-06 23:46:26 +02:00
/*
* Some mode bits are also used internally for computations.
*
* They *must* not overlap with any valid modes, and they *must* not be emitted
* to outside world - i.e. appear on disk or network. In other words, it's just
* temporary fields, which we internally use, but they have to stay in-house.
*
* ( such approach is valid, as standard S_IF* fits into 16 bits, and in Git
* codebase mode is `unsigned int` which is assumed to be at least 32 bits )
*/
/* used internally in tree-diff */
#define S_DIFFTREE_IFXMIN_NEQ 0x80000000
/*
* Intensive research over the course of many years has shown that
* port 9418 is totally unused by anything else. Or
*
* Your search - "port 9418" - did not match any documents.
*
* as www.google.com puts it.
*
* This port has been properly assigned for git use by IANA:
* git (Assigned-9418) [I06-050728-0001].
*
* git 9418/tcp git pack transfer service
* git 9418/udp git pack transfer service
*
* with Linus Torvalds <torvalds@osdl.org> as the point of
* contact. September 2005.
*
* See http://www.iana.org/assignments/port-numbers
*/
#define DEFAULT_GIT_PORT 9418
/*
* Basic data structures for the directory cache
*/
#define CACHE_SIGNATURE 0x44495243 /* "DIRC" */
struct cache_header {
uint32_t hdr_signature;
uint32_t hdr_version;
uint32_t hdr_entries;
};
#define INDEX_FORMAT_LB 2
#define INDEX_FORMAT_UB 4
/*
* The "cache_time" is just the low 32 bits of the
* time. It doesn't matter if it overflows - we only
* check it for equality in the 32 bits we save.
*/
struct cache_time {
uint32_t sec;
uint32_t nsec;
};
struct stat_data {
struct cache_time sd_ctime;
struct cache_time sd_mtime;
unsigned int sd_dev;
unsigned int sd_ino;
unsigned int sd_uid;
unsigned int sd_gid;
unsigned int sd_size;
};
struct cache_entry {
struct hashmap_entry ent;
struct stat_data ce_stat_data;
unsigned int ce_mode;
unsigned int ce_flags;
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
unsigned int mem_pool_allocated;
unsigned int ce_namelen;
unsigned int index; /* for link extension */
struct object_id oid;
char name[FLEX_ARRAY]; /* more */
};
#define CE_STAGEMASK (0x3000)
#define CE_EXTENDED (0x4000)
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09 06:15:24 +01:00
#define CE_VALID (0x8000)
#define CE_STAGESHIFT 12
/*
* Range 0xFFFF0FFF in ce_flags is divided into
* two parts: in-memory flags and on-disk ones.
* Flags in CE_EXTENDED_FLAGS will get saved on-disk
* if you want to save a new flag, add it in
* CE_EXTENDED_FLAGS
*
* In-memory only flags
*/
#define CE_UPDATE (1 << 16)
#define CE_REMOVE (1 << 17)
#define CE_UPTODATE (1 << 18)
#define CE_ADDED (1 << 19)
Fix name re-hashing semantics We handled the case of removing and re-inserting cache entries badly, which is something that merging commonly needs to do (removing the different stages, and then re-inserting one of them as the merged state). We even had a rather ugly special case for this failure case, where replace_index_entry() basically turned itself into a no-op if the new and the old entries were the same, exactly because the hash routines didn't handle it on their own. So what this patch does is to not just have the UNHASHED bit, but a HASHED bit too, and when you insert an entry into the name hash, that involves: - clear the UNHASHED bit, because now it's valid again for lookup (which is really all that UNHASHED meant) - if we're being lazy, we're done here (but we still want to clear the UNHASHED bit regardless of lazy mode, since we can become unlazy later, and so we need the UNHASHED bit to always be set correctly, even if we never actually insert the entry into the hash list) - if it was already hashed, we just leave it on the list - otherwise mark it HASHED and insert it into the list this all means that unhashing and rehashing a name all just works automatically. Obviously, you cannot change the name of an entry (that would be a serious bug), but nothing can validly do that anyway (you'd have to allocate a new struct cache_entry anyway since the name length could change), so that's not a new limitation. The code actually gets simpler in many ways, although the lazy hashing does mean that there are a few odd cases (ie something can be marked unhashed even though it was never on the hash in the first place, and isn't actually marked hashed!). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-23 05:37:40 +01:00
#define CE_HASHED (1 << 20)
#define CE_FSMONITOR_VALID (1 << 21)
#define CE_WT_REMOVE (1 << 22) /* remove in work directory */
#define CE_CONFLICTED (1 << 23)
#define CE_UNPACKED (1 << 24)
#define CE_NEW_SKIP_WORKTREE (1 << 25)
unpack-trees.c: prepare for looking ahead in the index This prepares but does not yet implement a look-ahead in the index entries when traverse-trees.c decides to give us tree entries in an order that does not match what is in the index. A case where a look-ahead in the index is necessary happens when merging branch B into branch A while the index matches the current branch A, using a tree O as their common ancestor, and these three trees looks like this: O A B t t t-i t-i t-i t-j t-j t/1 t/2 The traverse_trees() function gets "t", "t-i" and "t" from trees O, A and B first, and notices that A may have a matching "t" behind "t-i" and "t-j" (indeed it does), and tells A to give that entry instead. After unpacking blob "t" from tree B (as it hasn't changed since O in B and A removed it, it will result in its removal), it descends into directory "t/". The side that walked index in parallel to the tree traversal used to be implemented with one pointer, o->pos, that points at the next index entry to be processed. When this happens, the pointer o->pos still points at "t-i" that is the first entry. We should be able to skip "t-i" and "t-j" and locate "t/1" from the index while the recursive invocation of traverse_trees() walks and match entries found there, and later come back to process "t-i". While that look-ahead is not implemented yet, this adds a flag bit, CE_UNPACKED, to mark the entries in the index that has already been processed. o->pos pointer has been renamed to o->cache_bottom and it points at the first entry that may still need to be processed. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-07 23:59:54 +01:00
checkout: avoid unnecessary match_pathspec calls In checkout_paths() we do this - for all updated items, call match_pathspec - for all items, call match_pathspec (inside unmerge_cache) - for all items, call match_pathspec (for showing "path .. is unmerged) - for updated items, call match_pathspec and update paths That's a lot of duplicate match_pathspec(s) and the function is not exactly cheap to be called so many times, especially on large indexes. This patch makes it call match_pathspec once per updated index entry, save the result in ce_flags and reuse the results in the following loops. The changes in 0a1283b (checkout $tree $path: do not clobber local changes in $path not in $tree - 2011-09-30) limit the affected paths to ones we read from $tree. We do not do anything to other modified entries in this case, so the "for all items" above could be modified to "for all updated items". But.. The command's behavior now is modified slightly: unmerged entries that match $path, but not updated by $tree, are now NOT touched. Although this should be considered a bug fix, not a regression. A new test is added for this change. And while at there, free ps_matched after use. The following command is tested on webkit, 215k entries. The pattern is chosen mainly to make match_pathspec sweat: git checkout -- "*[a-zA-Z]*[a-zA-Z]*[a-zA-Z]*" before after real 0m3.493s 0m2.737s user 0m2.239s 0m1.586s sys 0m1.252s 0m1.151s Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-27 06:58:21 +01:00
/* used to temporarily mark paths matched by pathspecs */
#define CE_MATCHED (1 << 26)
#define CE_UPDATE_IN_BASE (1 << 27)
#define CE_STRIP_NAME (1 << 28)
/*
* Extended on-disk flags
*/
#define CE_INTENT_TO_ADD (1 << 29)
#define CE_SKIP_WORKTREE (1 << 30)
/* CE_EXTENDED2 is for future extension */
#define CE_EXTENDED2 (1U << 31)
#define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
/*
* Safeguard to avoid saving wrong flags:
* - CE_EXTENDED2 won't get saved until its semantic is known
* - Bits in 0x0000FFFF have been saved in ce_flags already
* - Bits in 0x003F0000 are currently in-memory flags
*/
#if CE_EXTENDED_FLAGS & 0x803FFFFF
#error "CE_EXTENDED_FLAGS out of range"
#endif
/* Forward structure decls */
struct pathspec;
struct child_process;
struct tree;
/*
* Copy the sha1 and stat state of a cache entry from one to
* another. But we never change the name, or the hash state!
*/
static inline void copy_cache_entry(struct cache_entry *dst,
const struct cache_entry *src)
{
unsigned int state = dst->ce_flags & CE_HASHED;
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
int mem_pool_allocated = dst->mem_pool_allocated;
/* Don't copy hash chain and name */
memcpy(&dst->ce_stat_data, &src->ce_stat_data,
offsetof(struct cache_entry, name) -
offsetof(struct cache_entry, ce_stat_data));
/* Restore the hash state */
dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
/* Restore the mem_pool_allocated flag */
dst->mem_pool_allocated = mem_pool_allocated;
}
static inline unsigned create_ce_flags(unsigned stage)
{
return (stage << CE_STAGESHIFT);
}
#define ce_namelen(ce) ((ce)->ce_namelen)
#define ce_size(ce) cache_entry_size(ce_namelen(ce))
#define ce_stage(ce) ((CE_STAGEMASK & (ce)->ce_flags) >> CE_STAGESHIFT)
#define ce_uptodate(ce) ((ce)->ce_flags & CE_UPTODATE)
#define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE)
#define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE)
#define ce_intent_to_add(ce) ((ce)->ce_flags & CE_INTENT_TO_ADD)
#define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
static inline unsigned int create_ce_mode(unsigned int mode)
{
if (S_ISLNK(mode))
return S_IFLNK;
if (S_ISDIR(mode) || S_ISGITLINK(mode))
return S_IFGITLINK;
return S_IFREG | ce_permissions(mode);
}
static inline unsigned int ce_mode_from_stat(const struct cache_entry *ce,
unsigned int mode)
{
extern int trust_executable_bit, has_symlinks;
if (!has_symlinks && S_ISREG(mode) &&
ce && S_ISLNK(ce->ce_mode))
return ce->ce_mode;
if (!trust_executable_bit && S_ISREG(mode)) {
if (ce && S_ISREG(ce->ce_mode))
return ce->ce_mode;
return create_ce_mode(0666);
}
return create_ce_mode(mode);
}
static inline int ce_to_dtype(const struct cache_entry *ce)
{
unsigned ce_mode = ntohl(ce->ce_mode);
if (S_ISREG(ce_mode))
return DT_REG;
else if (S_ISDIR(ce_mode) || S_ISGITLINK(ce_mode))
return DT_DIR;
else if (S_ISLNK(ce_mode))
return DT_LNK;
else
return DT_UNKNOWN;
}
static inline unsigned int canon_mode(unsigned int mode)
{
if (S_ISREG(mode))
return S_IFREG | ce_permissions(mode);
if (S_ISLNK(mode))
return S_IFLNK;
if (S_ISDIR(mode))
return S_IFDIR;
return S_IFGITLINK;
}
#define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
#define SOMETHING_CHANGED (1 << 0) /* unclassified changes go here */
#define CE_ENTRY_CHANGED (1 << 1)
#define CE_ENTRY_REMOVED (1 << 2)
#define CE_ENTRY_ADDED (1 << 3)
#define RESOLVE_UNDO_CHANGED (1 << 4)
#define CACHE_TREE_CHANGED (1 << 5)
#define SPLIT_INDEX_ORDERED (1 << 6)
#define UNTRACKED_CHANGED (1 << 7)
#define FSMONITOR_CHANGED (1 << 8)
struct split_index;
struct untracked_cache;
struct index_state {
struct cache_entry **cache;
unsigned int version;
unsigned int cache_nr, cache_alloc, cache_changed;
struct string_list *resolve_undo;
struct cache_tree *cache_tree;
struct split_index *split_index;
struct cache_time timestamp;
unpack_trees(): protect the handcrafted in-core index from read_cache() unpack_trees() rebuilds the in-core index from scratch by allocating a new structure and finishing it off by copying the built one to the final index. The resulting in-core index is Ok for most use, but read_cache() does not recognize it as such. The function is meant to be no-op if you already have loaded the index, until you call discard_cache(). This change the way read_cache() detects an already initialized in-core index, by introducing an extra bit, and marks the handcrafted in-core index as initialized, to avoid this problem. A better fix in the longer term would be to change the read_cache() API so that it will always discard and re-read from the on-disk index to avoid confusion. But there are higher level API that have relied on the current semantics, and they and their users all need to get converted, which is outside the scope of 'maint' track. An example of such a higher level API is write_cache_as_tree(), which is used by git-write-tree as well as later Porcelains like git-merge, revert and cherry-pick. In the longer term, we should remove read_cache() from there and add one to cmd_write_tree(); other callers expect that the in-core index they prepared is what gets written as a tree so no other change is necessary for this particular codepath. The original version of this patch marked the index by pointing an otherwise wasted malloc'ed memory with o->result.alloc, but this version uses Linus's idea to use a new "initialized" bit, which is conceptually much cleaner. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-08-23 21:57:30 +02:00
unsigned name_hash_initialized : 1,
initialized : 1,
drop_cache_tree : 1,
updated_workdir : 1,
updated_skipworktree : 1,
fsmonitor_has_run_once : 1;
struct hashmap name_hash;
struct hashmap dir_hash;
struct object_id oid;
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
struct ewah_bitmap *fsmonitor_dirty;
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
struct mem_pool *ce_mem_pool;
};
/* Name hashing */
int test_lazy_init_name_hash(struct index_state *istate, int try_threaded);
void add_name_hash(struct index_state *istate, struct cache_entry *ce);
void remove_name_hash(struct index_state *istate, struct cache_entry *ce);
void free_name_hash(struct index_state *istate);
block alloc: add lifecycle APIs for cache_entry structs It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made to allocate cahce entries when loading an index. Add an API to allocate and discard cache entries, abstracting the details of managing the memory backing the cache entries. This commit does actually change how memory is managed - this will be done in a later commit in the series. This change makes the distinction between cache entries that are associated with an index and cache entries that are not associated with an index. A main use of cache entries is with an index, and we can optimize the memory management around this. We still have other cases where a cache entry is not persisted with an index, and so we need to handle the "transient" use case as well. To keep the congnitive overhead of managing the cache entries, there will only be a single discard function. This means there must be enough information kept with the cache entry so that we know how to discard them. A summary of the main functions in the API is: make_cache_entry: create cache entry for use in an index. Uses specified parameters to populate cache_entry fields. make_empty_cache_entry: Create an empty cache entry for use in an index. Returns cache entry with empty fields. make_transient_cache_entry: create cache entry that is not used in an index. Uses specified parameters to populate cache_entry fields. make_empty_transient_cache_entry: create cache entry that is not used in an index. Returns cache entry with empty fields. discard_cache_entry: A single function that knows how to discard a cache entry regardless of how it was allocated. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:31 +02:00
/* Cache entry creation and cleanup */
/*
* Create cache_entry intended for use in the specified index. Caller
* is responsible for discarding the cache_entry with
* `discard_cache_entry`.
*/
struct cache_entry *make_cache_entry(struct index_state *istate,
unsigned int mode,
const struct object_id *oid,
const char *path,
int stage,
unsigned int refresh_options);
struct cache_entry *make_empty_cache_entry(struct index_state *istate,
size_t name_len);
/*
* Create a cache_entry that is not intended to be added to an index.
* Caller is responsible for discarding the cache_entry
* with `discard_cache_entry`.
*/
struct cache_entry *make_transient_cache_entry(unsigned int mode,
const struct object_id *oid,
const char *path,
int stage);
struct cache_entry *make_empty_transient_cache_entry(size_t name_len);
/*
* Discard cache entry.
*/
void discard_cache_entry(struct cache_entry *ce);
/*
* Check configuration if we should perform extra validation on cache
* entries.
*/
int should_validate_cache_entries(void);
block alloc: allocate cache entries from mem_pool When reading large indexes from disk, a portion of the time is dominated in malloc() calls. This can be mitigated by allocating a large block of memory and manage it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools. Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated from. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for. In the case of a Split Index, the following rules are followed. 1st, some terminology is defined: Terminology: - 'the_index': represents the logical view of the index - 'split_index': represents the "base" cache entries. Read from the split index file. 'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced. Managing transient cache_entry structs: Cache entries are usually allocated for an index, but this is not always the case. Cache entries are sometimes allocated because this is the type that the existing checkout_entry function works with. Because of this, the existing code needs to handle cache entries associated with an index / memory pool, and those that only exist transiently. Several strategies were contemplated around how to handle this: Chosen approach: An extra field was added to the cache_entry type to track whether the cache_entry was allocated from a memory pool or not. This is currently an int field, as there are no more available bits in the existing ce_flags bit field. If / when more bits are needed, this new field can be turned into a proper bit field. Alternatives: 1) Do not include any information about how the cache_entry was allocated. Calling code would be responsible for tracking whether the cache_entry needed to be freed or not. Pro: No extra memory overhead to track this state Con: Extra complexity in callers to handle this correctly. The extra complexity and burden to not regress this behavior in the future was more than we wanted. 2) cache_entry would gain knowledge about which mem_pool allocated it Pro: Could (potentially) do extra logic to know when a mem_pool no longer had references to any cache_entry Con: cache_entry would grow heavier by a pointer, instead of int We didn't see a tangible benefit to this approach 3) Do not add any extra information to a cache_entry, but when freeing a cache entry, check if the memory exists in a region managed by existing mem_pools. Pro: No extra memory overhead to track state Con: Extra computation is performed when freeing cache entries We decided tracking and iterating over known memory pool regions was less desirable than adding an extra field to track this stae. Signed-off-by: Jameson Miller <jamill@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-02 21:49:37 +02:00
/*
* Duplicate a cache_entry. Allocate memory for the new entry from a
* memory_pool. Takes into account cache_entry fields that are meant
* for managing the underlying memory allocation of the cache_entry.
*/
struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate);
/*
* Validate the cache entries in the index. This is an internal
* consistency check that the cache_entry structs are allocated from
* the expected memory pool.
*/
void validate_cache_entries(const struct index_state *istate);
#ifdef USE_THE_INDEX_COMPATIBILITY_MACROS
extern struct index_state the_index;
#define active_cache (the_index.cache)
#define active_nr (the_index.cache_nr)
#define active_alloc (the_index.cache_alloc)
#define active_cache_changed (the_index.cache_changed)
#define active_cache_tree (the_index.cache_tree)
#define read_cache() repo_read_index(the_repository)
read-cache: fix reading the shared index for other repos read_index_from() takes a path argument for the location of the index file. For reading the shared index in split index mode however it just ignores that path argument, and reads it from the gitdir of the current repository. This works as long as an index in the_repository is read. Once that changes, such as when we read the index of a submodule, or of a different working tree than the current one, the gitdir of the_repository will no longer contain the appropriate shared index, and git will fail to read it. For example t3007-ls-files-recurse-submodules.sh was broken with GIT_TEST_SPLIT_INDEX set in 188dce131f ("ls-files: use repository object", 2017-06-22), and t7814-grep-recurse-submodules.sh was also broken in a similar manner, probably by introducing struct repository there, although I didn't track down the exact commit for that. be489d02d2 ("revision.c: --indexed-objects add objects from all worktrees", 2017-08-23) breaks with split index mode in a similar manner, not erroring out when it can't read the index, but instead carrying on with pruning, without taking the index of the worktree into account. Fix this by passing an additional gitdir parameter to read_index_from, to indicate where it should look for and read the shared index from. read_cache_from() defaults to using the gitdir of the_repository. As it is mostly a convenience macro, having to pass get_git_dir() for every call seems overkill, and if necessary users can have more control by using read_index_from(). Helped-by: Brandon Williams <bmwill@google.com> Signed-off-by: Thomas Gummerer <t.gummerer@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-07 23:30:13 +01:00
#define read_cache_from(path) read_index_from(&the_index, (path), (get_git_dir()))
#define read_cache_preload(pathspec) repo_read_index_preload(the_repository, (pathspec), 0)
#define is_cache_unborn() is_index_unborn(&the_index)
#define read_cache_unmerged() repo_read_index_unmerged(the_repository)
#define discard_cache() discard_index(&the_index)
#define unmerged_cache() unmerged_index(&the_index)
#define cache_name_pos(name, namelen) index_name_pos(&the_index,(name),(namelen))
#define add_cache_entry(ce, option) add_index_entry(&the_index, (ce), (option))
#define rename_cache_entry_at(pos, new_name) rename_index_entry_at(&the_index, (pos), (new_name))
#define remove_cache_entry_at(pos) remove_index_entry_at(&the_index, (pos))
#define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
#define add_to_cache(path, st, flags) add_to_index(&the_index, (path), (st), (flags))
#define add_file_to_cache(path, flags) add_file_to_index(&the_index, (path), (flags))
#define chmod_cache_entry(ce, flip) chmod_index_entry(&the_index, (ce), (flip))
#define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL, NULL)
#define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
#define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
#define cache_dir_exists(name, namelen) index_dir_exists(&the_index, (name), (namelen))
#define cache_file_exists(name, namelen, igncase) index_file_exists(&the_index, (name), (namelen), (igncase))
#define cache_name_is_other(name, namelen) index_name_is_other(&the_index, (name), (namelen))
#define resolve_undo_clear() resolve_undo_clear_index(&the_index)
#define unmerge_cache_entry_at(at) unmerge_index_entry_at(&the_index, at)
#define unmerge_cache(pathspec) unmerge_index(&the_index, pathspec)
#define read_blob_data_from_cache(path, sz) read_blob_data_from_index(&the_index, (path), (sz))
#define hold_locked_index(lock_file, flags) repo_hold_locked_index(the_repository, (lock_file), (flags))
#endif
#define TYPE_BITS 3
/*
* Values in this enum (except those outside the 3 bit range) are part
* of pack file format. See Documentation/technical/pack-format.txt
* for more information.
*/
enum object_type {
OBJ_BAD = -1,
OBJ_NONE = 0,
OBJ_COMMIT = 1,
OBJ_TREE = 2,
OBJ_BLOB = 3,
OBJ_TAG = 4,
/* 5 for future expansion */
OBJ_OFS_DELTA = 6,
OBJ_REF_DELTA = 7,
OBJ_ANY,
OBJ_MAX
};
static inline enum object_type object_type(unsigned int mode)
{
return S_ISDIR(mode) ? OBJ_TREE :
S_ISGITLINK(mode) ? OBJ_COMMIT :
OBJ_BLOB;
}
/* Double-check local_repo_env below if you add to this list. */
#define GIT_DIR_ENVIRONMENT "GIT_DIR"
$GIT_COMMON_DIR: a new environment variable This variable is intended to support multiple working directories attached to a repository. Such a repository may have a main working directory, created by either "git init" or "git clone" and one or more linked working directories. These working directories and the main repository share the same repository directory. In linked working directories, $GIT_COMMON_DIR must be defined to point to the real repository directory and $GIT_DIR points to an unused subdirectory inside $GIT_COMMON_DIR. File locations inside the repository are reorganized from the linked worktree view point: - worktree-specific such as HEAD, logs/HEAD, index, other top-level refs and unrecognized files are from $GIT_DIR. - the rest like objects, refs, info, hooks, packed-refs, shallow... are from $GIT_COMMON_DIR (except info/sparse-checkout, but that's a separate patch) Scripts are supposed to retrieve paths in $GIT_DIR with "git rev-parse --git-path", which will take care of "$GIT_DIR vs $GIT_COMMON_DIR" business. The redirection is done by git_path(), git_pathdup() and strbuf_git_path(). The selected list of paths goes to $GIT_COMMON_DIR, not the other way around in case a developer adds a new worktree-specific file and it's accidentally promoted to be shared across repositories (this includes unknown files added by third party commands) The list of known files that belong to $GIT_DIR are: ADD_EDIT.patch BISECT_ANCESTORS_OK BISECT_EXPECTED_REV BISECT_LOG BISECT_NAMES CHERRY_PICK_HEAD COMMIT_MSG FETCH_HEAD HEAD MERGE_HEAD MERGE_MODE MERGE_RR NOTES_EDITMSG NOTES_MERGE_WORKTREE ORIG_HEAD REVERT_HEAD SQUASH_MSG TAG_EDITMSG fast_import_crash_* logs/HEAD next-index-* rebase-apply rebase-merge rsync-refs-* sequencer/* shallow_* Path mapping is NOT done for git_path_submodule(). Multi-checkouts are not supported as submodules. Helped-by: Jens Lehmann <Jens.Lehmann@web.de> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-11-30 09:24:36 +01:00
#define GIT_COMMON_DIR_ENVIRONMENT "GIT_COMMON_DIR"
ref namespaces: infrastructure Add support for dividing the refs of a single repository into multiple namespaces, each of which can have its own branches, tags, and HEAD. Git can expose each namespace as an independent repository to pull from and push to, while sharing the object store, and exposing all the refs to operations such as git-gc. Storing multiple repositories as namespaces of a single repository avoids storing duplicate copies of the same objects, such as when storing multiple branches of the same source. The alternates mechanism provides similar support for avoiding duplicates, but alternates do not prevent duplication between new objects added to the repositories without ongoing maintenance, while namespaces do. To specify a namespace, set the GIT_NAMESPACE environment variable to the namespace. For each ref namespace, git stores the corresponding refs in a directory under refs/namespaces/. For example, GIT_NAMESPACE=foo will store refs under refs/namespaces/foo/. You can also specify namespaces via the --namespace option to git. Note that namespaces which include a / will expand to a hierarchy of namespaces; for example, GIT_NAMESPACE=foo/bar will store refs under refs/namespaces/foo/refs/namespaces/bar/. This makes paths in GIT_NAMESPACE behave hierarchically, so that cloning with GIT_NAMESPACE=foo/bar produces the same result as cloning with GIT_NAMESPACE=foo and cloning from that repo with GIT_NAMESPACE=bar. It also avoids ambiguity with strange namespace paths such as foo/refs/heads/, which could otherwise generate directory/file conflicts within the refs directory. Add the infrastructure for ref namespaces: handle the GIT_NAMESPACE environment variable and --namespace option, and support iterating over refs in a namespace. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Jamey Sharp <jamey@minilop.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-05 19:54:44 +02:00
#define GIT_NAMESPACE_ENVIRONMENT "GIT_NAMESPACE"
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
#define GIT_WORK_TREE_ENVIRONMENT "GIT_WORK_TREE"
#define GIT_PREFIX_ENVIRONMENT "GIT_PREFIX"
#define GIT_SUPER_PREFIX_ENVIRONMENT "GIT_INTERNAL_SUPER_PREFIX"
#define DEFAULT_GIT_DIR_ENVIRONMENT ".git"
Rename environment variables. H. Peter Anvin mentioned that using SHA1_whatever as an environment variable name is not nice and we should instead use names starting with "GIT_" prefix to avoid conflicts. Here is what this patch does: * Renames the following environment variables: New name Old Name GIT_AUTHOR_DATE AUTHOR_DATE GIT_AUTHOR_EMAIL AUTHOR_EMAIL GIT_AUTHOR_NAME AUTHOR_NAME GIT_COMMITTER_EMAIL COMMIT_AUTHOR_EMAIL GIT_COMMITTER_NAME COMMIT_AUTHOR_NAME GIT_ALTERNATE_OBJECT_DIRECTORIES SHA1_FILE_DIRECTORIES GIT_OBJECT_DIRECTORY SHA1_FILE_DIRECTORY * Introduces a compatibility macro, gitenv(), which does an getenv() and if it fails calls gitenv_bc(), which in turn picks up the value from old name while giving a warning about using an old name. * Changes all users of the environment variable to fetch environment variable with the new name using gitenv(). * Updates the documentation and scripts shipped with Linus GIT distribution. The transition plan is as follows: * We will keep the backward compatibility list used by gitenv() for now, so the current scripts and user environments continue to work as before. The users will get warnings when they have old name but not new name in their environment to the stderr. * The Porcelain layers should start using new names. However, just in case it ends up calling old Plumbing layer implementation, they should also export old names, taking values from the corresponding new names, during the transition period. * After a transition period, we would drop the compatibility support and drop gitenv(). Revert the callers to directly call getenv() but keep using the new names. The last part is probably optional and the transition duration needs to be set to a reasonable value. Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-05-10 02:57:56 +02:00
#define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
#define INDEX_ENVIRONMENT "GIT_INDEX_FILE"
#define GRAFT_ENVIRONMENT "GIT_GRAFT_FILE"
#define GIT_SHALLOW_FILE_ENVIRONMENT "GIT_SHALLOW_FILE"
#define TEMPLATE_DIR_ENVIRONMENT "GIT_TEMPLATE_DIR"
#define CONFIG_ENVIRONMENT "GIT_CONFIG"
#define CONFIG_DATA_ENVIRONMENT "GIT_CONFIG_PARAMETERS"
#define EXEC_PATH_ENVIRONMENT "GIT_EXEC_PATH"
#define CEILING_DIRECTORIES_ENVIRONMENT "GIT_CEILING_DIRECTORIES"
#define NO_REPLACE_OBJECTS_ENVIRONMENT "GIT_NO_REPLACE_OBJECTS"
#define GIT_REPLACE_REF_BASE_ENVIRONMENT "GIT_REPLACE_REF_BASE"
Add basic infrastructure to assign attributes to paths This adds the basic infrastructure to assign attributes to paths, in a way similar to what the exclusion mechanism does based on $GIT_DIR/info/exclude and .gitignore files. An attribute is just a simple string that does not contain any whitespace. They can be specified in $GIT_DIR/info/attributes file, and .gitattributes file in each directory. Each line in these files defines a pattern matching rule. Similar to the exclusion mechanism, a later match overrides an earlier match in the same file, and entries from .gitattributes file in the same directory takes precedence over the ones from parent directories. Lines in $GIT_DIR/info/attributes file are used as the lowest precedence default rules. A line is either a comment (an empty line, or a line that begins with a '#'), or a rule, which is a whitespace separated list of tokens. The first token on the line is a shell glob pattern. The rest are names of attributes, each of which can optionally be prefixed with '!'. Such a line means "if a path matches this glob, this attribute is set (or unset -- if the attribute name is prefixed with '!'). For glob matching, the same "if the pattern does not have a slash in it, the basename of the path is matched with fnmatch(3) against the pattern, otherwise, the path is matched with the pattern with FNM_PATHNAME" rule as the exclusion mechanism is used. This does not define what an attribute means. Tying an attribute to various effects it has on git operation for paths that have it will be specified separately. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-12 10:07:32 +02:00
#define GITATTRIBUTES_FILE ".gitattributes"
#define INFOATTRIBUTES_FILE "info/attributes"
attribute macro support This adds "attribute macros" (for lack of better name). So far, we have low-level attributes such as crlf and diff, which are defined in operational terms --- setting or unsetting them on a particular path directly affects what is done to the path. For example, in order to decline diffs or crlf conversions on a binary blob, no diffs on PostScript files, and treat all other files normally, you would have something like these: * diff crlf *.ps !diff proprietary.o !diff !crlf That is fine as the operation goes, but gets unwieldy rather rapidly, when we start adding more low-level attributes that are defined in operational terms. A near-term example of such an attribute would be 'merge-3way' which would control if git should attempt the usual 3-way file-level merge internally, or leave merging to a specialized external program of user's choice. When it is added, we do _not_ want to force the users to update the above to: * diff crlf merge-3way *.ps !diff proprietary.o !diff !crlf !merge-3way The way this patch solves this issue is to realize that the attributes the user is assigning to paths are not defined in terms of operations but in terms of what they are. All of the three low-level attributes usually make sense for most of the files that sane SCM users have git operate on (these files are typically called "text'). Only a few cases, such as binary blob, need exception to decline the "usual treatment given to text files" -- and people mark them as "binary". So this allows the $GIT_DIR/info/alternates and .gitattributes at the toplevel of the project to also specify attributes that assigns other attributes. The syntax is '[attr]' followed by an attribute name followed by a list of attribute names: [attr] binary !diff !crlf !merge-3way When "binary" attribute is set to a path, if the path has not got diff/crlf/merge-3way attribute set or unset by other rules, this rule unsets the three low-level attributes. It is expected that the user level .gitattributes will be expressed mostly in terms of attributes based on what the files are, and the above sample would become like this: (built-in attribute configuration) [attr] binary !diff !crlf !merge-3way * diff crlf merge-3way (project specific .gitattributes) proprietary.o binary (user preference $GIT_DIR/info/attributes) *.ps !diff There are a few caveats. * As described above, you can define these macros only in $GIT_DIR/info/attributes and toplevel .gitattributes. * There is no attempt to detect circular definition of macro attributes, and definitions are evaluated from bottom to top as usual to fill in other attributes that have not yet got values. The following would work as expected: [attr] text diff crlf [attr] ps text !diff *.ps ps while this would most likely not (I haven't tried): [attr] ps text !diff [attr] text diff crlf *.ps ps * When a macro says "[attr] A B !C", saying that a path does not have attribute A does not let you tell anything about attributes B or C. That is, given this: [attr] text diff crlf [attr] ps text !diff *.txt !ps path hello.txt, which would match "*.txt" pattern, would have "ps" attribute set to zero, but that does not make text attribute of hello.txt set to false (nor diff attribute set to true). Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-04-14 17:54:37 +02:00
#define ATTRIBUTE_MACRO_PREFIX "[attr]"
#define GITMODULES_FILE ".gitmodules"
#define GITMODULES_INDEX ":.gitmodules"
#define GITMODULES_HEAD "HEAD:.gitmodules"
#define GIT_NOTES_REF_ENVIRONMENT "GIT_NOTES_REF"
#define GIT_NOTES_DEFAULT_REF "refs/notes/commits"
#define GIT_NOTES_DISPLAY_REF_ENVIRONMENT "GIT_NOTES_DISPLAY_REF"
#define GIT_NOTES_REWRITE_REF_ENVIRONMENT "GIT_NOTES_REWRITE_REF"
#define GIT_NOTES_REWRITE_MODE_ENVIRONMENT "GIT_NOTES_REWRITE_MODE"
add global --literal-pathspecs option Git takes pathspec arguments in many places to limit the scope of an operation. These pathspecs are treated not as literal paths, but as glob patterns that can be fed to fnmatch. When a user is giving a specific pattern, this is a nice feature. However, when programatically providing pathspecs, it can be a nuisance. For example, to find the latest revision which modified "$foo", one can use "git rev-list -- $foo". But if "$foo" contains glob characters (e.g., "f*"), it will erroneously match more entries than desired. The caller needs to quote the characters in $foo, and even then, the results may not be exactly the same as with a literal pathspec. For instance, the depth checks in match_pathspec_depth do not kick in if we match via fnmatch. This patch introduces a global command-line option (i.e., one for "git" itself, not for specific commands) to turn this behavior off. It also has a matching environment variable, which can make it easier if you are a script or porcelain interface that is going to issue many such commands. This option cannot turn off globbing for particular pathspecs. That could eventually be done with a ":(noglob)" magic pathspec prefix. However, that level of granularity is more cumbersome to use for many cases, and doing ":(noglob)" right would mean converting the whole codebase to use "struct pathspec", as the usual "const char **pathspec" cannot represent extra per-item flags. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-12-19 23:37:30 +01:00
#define GIT_LITERAL_PATHSPECS_ENVIRONMENT "GIT_LITERAL_PATHSPECS"
#define GIT_GLOB_PATHSPECS_ENVIRONMENT "GIT_GLOB_PATHSPECS"
#define GIT_NOGLOB_PATHSPECS_ENVIRONMENT "GIT_NOGLOB_PATHSPECS"
#define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
#define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
git: add --no-optional-locks option Some tools like IDEs or fancy editors may periodically run commands like "git status" in the background to keep track of the state of the repository. Some of these commands may refresh the index and write out the result in an opportunistic way: if they can get the index lock, then they update the on-disk index with any updates they find. And if not, then their in-core refresh is lost and just has to be recomputed by the next caller. But taking the index lock may conflict with other operations in the repository. Especially ones that the user is doing themselves, which _aren't_ opportunistic. In other words, "git status" knows how to back off when somebody else is holding the lock, but other commands don't know that status would be happy to drop the lock if somebody else wanted it. There are a couple possible solutions: 1. Have some kind of "pseudo-lock" that allows other commands to tell status that they want the lock. This is likely to be complicated and error-prone to implement (and maybe even impossible with just dotlocks to work from, as it requires some inter-process communication). 2. Avoid background runs of commands like "git status" that want to do opportunistic updates, preferring instead plumbing like diff-files, etc. This is awkward for a couple of reasons. One is that "status --porcelain" reports a lot more about the repository state than is available from individual plumbing commands. And two is that we actually _do_ want to see the refreshed index. We just don't want to take a lock or write out the result. Whereas commands like diff-files expect us to refresh the index separately and write it to disk so that they can depend on the result. But that write is exactly what we're trying to avoid. 3. Ask "status" not to lock or write the index. This is easy to implement. The big downside is that any work done in refreshing the index for such a call is lost when the process exits. So a background process may end up re-hashing a changed file multiple times until the user runs a command that does an index refresh themselves. This patch implements the option 3. The idea (and the test) is largely stolen from a Git for Windows patch by Johannes Schindelin, 67e5ce7f63 (status: offer *not* to lock the index and update it, 2016-08-12). The twist here is that instead of making this an option to "git status", it becomes a "git" option and matching environment variable. The reason there is two-fold: 1. An environment variable is carried through to sub-processes. And whether an invocation is a background process or not should apply to the whole process tree. So you could do "git --no-optional-locks foo", and if "foo" is a script or alias that calls "status", you'll still get the effect. 2. There may be other programs that want the same treatment. I've punted here on finding more callers to convert, since "status" is the obvious one to call as a repeated background job. But "git diff"'s opportunistic refresh of the index may be a good candidate. The test is taken from 67e5ce7f63, and it's worth repeating Johannes's explanation: Note that the regression test added in this commit does not *really* verify that no index.lock file was written; that test is not possible in a portable way. Instead, we verify that .git/index is rewritten *only* when `git status` is run without `--no-optional-locks`. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 08:54:30 +02:00
#define GIT_OPTIONAL_LOCKS_ENVIRONMENT "GIT_OPTIONAL_LOCKS"
#define GIT_TEXT_DOMAIN_DIR_ENVIRONMENT "GIT_TEXTDOMAINDIR"
/*
* Environment variable used in handshaking the wire protocol.
* Contains a colon ':' separated list of keys with optional values
* 'key[=value]'. Presence of unknown keys and values must be
* ignored.
*/
#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
/* HTTP header used to handshake the wire protocol */
#define GIT_PROTOCOL_HEADER "Git-Protocol"
/*
setup: suppress implicit "." work-tree for bare repos If an explicit GIT_DIR is given without a working tree, we implicitly assume that the current working directory should be used as the working tree. E.g.,: GIT_DIR=/some/repo.git git status would compare against the cwd. Unfortunately, we fool this rule for sub-invocations of git by setting GIT_DIR internally ourselves. For example: git init foo cd foo/.git git status ;# fails, as we expect git config alias.st status git status ;# does not fail, but should What happens is that we run setup_git_directory when doing alias lookup (since we need to see the config), set GIT_DIR as a result, and then leave GIT_WORK_TREE blank (because we do not have one). Then when we actually run the status command, we do setup_git_directory again, which sees our explicit GIT_DIR and uses the cwd as an implicit worktree. It's tempting to argue that we should be suppressing that second invocation of setup_git_directory, as it could use the values we already found in memory. However, the problem still exists for sub-processes (e.g., if "git status" were an external command). You can see another example with the "--bare" option, which sets GIT_DIR explicitly. For example: git init foo cd foo/.git git status ;# fails git --bare status ;# does NOT fail We need some way of telling sub-processes "even though GIT_DIR is set, do not use cwd as an implicit working tree". We could do it by putting a special token into GIT_WORK_TREE, but the obvious choice (an empty string) has some portability problems. Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE, which suppresses the use of cwd as a working tree when GIT_DIR is set. We trigger the new variable when we know we are in a bare setting. The variable is left intentionally undocumented, as this is an internal detail (for now, anyway). If somebody comes up with a good alternate use for it, and once we are confident we have shaken any bugs out of it, we can consider promoting it further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 10:32:22 +01:00
* This environment variable is expected to contain a boolean indicating
* whether we should or should not treat:
*
* GIT_DIR=foo.git git ...
*
* as if GIT_WORK_TREE=. was given. It's not expected that users will make use
* of this, but we use it internally to communicate to sub-processes that we
* are in a bare repo. If not set, defaults to true.
*/
#define GIT_IMPLICIT_WORK_TREE_ENVIRONMENT "GIT_IMPLICIT_WORK_TREE"
/*
* Repository-local GIT_* environment variables; these will be cleared
* when git spawns a sub-process that runs inside another repository.
* The array is NULL-terminated, which makes it easy to pass in the "env"
* parameter of a run-command invocation, or to do a simple walk.
*/
extern const char * const local_repo_env[];
void setup_git_env(const char *git_dir);
config: only read .git/config from configured repos When git_config() runs, it looks in the system, user-wide, and repo-level config files. It gets the latter by calling git_pathdup(), which in turn calls get_git_dir(). If we haven't set up the git repository yet, this may simply return ".git", and we will look at ".git/config". This seems like it would be helpful (presumably we haven't set up the repository yet, so it tries to find it), but it turns out to be a bad idea for a few reasons: - it's not sufficient, and therefore hides bugs in a confusing way. Config will be respected if commands are run from the top-level of the working tree, but not from a subdirectory. - it's not always true that we haven't set up the repository _yet_; we may not want to do it at all. For instance, if you run "git init /some/path" from inside another repository, it should not load config from the existing repository. - there might be a path ".git/config", but it is not the actual repository we would find via setup_git_directory(). This may happen, e.g., if you are storing a git repository inside another git repository, but have munged one of the files in such a way that the inner repository is not valid (e.g., by removing HEAD). We have at least two bugs of the second type in git-init, introduced by ae5f677 (lazily load core.sharedrepository, 2016-03-11). It causes init to use git_configset(), which loads all of the config, including values from the current repo (if any). This shows up in two ways: 1. If we happen to be in an existing repository directory, we'll read and respect core.sharedrepository from it, even though it should have no bearing on the new repository. A new test in t1301 covers this. 2. Similarly, if we're in an existing repo that sets core.logallrefupdates, that will cause init to fail to set it in a newly created repository (because it thinks that the user's templates already did so). A new test in t0001 covers this. We also need to adjust an existing test in t1302, which gives another example of why this patch is an improvement. That test creates an embedded repository with a bogus core.repositoryformatversion of "99". It wants to make sure that we actually stop at the bogus repo rather than continuing upward to find the outer repo. So it checks that "git config core.repositoryformatversion" returns 99. But that only works because we blindly read ".git/config", even though we _know_ we're in a repository whose vintage we do not understand. After this patch, we avoid reading config from the unknown vintage repository at all, which is a safer choice. But we need to tweak the test, since core.repositoryformatversion will not return 99; it will claim that it could not find the variable at all. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-13 05:24:15 +02:00
/*
* Returns true iff we have a configured git repository (either via
* setup_git_directory, or in the environment via $GIT_DIR).
*/
int have_git_dir(void);
extern int is_bare_repository_cfg;
int is_bare_repository(void);
int is_inside_git_dir(void);
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
extern char *git_work_tree_cfg;
int is_inside_work_tree(void);
const char *get_git_dir(void);
const char *get_git_common_dir(void);
char *get_object_directory(void);
char *get_index_file(void);
char *get_graft_file(struct repository *r);
void set_git_dir(const char *path);
int get_common_dir_noenv(struct strbuf *sb, const char *gitdir);
int get_common_dir(struct strbuf *sb, const char *gitdir);
const char *get_git_namespace(void);
const char *strip_namespace(const char *namespaced_ref);
const char *get_super_prefix(void);
const char *get_git_work_tree(void);
/*
* Return true if the given path is a git directory; note that this _just_
* looks at the directory itself. If you want to know whether "foo/.git"
* is a repository, you must feed that path, not just "foo".
*/
int is_git_directory(const char *path);
/*
* Return 1 if the given path is the root of a git repository or
* submodule, else 0. Will not return 1 for bare repositories with the
* exception of creating a bare repository in "foo/.git" and calling
* is_git_repository("foo").
*
* If we run into read errors, we err on the side of saying "yes, it is",
* as we usually consider sub-repos precious, and would prefer to err on the
* side of not disrupting or deleting them.
*/
int is_nonbare_repository_dir(struct strbuf *path);
#define READ_GITFILE_ERR_STAT_FAILED 1
#define READ_GITFILE_ERR_NOT_A_FILE 2
#define READ_GITFILE_ERR_OPEN_FAILED 3
#define READ_GITFILE_ERR_READ_FAILED 4
#define READ_GITFILE_ERR_INVALID_FORMAT 5
#define READ_GITFILE_ERR_NO_PATH 6
#define READ_GITFILE_ERR_NOT_A_REPO 7
#define READ_GITFILE_ERR_TOO_LARGE 8
void read_gitfile_error_die(int error_code, const char *path, const char *dir);
const char *read_gitfile_gently(const char *path, int *return_error_code);
#define read_gitfile(path) read_gitfile_gently((path), NULL)
const char *resolve_gitdir_gently(const char *suspect, int *return_error_code);
#define resolve_gitdir(path) resolve_gitdir_gently((path), NULL)
void set_git_work_tree(const char *tree);
#define ALTERNATE_DB_ENVIRONMENT "GIT_ALTERNATE_OBJECT_DIRECTORIES"
void setup_work_tree(void);
/*
* Find the commondir and gitdir of the repository that contains the current
* working directory, without changing the working directory or other global
* state. The result is appended to commondir and gitdir. If the discovered
* gitdir does not correspond to a worktree, then 'commondir' and 'gitdir' will
* both have the same result appended to the buffer. The return value is
* either 0 upon success and non-zero if no repository was found.
*/
int discover_git_directory(struct strbuf *commondir,
struct strbuf *gitdir);
const char *setup_git_directory_gently(int *);
const char *setup_git_directory(void);
char *prefix_path(const char *prefix, int len, const char *path);
char *prefix_path_gently(const char *prefix, int len, int *remaining, const char *path);
/*
* Concatenate "prefix" (if len is non-zero) and "path", with no
* connecting characters (so "prefix" should end with a "/").
* Unlike prefix_path, this should be used if the named file does
* not have to interact with index entry; i.e. name of a random file
* on the filesystem.
*
* The return value is always a newly allocated string (even if the
* prefix was empty).
*/
char *prefix_filename(const char *prefix, const char *path);
int check_filename(const char *prefix, const char *name);
void verify_filename(const char *prefix,
const char *name,
int diagnose_misspelt_rev);
void verify_non_filename(const char *prefix, const char *name);
int path_inside_repo(const char *prefix, const char *path);
#define INIT_DB_QUIET 0x0001
#define INIT_DB_EXIST_OK 0x0002
int init_db(const char *git_dir, const char *real_git_dir,
const char *template_dir, unsigned int flags);
void sanitize_stdfds(void);
int daemonize(void);
#define alloc_nr(x) (((x)+16)*3/2)
/*
* Realloc the buffer pointed at by variable 'x' so that it can hold
* at least 'nr' entries; the number of entries currently allocated
* is 'alloc', using the standard growing factor alloc_nr() macro.
*
* DO NOT USE any expression with side-effect for 'x', 'nr', or 'alloc'.
*/
#define ALLOC_GROW(x, nr, alloc) \
do { \
if ((nr) > alloc) { \
if (alloc_nr(alloc) < (nr)) \
alloc = (nr); \
else \
alloc = alloc_nr(alloc); \
REALLOC_ARRAY(x, alloc); \
} \
} while (0)
/* Initialize and use the cache information */
struct lock_file;
void preload_index(struct index_state *index,
const struct pathspec *pathspec,
unsigned int refresh_flags);
int do_read_index(struct index_state *istate, const char *path,
int must_exist); /* for testting only! */
int read_index_from(struct index_state *, const char *path,
const char *gitdir);
int is_index_unborn(struct index_state *);
/* For use with `write_locked_index()`. */
#define COMMIT_LOCK (1 << 0)
#define SKIP_IF_UNCHANGED (1 << 1)
/*
* Write the index while holding an already-taken lock. Close the lock,
* and if `COMMIT_LOCK` is given, commit it.
*
* Unless a split index is in use, write the index into the lockfile.
*
* With a split index, write the shared index to a temporary file,
* adjust its permissions and rename it into place, then write the
* split index to the lockfile. If the temporary file for the shared
* index cannot be created, fall back to the behavior described in
* the previous paragraph.
read-cache: leave lock in right state in `write_locked_index()` If the original version of `write_locked_index()` returned with an error, it didn't roll back the lockfile unless the error occured at the very end, during closing/committing. See commit 03b866477 (read-cache: new API write_locked_index instead of write_index/write_cache, 2014-06-13). In commit 9f41c7a6b (read-cache: close index.lock in do_write_index, 2017-04-26), we learned to close the lock slightly earlier in the callstack. That was mostly a side-effect of lockfiles being implemented using temporary files, but didn't cause any real harm. Recently, commit 076aa2cbd (tempfile: auto-allocate tempfiles on heap, 2017-09-05) introduced a subtle bug. If the temporary file is deleted (i.e., the lockfile is rolled back), the tempfile-pointer in the `struct lock_file` will be left dangling. Thus, an attempt to reuse the lockfile, or even just to roll it back, will induce undefined behavior -- most likely a crash. Besides not crashing, we clearly want to make things consistent. The guarantees which the lockfile-machinery itself provides is A) if we ask to commit and it fails, roll back, and B) if we ask to close and it fails, do _not_ roll back. Let's do the same for consistency. Do not delete the temporary file in `do_write_index()`. One of its callers, `write_locked_index()` will thereby avoid rolling back the lock. The other caller, `write_shared_index()`, will delete its temporary file anyway. Both of these callers will avoid undefined behavior (crashing). Teach `write_locked_index(..., COMMIT_LOCK)` to roll back the lock before returning. If we have already succeeded and committed, it will be a noop. Simplify the existing callers where we now have a superfluous call to `rollback_lockfile()`. That should keep future readers from wondering why the callers are inconsistent. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 22:12:13 +02:00
*
* With `COMMIT_LOCK`, the lock is always committed or rolled back.
* Without it, the lock is closed, but neither committed nor rolled
* back.
*
* If `SKIP_IF_UNCHANGED` is given and the index is unchanged, nothing
* is written (and the lock is rolled back if `COMMIT_LOCK` is given).
*/
int write_locked_index(struct index_state *, struct lock_file *lock, unsigned flags);
int discard_index(struct index_state *);
void move_index_extensions(struct index_state *dst, struct index_state *src);
int unmerged_index(const struct index_state *);
/**
* Returns 1 if istate differs from tree, 0 otherwise. If tree is NULL,
* compares istate to HEAD. If tree is NULL and on an unborn branch,
* returns 1 if there are entries in istate, 0 otherwise. If an strbuf is
* provided, the space-separated list of files that differ will be appended
* to it.
*/
int repo_index_has_changes(struct repository *repo,
struct tree *tree,
struct strbuf *sb);
int verify_path(const char *path, unsigned mode);
int strcmp_offset(const char *s1, const char *s2, size_t *first_change);
int index_dir_exists(struct index_state *istate, const char *name, int namelen);
void adjust_dirname_case(struct index_state *istate, char *name);
struct cache_entry *index_file_exists(struct index_state *istate, const char *name, int namelen, int igncase);
/*
* Searches for an entry defined by name and namelen in the given index.
* If the return value is positive (including 0) it is the position of an
* exact match. If the return value is negative, the negated value minus 1
* is the position where the entry would be inserted.
* Example: The current index consists of these files and its stages:
*
* b#0, d#0, f#1, f#3
*
* index_name_pos(&index, "a", 1) -> -1
* index_name_pos(&index, "b", 1) -> 0
* index_name_pos(&index, "c", 1) -> -2
* index_name_pos(&index, "d", 1) -> 1
* index_name_pos(&index, "e", 1) -> -3
* index_name_pos(&index, "f", 1) -> -3
* index_name_pos(&index, "g", 1) -> -5
*/
int index_name_pos(const struct index_state *, const char *name, int namelen);
#define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */
#define ADD_CACHE_OK_TO_REPLACE 2 /* Ok to replace file/directory */
#define ADD_CACHE_SKIP_DFCHECK 4 /* Ok to skip DF conflict checks */
#define ADD_CACHE_JUST_APPEND 8 /* Append only; tree.c::read_tree() */
#define ADD_CACHE_NEW_ONLY 16 /* Do not replace existing ones */
#define ADD_CACHE_KEEP_CACHE_TREE 32 /* Do not invalidate cache-tree */
#define ADD_CACHE_RENORMALIZE 64 /* Pass along HASH_RENORMALIZE */
int add_index_entry(struct index_state *, struct cache_entry *ce, int option);
void rename_index_entry_at(struct index_state *, int pos, const char *new_name);
/* Remove entry, return true if there are more entries to go. */
int remove_index_entry_at(struct index_state *, int pos);
void remove_marked_cache_entries(struct index_state *istate, int invalidate);
int remove_file_from_index(struct index_state *, const char *path);
#define ADD_CACHE_VERBOSE 1
#define ADD_CACHE_PRETEND 2
#define ADD_CACHE_IGNORE_ERRORS 4
#define ADD_CACHE_IGNORE_REMOVAL 8
#define ADD_CACHE_INTENT 16
/*
* These two are used to add the contents of the file at path
* to the index, marking the working tree up-to-date by storing
* the cached stat info in the resulting cache entry. A caller
* that has already run lstat(2) on the path can call
* add_to_index(), and all others can call add_file_to_index();
* the latter will do necessary lstat(2) internally before
* calling the former.
*/
int add_to_index(struct index_state *, const char *path, struct stat *, int flags);
int add_file_to_index(struct index_state *, const char *path, int flags);
int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip);
int ce_same_name(const struct cache_entry *a, const struct cache_entry *b);
void set_object_name_for_intent_to_add_entry(struct cache_entry *ce);
int index_name_is_other(const struct index_state *, const char *, int);
void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *);
/* do stat comparison even if CE_VALID is true */
#define CE_MATCH_IGNORE_VALID 01
/* do not check the contents but report dirty on racily-clean entries */
#define CE_MATCH_RACY_IS_DIRTY 02
/* do stat comparison even if CE_SKIP_WORKTREE is true */
#define CE_MATCH_IGNORE_SKIP_WORKTREE 04
/* ignore non-existent files during stat update */
#define CE_MATCH_IGNORE_MISSING 0x08
/* enable stat refresh */
#define CE_MATCH_REFRESH 0x10
/* don't refresh_fsmonitor state or do stat comparison even if CE_FSMONITOR_VALID is true */
#define CE_MATCH_IGNORE_FSMONITOR 0X20
int is_racy_timestamp(const struct index_state *istate,
const struct cache_entry *ce);
int ie_match_stat(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, unsigned int);
#define HASH_WRITE_OBJECT 1
#define HASH_FORMAT_CHECK 2
#define HASH_RENORMALIZE 4
int index_fd(struct index_state *istate, struct object_id *oid, int fd, struct stat *st, enum object_type type, const char *path, unsigned flags);
int index_path(struct index_state *istate, struct object_id *oid, const char *path, struct stat *st, unsigned flags);
/*
* Record to sd the data from st that we use to check whether a file
* might have changed.
*/
void fill_stat_data(struct stat_data *sd, struct stat *st);
/*
* Return 0 if st is consistent with a file not having been changed
* since sd was filled. If there are differences, return a
* combination of MTIME_CHANGED, CTIME_CHANGED, OWNER_CHANGED,
* INODE_CHANGED, and DATA_CHANGED.
*/
int match_stat_data(const struct stat_data *sd, struct stat *st);
int match_stat_data_racy(const struct index_state *istate,
const struct stat_data *sd, struct stat *st);
void fill_stat_cache_info(struct index_state *istate, struct cache_entry *ce, struct stat *st);
#define REFRESH_REALLY 0x0001 /* ignore_valid */
#define REFRESH_UNMERGED 0x0002 /* allow unmerged */
#define REFRESH_QUIET 0x0004 /* be quiet about it */
#define REFRESH_IGNORE_MISSING 0x0008 /* ignore non-existent */
#define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */
#define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */
#define REFRESH_PROGRESS 0x0040 /* show progress bar if stderr is tty */
int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg);
struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int);
void set_alternate_index_output(const char *);
extern int verify_index_checksum;
extern int verify_ce_order;
/* Environment bits from configuration mechanism */
extern int trust_executable_bit;
extern int trust_ctime;
extern int check_stat;
extern int quote_path_fully;
extern int has_symlinks;
extern int minimum_abbrev, default_abbrev;
extern int ignore_case;
"Assume unchanged" git This adds "assume unchanged" logic, started by this message in the list discussion recently: <Pine.LNX.4.64.0601311807470.7301@g5.osdl.org> This is a workaround for filesystems that do not have lstat() that is quick enough for the index mechanism to take advantage of. On the paths marked as "assumed to be unchanged", the user needs to explicitly use update-index to register the object name to be in the next commit. You can use two new options to update-index to set and reset the CE_VALID bit: git-update-index --assume-unchanged path... git-update-index --no-assume-unchanged path... These forms manipulate only the CE_VALID bit; it does not change the object name recorded in the index file. Nor they add a new entry to the index. When the configuration variable "core.ignorestat = true" is set, the index entries are marked with CE_VALID bit automatically after: - update-index to explicitly register the current object name to the index file. - when update-index --refresh finds the path to be up-to-date. - when tools like read-tree -u and apply --index update the working tree file and register the current object name to the index file. The flag is dropped upon read-tree that does not check out the index entry. This happens regardless of the core.ignorestat settings. Index entries marked with CE_VALID bit are assumed to be unchanged most of the time. However, there are cases that CE_VALID bit is ignored for the sake of safety and usability: - while "git-read-tree -m" or git-apply need to make sure that the paths involved in the merge do not have local modifications. This sacrifices performance for safety. - when git-checkout-index -f -q -u -a tries to see if it needs to checkout the paths. Otherwise you can never check anything out ;-). - when git-update-index --really-refresh (a new flag) tries to see if the index entry is up to date. You can start with everything marked as CE_VALID and run this once to drop CE_VALID bit for paths that are modified. Most notably, "update-index --refresh" honours CE_VALID and does not actively stat, so after you modified a file in the working tree, update-index --refresh would not notice until you tell the index about it with "git-update-index path" or "git-update-index --no-assume-unchanged path". This version is not expected to be perfect. I think diff between index and/or tree and working files may need some adjustment, and there probably needs other cases we should automatically unmark paths that are marked to be CE_VALID. But the basics seem to work, and ready to be tested by people who asked for this feature. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-02-09 06:15:24 +01:00
extern int assume_unchanged;
extern int prefer_symlink_refs;
extern int warn_ambiguous_refs;
cat-file: disable object/refname ambiguity check for batch mode A common use of "cat-file --batch-check" is to feed a list of objects from "rev-list --objects" or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all >objects $ time git cat-file --batch-check='%(objectname)' <objects on the linux.git repository: real 1m13.494s user 0m25.924s sys 0m47.532s If we revert ca91993, the packed-refs up-to-date check, it gets a little better: real 0m54.697s user 0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time: real 0m7.452s user 0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 08:20:05 +02:00
extern int warn_on_object_refname_ambiguity;
extern const char *apply_default_whitespace;
extern const char *apply_default_ignorewhitespace;
extern const char *git_attributes_file;
extern const char *git_hooks_path;
extern int zlib_compression_level;
Custom compression levels for objects and packs Add config variables pack.compression and core.loosecompression , and switch --compression=level to pack-objects. Loose objects will be compressed using core.loosecompression if set, else core.compression if set, else Z_BEST_SPEED. Packed objects will be compressed using --compression=level if seen, else pack.compression if set, else core.compression if set, else Z_DEFAULT_COMPRESSION. This is the "pack compression level". Loose objects added to a pack undeltified will be recompressed to the pack compression level if it is unequal to the current loose compression level by the preceding rules, or if the loose object was written while core.legacyheaders = true. Newly deltified loose objects are always compressed to the current pack compression level. Previously packed objects added to a pack are recompressed to the current pack compression level exactly when their deltification status changes, since the previous pack data cannot be reused. In either case, the --no-reuse-object switch from the first patch below will always force recompression to the current pack compression level, instead of assuming the pack compression level hasn't changed and pack data can be reused when possible. This applies on top of the following patches from Nicolas Pitre: [PATCH] allow for undeltified objects not to be reused [PATCH] make "repack -f" imply "pack-objects --no-reuse-object" Signed-off-by: Dana L. How <danahow@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-05-09 22:56:50 +02:00
extern int core_compression_level;
extern int pack_compression_level;
Fully activate the sliding window pack access. This finally turns on the sliding window behavior for packfile data access by mapping limited size windows and chaining them under the packed_git->windows list. We consider a given byte offset to be within the window only if there would be at least 20 bytes (one hash worth of data) accessible after the requested offset. This range selection relates to the contract that use_pack() makes with its callers, allowing them to access one hash or one object header without needing to call use_pack() for every byte of data obtained. In the worst case scenario we will map the same page of data twice into memory: once at the end of one window and once again at the start of the next window. This duplicate page mapping will happen only when an object header or a delta base reference is spanned over the end of a window and is always limited to just one page of duplication, as no sane operating system will ever have a page size smaller than a hash. I am assuming that the possible wasted page of virtual address space is going to perform faster than the alternatives, which would be to copy the object header or ref delta into a temporary buffer prior to parsing, or to check the window range on every byte during header parsing. We may decide to revisit this decision in the future since this is just a gut instinct decision and has not actually been proven out by experimental testing. Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-12-23 08:34:28 +01:00
extern size_t packed_git_window_size;
extern size_t packed_git_limit;
extern size_t delta_base_cache_limit;
extern unsigned long big_file_threshold;
extern unsigned long pack_size_limit_cfg;
/*
* Accessors for the core.sharedrepository config which lazy-load the value
* from the config (if not already set). The "reset" function can be
* used to unset "set" or cached value, meaning that the value will be loaded
* fresh from the config file on the next call to get_shared_repository().
*/
void set_shared_repository(int value);
int get_shared_repository(void);
void reset_shared_repository(void);
/*
* Do replace refs need to be checked this run? This variable is
* initialized to true unless --no-replace-object is used or
* $GIT_NO_REPLACE_OBJECTS is set, but is set to false by some
* commands that do not want replace references to be active.
*/
extern int read_replace_refs;
extern char *git_replace_ref_base;
extern int fsync_object_files;
extern int core_preload_index;
extern int core_apply_sparse_checkout;
git on Mac OS and precomposed unicode Mac OS X mangles file names containing unicode on file systems HFS+, VFAT or SAMBA. When a file using unicode code points outside ASCII is created on a HFS+ drive, the file name is converted into decomposed unicode and written to disk. No conversion is done if the file name is already decomposed unicode. Calling open("\xc3\x84", ...) with a precomposed "Ä" yields the same result as open("\x41\xcc\x88",...) with a decomposed "Ä". As a consequence, readdir() returns the file names in decomposed unicode, even if the user expects precomposed unicode. Unlike on HFS+, Mac OS X stores files on a VFAT drive (e.g. an USB drive) in precomposed unicode, but readdir() still returns file names in decomposed unicode. When a git repository is stored on a network share using SAMBA, file names are send over the wire and written to disk on the remote system in precomposed unicode, but Mac OS X readdir() returns decomposed unicode to be compatible with its behaviour on HFS+ and VFAT. The unicode decomposition causes many problems: - The names "git add" and other commands get from the end user may often be precomposed form (the decomposed form is not easily input from the keyboard), but when the commands read from the filesystem to see what it is going to update the index with already is on the filesystem, readdir() will give decomposed form, which is different. - Similarly "git log", "git mv" and all other commands that need to compare pathnames found on the command line (often but not always precomposed form; a command line input resulting from globbing may be in decomposed) with pathnames found in the tree objects (should be precomposed form to be compatible with other systems and for consistency in general). - The same for names stored in the index, which should be precomposed, that may need to be compared with the names read from readdir(). NFS mounted from Linux is fully transparent and does not suffer from the above. As Mac OS X treats precomposed and decomposed file names as equal, we can - wrap readdir() on Mac OS X to return the precomposed form, and - normalize decomposed form given from the command line also to the precomposed form, to ensure that all pathnames used in Git are always in the precomposed form. This behaviour can be requested by setting "core.precomposedunicode" configuration variable to true. The code in compat/precomposed_utf8.c implements basically 4 new functions: precomposed_utf8_opendir(), precomposed_utf8_readdir(), precomposed_utf8_closedir() and precompose_argv(). The first three are to wrap opendir(3), readdir(3), and closedir(3) functions. The argv[] conversion allows to use the TAB filename completion done by the shell on command line. It tolerates other tools which use readdir() to feed decomposed file names into git. When creating a new git repository with "git init" or "git clone", "core.precomposedunicode" will be set "false". The user needs to activate this feature manually. She typically sets core.precomposedunicode to "true" on HFS and VFAT, or file systems mounted via SAMBA. Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-08 15:50:25 +02:00
extern int precomposed_unicode;
extern int protect_hfs;
extern int protect_ntfs;
extern const char *core_fsmonitor;
/*
* Include broken refs in all ref iterations, which will
* generally choke dangerous operations rather than letting
* them silently proceed without taking the broken ref into
* account.
*/
extern int ref_paranoia;
git: add --no-optional-locks option Some tools like IDEs or fancy editors may periodically run commands like "git status" in the background to keep track of the state of the repository. Some of these commands may refresh the index and write out the result in an opportunistic way: if they can get the index lock, then they update the on-disk index with any updates they find. And if not, then their in-core refresh is lost and just has to be recomputed by the next caller. But taking the index lock may conflict with other operations in the repository. Especially ones that the user is doing themselves, which _aren't_ opportunistic. In other words, "git status" knows how to back off when somebody else is holding the lock, but other commands don't know that status would be happy to drop the lock if somebody else wanted it. There are a couple possible solutions: 1. Have some kind of "pseudo-lock" that allows other commands to tell status that they want the lock. This is likely to be complicated and error-prone to implement (and maybe even impossible with just dotlocks to work from, as it requires some inter-process communication). 2. Avoid background runs of commands like "git status" that want to do opportunistic updates, preferring instead plumbing like diff-files, etc. This is awkward for a couple of reasons. One is that "status --porcelain" reports a lot more about the repository state than is available from individual plumbing commands. And two is that we actually _do_ want to see the refreshed index. We just don't want to take a lock or write out the result. Whereas commands like diff-files expect us to refresh the index separately and write it to disk so that they can depend on the result. But that write is exactly what we're trying to avoid. 3. Ask "status" not to lock or write the index. This is easy to implement. The big downside is that any work done in refreshing the index for such a call is lost when the process exits. So a background process may end up re-hashing a changed file multiple times until the user runs a command that does an index refresh themselves. This patch implements the option 3. The idea (and the test) is largely stolen from a Git for Windows patch by Johannes Schindelin, 67e5ce7f63 (status: offer *not* to lock the index and update it, 2016-08-12). The twist here is that instead of making this an option to "git status", it becomes a "git" option and matching environment variable. The reason there is two-fold: 1. An environment variable is carried through to sub-processes. And whether an invocation is a background process or not should apply to the whole process tree. So you could do "git --no-optional-locks foo", and if "foo" is a script or alias that calls "status", you'll still get the effect. 2. There may be other programs that want the same treatment. I've punted here on finding more callers to convert, since "status" is the obvious one to call as a repeated background job. But "git diff"'s opportunistic refresh of the index may be a good candidate. The test is taken from 67e5ce7f63, and it's worth repeating Johannes's explanation: Note that the regression test added in this commit does not *really* verify that no index.lock file was written; that test is not possible in a portable way. Instead, we verify that .git/index is rewritten *only* when `git status` is run without `--no-optional-locks`. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-27 08:54:30 +02:00
/*
* Returns the boolean value of $GIT_OPTIONAL_LOCKS (or the default value).
*/
int use_optional_locks(void);
/*
* The character that begins a commented line in user-editable file
* that is subject to stripspace.
*/
extern char comment_line_char;
extern int auto_comment_line_char;
enum log_refs_config {
LOG_REFS_UNSET = -1,
LOG_REFS_NONE = 0,
LOG_REFS_NORMAL,
LOG_REFS_ALWAYS
};
extern enum log_refs_config log_all_ref_updates;
enum rebase_setup_type {
AUTOREBASE_NEVER = 0,
AUTOREBASE_LOCAL,
AUTOREBASE_REMOTE,
AUTOREBASE_ALWAYS
};
enum push_default_type {
PUSH_DEFAULT_NOTHING = 0,
PUSH_DEFAULT_MATCHING,
PUSH_DEFAULT_SIMPLE,
PUSH_DEFAULT_UPSTREAM,
push: Provide situational hints for non-fast-forward errors Pushing a non-fast-forward update to a remote repository will result in an error, but the hint text doesn't provide the correct resolution in every case. Give better resolution advice in three push scenarios: 1) If you push your current branch and it triggers a non-fast-forward error, you should merge remote changes with 'git pull' before pushing again. 2) If you push to a shared repository others push to, and your local tracking branches are not kept up to date, the 'matching refs' default will generate non-fast-forward errors on outdated branches. If this is your workflow, the 'matching refs' default is not for you. Consider setting the 'push.default' configuration variable to 'current' or 'upstream' to ensure only your current branch is pushed. 3) If you explicitly specify a ref that is not your current branch or push matching branches with ':', you will generate a non-fast-forward error if any pushed branch tip is out of date. You should checkout the offending branch and merge remote changes before pushing again. Teach transport.c to recognize these scenarios and configure push.c to hint for them. If 'git push's default behavior changes or we discover more scenarios, extension is easy. Standardize on the advice API and add three new advice variables, 'pushNonFFCurrent', 'pushNonFFDefault', and 'pushNonFFMatching'. Setting any of these to 'false' will disable their affiliated advice. Setting 'pushNonFastForward' to false will disable all three, thus preserving the config option for users who already set it, but guaranteeing new users won't disable push advice accidentally. Based-on-patch-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Christopher Tiwald <christiwald@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-20 05:31:33 +01:00
PUSH_DEFAULT_CURRENT,
PUSH_DEFAULT_UNSPECIFIED
};
extern enum rebase_setup_type autorebase;
extern enum push_default_type push_default;
enum object_creation_mode {
OBJECT_CREATION_USES_HARDLINKS = 0,
OBJECT_CREATION_USES_RENAMES = 1
};
extern enum object_creation_mode object_creation_mode;
extern char *notes_ref_name;
extern int grafts_replace_parents;
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
/*
* GIT_REPO_VERSION is the version we write by default. The
* _READ variant is the highest number we know how to
* handle.
*/
#define GIT_REPO_VERSION 0
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
#define GIT_REPO_VERSION_READ 1
extern int repository_format_precious_objects;
extern char *repository_format_partial_clone;
extern const char *core_partial_clone_filter_default;
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
extern int repository_format_worktree_config;
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
/*
* You _have_ to initialize a `struct repository_format` using
* `= REPOSITORY_FORMAT_INIT` before calling `read_repository_format()`.
*/
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
struct repository_format {
int version;
int precious_objects;
char *partial_clone; /* value of extensions.partialclone */
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
int worktree_config;
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
int is_bare;
int hash_algo;
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
char *work_tree;
struct string_list unknown_extensions;
};
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
/*
* Always use this to initialize a `struct repository_format`
* to a well-defined, default state before calling
* `read_repository()`.
*/
#define REPOSITORY_FORMAT_INIT \
{ \
.version = -1, \
.is_bare = -1, \
.hash_algo = GIT_HASH_SHA1, \
.unknown_extensions = STRING_LIST_INIT_DUP, \
}
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
/*
* Read the repository format characteristics from the config file "path" into
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
* "format" struct. Returns the numeric version. On error, or if no version is
* found in the configuration, -1 is returned, format->version is set to -1,
* and all other fields in the struct are set to the default configuration
* (REPOSITORY_FORMAT_INIT). Always initialize the struct using
* REPOSITORY_FORMAT_INIT before calling this function.
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
*/
int read_repository_format(struct repository_format *format, const char *path);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
/*
* Free the memory held onto by `format`, but not the struct itself.
* (No need to use this after `read_repository_format()` fails.)
*/
void clear_repository_format(struct repository_format *format);
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
/*
* Verify that the repository described by repository_format is something we
* can read. If it is, return 0. Otherwise, return -1, and "err" will describe
* any errors encountered.
*/
int verify_repository_format(const struct repository_format *format,
struct strbuf *err);
/*
* Check the repository format version in the path found in get_git_dir(),
* and die if it is a version we don't understand. Generally one would
* set_git_dir() before calling this, and use it only for "are we in a valid
* repo?".
*/
void check_repository_format(void);
#define MTIME_CHANGED 0x0001
#define CTIME_CHANGED 0x0002
#define OWNER_CHANGED 0x0004
#define MODE_CHANGED 0x0008
#define INODE_CHANGED 0x0010
#define DATA_CHANGED 0x0020
#define TYPE_CHANGED 0x0040
/*
* Return an abbreviated sha1 unique within this repository's object database.
* The result will be at least `len` characters long, and will be NUL
* terminated.
*
* The non-`_r` version returns a static buffer which remains valid until 4
* more calls to find_unique_abbrev are made.
*
* The `_r` variant writes to a buffer supplied by the caller, which must be at
* least `GIT_MAX_HEXSZ + 1` bytes. The return value is the number of bytes
* written (excluding the NUL terminator).
*
* Note that while this version avoids the static buffer, it is not fully
* reentrant, as it calls into other non-reentrant git code.
*/
const char *repo_find_unique_abbrev(struct repository *r, const struct object_id *oid, int len);
#define find_unique_abbrev(oid, len) repo_find_unique_abbrev(the_repository, oid, len)
int repo_find_unique_abbrev_r(struct repository *r, char *hex, const struct object_id *oid, int len);
#define find_unique_abbrev_r(hex, oid, len) repo_find_unique_abbrev_r(the_repository, hex, oid, len)
extern const unsigned char null_sha1[GIT_MAX_RAWSZ];
extern const struct object_id null_oid;
static inline int hashcmp(const unsigned char *sha1, const unsigned char *sha2)
{
hashcmp: assert constant hash size Prior to 509f6f62a4 (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp(). In 509f6f62a4, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp(). Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization. Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0. Test v2.18.0 v2.19.0-rc0 HEAD ------------------------------------------------------------------------------ 0001.2: 34.24(33.81+0.43) 34.83(34.42+0.40) +1.7% 33.90(33.47+0.42) -1.0% You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19. I tried changing hashcpy(), which was also touched by 509f6f62a4, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object"). [jn: verified using "make object.s" that the memcmp call goes away.] Reported-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff King <peff@peff.net Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 07:02:25 +02:00
/*
* Teach the compiler that there are only two possibilities of hash size
* here, so that it can optimize for this case as much as possible.
hashcmp: assert constant hash size Prior to 509f6f62a4 (cache: update object ID functions for the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a constant size of 20 bytes. Some compilers were able to turn that into a few quad-word comparisons, which is faster than actually calling memcmp(). In 509f6f62a4, we started using the_hash_algo->rawsz instead. Even though this will always be 20, the compiler doesn't know that while inlining hashcmp() and ends up just generating a call to memcmp(). Eventually we'll have to deal with multiple hash sizes, but for the upcoming v2.19, we can restore some of the original performance by asserting on the size. That gives the compiler enough information to know that the memcmp will always be called with a length of 20, and it performs the same optimization. Here are numbers for p0001.2 run against linux.git on a few versions. This is using -O2 with gcc 8.2.0. Test v2.18.0 v2.19.0-rc0 HEAD ------------------------------------------------------------------------------ 0001.2: 34.24(33.81+0.43) 34.83(34.42+0.40) +1.7% 33.90(33.47+0.42) -1.0% You can see that v2.19 is a little slower than v2.18. This commit ended up slightly faster than v2.18, but there's a fair bit of run-to-run noise (the generated code in the two cases is basically the same). This patch does seem to be consistently 1-2% faster than v2.19. I tried changing hashcpy(), which was also touched by 509f6f62a4, in the same way, but couldn't measure any speedup. Which makes sense, at least for this workload. A traversal of the whole commit graph requires looking up every entry of every tree via lookup_object(). That's many multiples of the numbers of objects in the repository (most of the lookups just return "yes, we already saw that object"). [jn: verified using "make object.s" that the memcmp call goes away.] Reported-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Jeff King <peff@peff.net Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 07:02:25 +02:00
*/
if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
return memcmp(sha1, sha2, GIT_MAX_RAWSZ);
return memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
}
static inline int oidcmp(const struct object_id *oid1, const struct object_id *oid2)
{
return hashcmp(oid1->hash, oid2->hash);
}
introduce hasheq() and oideq() The main comparison functions we provide for comparing object ids are hashcmp() and oidcmp(). These are more flexible than a strict equality check, since they also express ordering. That makes them useful for sorting and binary searching. However, it also makes them potentially slower than a strict equality check. Consider this C code, which is traditionally what our hashcmp has looked like: #include <string.h> int hashcmp(const unsigned char *a, const unsigned char *b) { return memcmp(a, b, 20); } Compiling with "gcc -O2 -S -fverbose-asm", the generated assembly shows that we actually call memcmp(). But if we change this to a strict equality check: return !memcmp(a, b, 20); we get a faster inline version: movq (%rdi), %rax # MEM[(void *)a_4(D)], MEM[(void *)a_4(D)] movq 8(%rdi), %rdx # MEM[(void *)a_4(D)], tmp101 xorq (%rsi), %rax # MEM[(void *)b_5(D)], tmp94 xorq 8(%rsi), %rdx # MEM[(void *)b_5(D)], tmp93 orq %rax, %rdx # tmp94, tmp93 jne .L2 #, movl 16(%rsi), %eax # MEM[(void *)b_5(D)], tmp104 cmpl %eax, 16(%rdi) # tmp104, MEM[(void *)a_4(D)] je .L5 #, Obviously our hashcmp() doesn't include the "!". But because it's an inline function, optimizing compilers are able to see "!hashcmp(a,b)" in calling code and take advantage of this case. So there has been no value thus far in introducing a more restricted interface for doing strict equality checks. But as Git learns about more values for the_hash_algo, our hashcmp() will grow more complicated and may even delegate at runtime to functions optimized specifically for that hash size. That breaks the inline connection we have, and the compiler will have to assume that the caller really cares about the sign and magnitude of the memcmp() result, even though the vast majority don't. We can solve that by introducing a hasheq() function (and matching oideq() wrapper), which callers can use to make it clear that they only care about equality. For now, the implementation will literally be "!hashcmp()", but it frees us up later to introduce code optimized specifically for the equality check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-28 23:22:35 +02:00
static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2)
{
/*
* We write this here instead of deferring to hashcmp so that the
* compiler can properly inline it and avoid calling memcmp.
*/
if (the_hash_algo->rawsz == GIT_MAX_RAWSZ)
return !memcmp(sha1, sha2, GIT_MAX_RAWSZ);
return !memcmp(sha1, sha2, GIT_SHA1_RAWSZ);
introduce hasheq() and oideq() The main comparison functions we provide for comparing object ids are hashcmp() and oidcmp(). These are more flexible than a strict equality check, since they also express ordering. That makes them useful for sorting and binary searching. However, it also makes them potentially slower than a strict equality check. Consider this C code, which is traditionally what our hashcmp has looked like: #include <string.h> int hashcmp(const unsigned char *a, const unsigned char *b) { return memcmp(a, b, 20); } Compiling with "gcc -O2 -S -fverbose-asm", the generated assembly shows that we actually call memcmp(). But if we change this to a strict equality check: return !memcmp(a, b, 20); we get a faster inline version: movq (%rdi), %rax # MEM[(void *)a_4(D)], MEM[(void *)a_4(D)] movq 8(%rdi), %rdx # MEM[(void *)a_4(D)], tmp101 xorq (%rsi), %rax # MEM[(void *)b_5(D)], tmp94 xorq 8(%rsi), %rdx # MEM[(void *)b_5(D)], tmp93 orq %rax, %rdx # tmp94, tmp93 jne .L2 #, movl 16(%rsi), %eax # MEM[(void *)b_5(D)], tmp104 cmpl %eax, 16(%rdi) # tmp104, MEM[(void *)a_4(D)] je .L5 #, Obviously our hashcmp() doesn't include the "!". But because it's an inline function, optimizing compilers are able to see "!hashcmp(a,b)" in calling code and take advantage of this case. So there has been no value thus far in introducing a more restricted interface for doing strict equality checks. But as Git learns about more values for the_hash_algo, our hashcmp() will grow more complicated and may even delegate at runtime to functions optimized specifically for that hash size. That breaks the inline connection we have, and the compiler will have to assume that the caller really cares about the sign and magnitude of the memcmp() result, even though the vast majority don't. We can solve that by introducing a hasheq() function (and matching oideq() wrapper), which callers can use to make it clear that they only care about equality. For now, the implementation will literally be "!hashcmp()", but it frees us up later to introduce code optimized specifically for the equality check. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-28 23:22:35 +02:00
}
static inline int oideq(const struct object_id *oid1, const struct object_id *oid2)
{
return hasheq(oid1->hash, oid2->hash);
}
static inline int is_null_sha1(const unsigned char *sha1)
{
return hasheq(sha1, null_sha1);
}
static inline int is_null_oid(const struct object_id *oid)
{
return hasheq(oid->hash, null_sha1);
}
static inline void hashcpy(unsigned char *sha_dst, const unsigned char *sha_src)
{
memcpy(sha_dst, sha_src, the_hash_algo->rawsz);
}
static inline void oidcpy(struct object_id *dst, const struct object_id *src)
{
memcpy(dst->hash, src->hash, GIT_MAX_RAWSZ);
}
static inline struct object_id *oiddup(const struct object_id *src)
{
struct object_id *dst = xmalloc(sizeof(struct object_id));
oidcpy(dst, src);
return dst;
}
static inline void hashclr(unsigned char *hash)
{
memset(hash, 0, the_hash_algo->rawsz);
}
static inline void oidclr(struct object_id *oid)
{
memset(oid->hash, 0, GIT_MAX_RAWSZ);
}
static inline void oidread(struct object_id *oid, const unsigned char *hash)
{
memcpy(oid->hash, hash, the_hash_algo->rawsz);
}
static inline int is_empty_blob_sha1(const unsigned char *sha1)
{
return hasheq(sha1, the_hash_algo->empty_blob->hash);
}
static inline int is_empty_blob_oid(const struct object_id *oid)
{
return oideq(oid, the_hash_algo->empty_blob);
}
static inline int is_empty_tree_sha1(const unsigned char *sha1)
{
return hasheq(sha1, the_hash_algo->empty_tree->hash);
}
static inline int is_empty_tree_oid(const struct object_id *oid)
{
return oideq(oid, the_hash_algo->empty_tree);
}
const char *empty_tree_oid_hex(void);
const char *empty_blob_oid_hex(void);
/* set default permissions by passing mode arguments to open(2) */
int git_mkstemps_mode(char *pattern, int suffix_len, int mode);
int git_mkstemp_mode(char *pattern, int mode);
/*
* NOTE NOTE NOTE!!
*
* PERM_UMASK, OLD_PERM_GROUP and OLD_PERM_EVERYBODY enumerations must
* not be changed. Old repositories have core.sharedrepository written in
* numeric format, and therefore these values are preserved for compatibility
* reasons.
*/
enum sharedrepo {
PERM_UMASK = 0,
OLD_PERM_GROUP = 1,
OLD_PERM_EVERYBODY = 2,
PERM_GROUP = 0660,
PERM_EVERYBODY = 0664
};
int git_config_perm(const char *var, const char *value);
int adjust_shared_perm(const char *path);
/*
* Create the directory containing the named path, using care to be
* somewhat safe against races. Return one of the scld_error values to
* indicate success/failure. On error, set errno to describe the
* problem.
*
* SCLD_VANISHED indicates that one of the ancestor directories of the
* path existed at one point during the function call and then
* suddenly vanished, probably because another process pruned the
* directory while we were working. To be robust against this kind of
* race, callers might want to try invoking the function again when it
* returns SCLD_VANISHED.
*
* safe_create_leading_directories() temporarily changes path while it
* is working but restores it before returning.
* safe_create_leading_directories_const() doesn't modify path, even
* temporarily.
*/
enum scld_error {
SCLD_OK = 0,
SCLD_FAILED = -1,
SCLD_PERMS = -2,
SCLD_EXISTS = -3,
SCLD_VANISHED = -4
};
enum scld_error safe_create_leading_directories(char *path);
enum scld_error safe_create_leading_directories_const(const char *path);
/*
* Callback function for raceproof_create_file(). This function is
* expected to do something that makes dirname(path) permanent despite
* the fact that other processes might be cleaning up empty
* directories at the same time. Usually it will create a file named
* path, but alternatively it could create another file in that
* directory, or even chdir() into that directory. The function should
* return 0 if the action was completed successfully. On error, it
* should return a nonzero result and set errno.
* raceproof_create_file() treats two errno values specially:
*
* - ENOENT -- dirname(path) does not exist. In this case,
* raceproof_create_file() tries creating dirname(path)
* (and any parent directories, if necessary) and calls
* the function again.
*
* - EISDIR -- the file already exists and is a directory. In this
* case, raceproof_create_file() removes the directory if
* it is empty (and recursively any empty directories that
* it contains) and calls the function again.
*
* Any other errno causes raceproof_create_file() to fail with the
* callback's return value and errno.
*
* Obviously, this function should be OK with being called again if it
* fails with ENOENT or EISDIR. In other scenarios it will not be
* called again.
*/
typedef int create_file_fn(const char *path, void *cb);
/*
* Create a file in dirname(path) by calling fn, creating leading
* directories if necessary. Retry a few times in case we are racing
* with another process that is trying to clean up the directory that
* contains path. See the documentation for create_file_fn for more
* details.
*
* Return the value and set the errno that resulted from the most
* recent call of fn. fn is always called at least once, and will be
* called more than once if it returns ENOENT or EISDIR.
*/
int raceproof_create_file(const char *path, create_file_fn fn, void *cb);
rerere: make sure it works even in a workdir attached to a young repository The git-new-workdir script in contrib/ makes a new work tree by sharing many subdirectories of the .git directory with the original repository. When rerere.enabled is set in the original repository, but the user has not encountered any conflicts yet, the original repository may not yet have .git/rr-cache directory. When rerere wants to run in a new work tree created from such a young original repository, it fails to mkdir(2) .git/rr-cache that is a symlink to a yet-to-be-created directory. There are three possible approaches to this: - A naive solution is not to create a symlink in the git-new-workdir script to a directory the original does not have (yet). This is not a solution, as we tend to lazily create subdirectories of .git/, and having rerere.enabled configuration set is a strong indication that the user _wants_ to have this lazy creation to happen; - We could always create .git/rr-cache upon repository creation. This is tempting but will not help people with existing repositories. - Detect this case by seeing that mkdir(2) failed with EEXIST, checking that the path is a symlink, and try running mkdir(2) on the link target. This patch solves the issue by doing the third one. Strictly speaking, this is incomplete. It does not attempt to handle relative symbolic link that points into the original repository, but this is good enough to help people who use contrib/workdir/git-new-workdir script. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-11 01:02:50 +01:00
int mkdir_in_gitdir(const char *path);
char *expand_user_path(const char *path, int real_home);
const char *enter_repo(const char *path, int strict);
static inline int is_absolute_path(const char *path)
{
return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
}
int is_directory(const char *);
char *strbuf_realpath(struct strbuf *resolved, const char *path,
int die_on_error);
const char *real_path(const char *path);
const char *real_path_if_valid(const char *path);
char *real_pathdup(const char *path, int die_on_error);
const char *absolute_path(const char *path);
char *absolute_pathdup(const char *path);
const char *remove_leading_path(const char *in, const char *prefix);
path.c: refactor relative_path(), not only strip prefix Original design of relative_path() is simple, just strip the prefix (*base) from the absolute path (*abs). In most cases, we need a real relative path, such as: ../foo, ../../bar. That's why there is another reimplementation (path_relative()) in quote.c. Borrow some codes from path_relative() in quote.c to refactor relative_path() in path.c, so that it could return real relative path, and user can reuse this function without reimplementing his/her own. The function path_relative() in quote.c will be substituted, and I would use the new relative_path() function when implementing the interactive git-clean later. Different results for relative_path() before and after this refactor: abs path base path relative (original) relative (refactor) ======== ========= =================== =================== /a/b /a/b . ./ /a/b/ /a/b . ./ /a /a/b/ /a ../ / /a/b/ / ../../ /a/c /a/b/ /a/c ../c /x/y /a/b/ /x/y ../../x/y a/b/ a/b/ . ./ a/b/ a/b . ./ a a/b a ../ x/y a/b/ x/y ../../x/y a/c a/b a/c ../c (empty) (null) (empty) ./ (empty) (empty) (empty) ./ (empty) /a/b (empty) ./ (null) (null) (null) ./ (null) (empty) (null) ./ (null) /a/b (segfault) ./ You may notice that return value "." has been changed to "./". It is because: * Function quote_path_relative() in quote.c will show the relative path as "./" if abs(in) and base(prefix) are the same. * Function relative_path() is called only once (in setup.c), and it will be OK for the return value as "./" instead of ".". Signed-off-by: Jiang Xin <worldhello.net@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-25 17:53:43 +02:00
const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
int normalize_path_copy_len(char *dst, const char *src, int *prefix_len);
int normalize_path_copy(char *dst, const char *src);
int longest_ancestor_length(const char *path, struct string_list *prefixes);
char *strip_path_suffix(const char *path, const char *suffix);
int daemon_avoid_alias(const char *path);
is_ntfs_dotgit: match other .git files When we started to catch NTFS short names that clash with .git, we only looked for GIT~1. This is sufficient because we only ever clone into an empty directory, so .git is guaranteed to be the first subdirectory or file in that directory. However, even with a fresh clone, .gitmodules is *not* necessarily the first file to be written that would want the NTFS short name GITMOD~1: a malicious repository can add .gitmodul0000 and friends, which sorts before `.gitmodules` and is therefore checked out *first*. For that reason, we have to test not only for ~1 short names, but for others, too. It's hard to just adapt the existing checks in is_ntfs_dotgit(): since Windows 2000 (i.e., in all Windows versions still supported by Git), NTFS short names are only generated in the <prefix>~<number> form up to number 4. After that, a *different* prefix is used, calculated from the long file name using an undocumented, but stable algorithm. For example, the short name of .gitmodules would be GITMOD~1, but if it is taken, and all of ~2, ~3 and ~4 are taken, too, the short name GI7EBA~1 will be used. From there, collisions are handled by incrementing the number, shortening the prefix as needed (until ~9999999 is reached, in which case NTFS will not allow the file to be created). We'd also want to handle .gitignore and .gitattributes, which suffer from a similar problem, using the fall-back short names GI250A~1 and GI7D29~1, respectively. To accommodate for that, we could reimplement the hashing algorithm, but it is just safer and simpler to provide the known prefixes. This algorithm has been reverse-engineered and described at https://usn.pw/blog/gen/2015/06/09/filenames/, which is defunct but still available via https://web.archive.org/. These can be recomputed by running the following Perl script: -- snip -- use warnings; use strict; sub compute_short_name_hash ($) { my $checksum = 0; foreach (split('', $_[0])) { $checksum = ($checksum * 0x25 + ord($_)) & 0xffff; } $checksum = ($checksum * 314159269) & 0xffffffff; $checksum = 1 + (~$checksum & 0x7fffffff) if ($checksum & 0x80000000); $checksum -= (($checksum * 1152921497) >> 60) * 1000000007; return scalar reverse sprintf("%x", $checksum & 0xffff); } print compute_short_name_hash($ARGV[0]); -- snap -- E.g., running that with the argument ".gitignore" will result in "250a" (which then becomes "gi250a" in the code). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff King <peff@peff.net>
2018-05-11 16:03:54 +02:00
/*
* These functions match their is_hfs_dotgit() counterparts; see utf8.h for
* details.
*/
int is_ntfs_dotgit(const char *name);
int is_ntfs_dotgitmodules(const char *name);
int is_ntfs_dotgitignore(const char *name);
int is_ntfs_dotgitattributes(const char *name);
/*
* Returns true iff "str" could be confused as a command-line option when
* passed to a sub-program like "ssh". Note that this has nothing to do with
* shell-quoting, which should be handled separately; we're assuming here that
* the string makes it verbatim to the sub-program.
*/
int looks_like_command_line_option(const char *str);
2015-04-21 06:06:27 +02:00
/**
* Return a newly allocated string with the evaluation of
* "$XDG_CONFIG_HOME/git/$filename" if $XDG_CONFIG_HOME is non-empty, otherwise
* "$HOME/.config/git/$filename". Return NULL upon error.
*/
char *xdg_config_home(const char *filename);
2015-04-21 06:06:27 +02:00
/**
* Return a newly allocated string with the evaluation of
* "$XDG_CACHE_HOME/git/$filename" if $XDG_CACHE_HOME is non-empty, otherwise
* "$HOME/.cache/git/$filename". Return NULL upon error.
*/
char *xdg_cache_home(const char *filename);
int git_open_cloexec(const char *name, int flags);
#define git_open(name) git_open_cloexec(name, O_RDONLY)
int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
int parse_loose_header(const char *hdr, unsigned long *sizep);
int check_object_signature(const struct object_id *oid, void *buf, unsigned long size, const char *type);
int finalize_object_file(const char *tmpfile, const char *filename);
/* Helper to check and "touch" a file */
int check_and_freshen_file(const char *fn, int freshen);
extern const signed char hexval_table[256];
static inline unsigned int hexval(unsigned char c)
{
return hexval_table[c];
}
/*
* Convert two consecutive hexadecimal digits into a char. Return a
* negative value on error. Don't run over the end of short strings.
*/
static inline int hex2chr(const char *s)
{
unsigned int val = hexval(s[0]);
return (val & ~0xf) ? val : (val << 4) | hexval(s[1]);
}
/* Convert to/from hex/sha1 representation */
#define MINIMUM_ABBREV minimum_abbrev
#define DEFAULT_ABBREV default_abbrev
/* used when the code does not know or care what the default abbrev is */
#define FALLBACK_DEFAULT_ABBREV 7
struct object_context {
unsigned short mode;
/*
* symlink_path is only used by get_tree_entry_follow_symlinks,
* and only for symlinks that point outside the repository.
*/
struct strbuf symlink_path;
/*
* If GET_OID_RECORD_PATH is set, this will record path (if any)
* found when resolving the name. The caller is responsible for
* releasing the memory.
*/
char *path;
};
#define GET_OID_QUIETLY 01
#define GET_OID_COMMIT 02
#define GET_OID_COMMITTISH 04
#define GET_OID_TREE 010
#define GET_OID_TREEISH 020
#define GET_OID_BLOB 040
#define GET_OID_FOLLOW_SYMLINKS 0100
#define GET_OID_RECORD_PATH 0200
#define GET_OID_ONLY_TO_DIE 04000
#define GET_OID_DISAMBIGUATORS \
(GET_OID_COMMIT | GET_OID_COMMITTISH | \
GET_OID_TREE | GET_OID_TREEISH | \
GET_OID_BLOB)
enum get_oid_result {
FOUND = 0,
MISSING_OBJECT = -1, /* The requested object is missing */
SHORT_NAME_AMBIGUOUS = -2,
/* The following only apply when symlinks are followed */
DANGLING_SYMLINK = -4, /*
* The initial symlink is there, but
* (transitively) points to a missing
* in-tree file
*/
SYMLINK_LOOP = -5,
NOT_DIR = -6, /*
* Somewhere along the symlink chain, a path is
* requested which contains a file as a
* non-final element.
*/
};
int repo_get_oid(struct repository *r, const char *str, struct object_id *oid);
int get_oidf(struct object_id *oid, const char *fmt, ...);
int repo_get_oid_commit(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_committish(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_tree(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_treeish(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_blob(struct repository *r, const char *str, struct object_id *oid);
int repo_get_oid_mb(struct repository *r, const char *str, struct object_id *oid);
void maybe_die_on_misspelt_object_name(struct repository *repo,
const char *name,
const char *prefix);
enum get_oid_result get_oid_with_context(struct repository *repo, const char *str,
unsigned flags, struct object_id *oid,
struct object_context *oc);
#define get_oid(str, oid) repo_get_oid(the_repository, str, oid)
#define get_oid_commit(str, oid) repo_get_oid_commit(the_repository, str, oid)
#define get_oid_committish(str, oid) repo_get_oid_committish(the_repository, str, oid)
#define get_oid_tree(str, oid) repo_get_oid_tree(the_repository, str, oid)
#define get_oid_treeish(str, oid) repo_get_oid_treeish(the_repository, str, oid)
#define get_oid_blob(str, oid) repo_get_oid_blob(the_repository, str, oid)
#define get_oid_mb(str, oid) repo_get_oid_mb(the_repository, str, oid)
typedef int each_abbrev_fn(const struct object_id *oid, void *);
int repo_for_each_abbrev(struct repository *r, const char *prefix, each_abbrev_fn, void *);
#define for_each_abbrev(prefix, fn, data) repo_for_each_abbrev(the_repository, prefix, fn, data)
int set_disambiguate_hint_config(const char *var, const char *value);
/*
* Try to read a SHA1 in hexadecimal format from the 40 characters
* starting at hex. Write the 20-byte result to sha1 in binary form.
* Return 0 on success. Reading stops if a NUL is encountered in the
* input, so it is safe to pass this function an arbitrary
* null-terminated string.
*/
int get_sha1_hex(const char *hex, unsigned char *sha1);
int get_oid_hex(const char *hex, struct object_id *sha1);
/*
* Read `len` pairs of hexadecimal digits from `hex` and write the
* values to `binary` as `len` bytes. Return 0 on success, or -1 if
* the input does not consist of hex digits).
*/
int hex_to_bytes(unsigned char *binary, const char *hex, size_t len);
/*
* Convert a binary hash to its hex equivalent. The `_r` variant is reentrant,
* and writes the NUL-terminated output to the buffer `out`, which must be at
* least `GIT_MAX_HEXSZ + 1` bytes, and returns a pointer to out for
* convenience.
*
* The non-`_r` variant returns a static buffer, but uses a ring of 4
* buffers, making it safe to make multiple calls for a single statement, like:
*
* printf("%s -> %s", sha1_to_hex(one), sha1_to_hex(two));
*/
char *hash_to_hex_algop_r(char *buffer, const unsigned char *hash, const struct git_hash_algo *);
char *sha1_to_hex_r(char *out, const unsigned char *sha1);
char *oid_to_hex_r(char *out, const struct object_id *oid);
char *hash_to_hex_algop(const unsigned char *hash, const struct git_hash_algo *); /* static buffer result! */
char *sha1_to_hex(const unsigned char *sha1); /* same static buffer */
char *hash_to_hex(const unsigned char *hash); /* same static buffer */
char *oid_to_hex(const struct object_id *oid); /* same static buffer */
/*
* Parse a 40-character hexadecimal object ID starting from hex, updating the
* pointer specified by end when parsing stops. The resulting object ID is
* stored in oid. Returns 0 on success. Parsing will stop on the first NUL or
* other invalid character. end is only updated on success; otherwise, it is
* unmodified.
*/
int parse_oid_hex(const char *hex, struct object_id *oid, const char **end);
/*
* This reads short-hand syntax that not only evaluates to a commit
* object name, but also can act as if the end user spelled the name
* of the branch from the command line.
*
* - "@{-N}" finds the name of the Nth previous branch we were on, and
* places the name of the branch in the given buf and returns the
* number of characters parsed if successful.
*
* - "<branch>@{upstream}" finds the name of the other ref that
* <branch> is configured to merge with (missing <branch> defaults
* to the current branch), and places the name of the branch in the
* given buf and returns the number of characters parsed if
* successful.
*
* If the input is not of the accepted format, it returns a negative
* number to signal an error.
*
* If the input was ok but there are not N branch switches in the
* reflog, it returns 0.
interpret_branch_name: allow callers to restrict expansions The interpret_branch_name() function converts names like @{-1} and @{upstream} into branch names. The expanded ref names are not fully qualified, and may be outside of the refs/heads/ namespace (e.g., "@" expands to "HEAD", and "@{upstream}" is likely to be in "refs/remotes/"). This is OK for callers like dwim_ref() which are primarily interested in resolving the resulting name, no matter where it is. But callers like "git branch" treat the result as a branch name in refs/heads/. When we expand to a ref outside that namespace, the results are very confusing (e.g., "git branch @" tries to create refs/heads/HEAD, which is nonsense). Callers can't know from the returned string how the expansion happened (e.g., did the user really ask for a branch named "HEAD", or did we do a bogus expansion?). One fix would be to return some out-parameters describing the types of expansion that occurred. This has the benefit that the caller can generate precise error messages ("I understood @{upstream} to mean origin/master, but that is a remote tracking branch, so you cannot create it as a local name"). However, out-parameters make the function interface somewhat cumbersome. Instead, let's do the opposite: let the caller tell us which elements to expand. That's easier to pass in, and none of the callers give more precise error messages than "@{upstream} isn't a valid branch name" anyway (which should be sufficient). The strbuf_branchname() function needs a similar parameter, as most of the callers access interpret_branch_name() through it. We can break the callers down into two groups: 1. Callers that are happy with any kind of ref in the result. We pass "0" here, so they continue to work without restrictions. This includes merge_name(), the reflog handling in add_pending_object_with_path(), and substitute_branch_name(). This last is what powers dwim_ref(). 2. Callers that have funny corner cases (mostly in git-branch and git-checkout). These need to make use of the new parameter, but I've left them as "0" in this patch, and will address them individually in follow-on patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 09:23:01 +01:00
*
* If "allowed" is non-zero, it is a treated as a bitfield of allowable
* expansions: local branches ("refs/heads/"), remote branches
* ("refs/remotes/"), or "HEAD". If no "allowed" bits are set, any expansion is
* allowed, even ones to refs outside of those namespaces.
*/
interpret_branch_name: allow callers to restrict expansions The interpret_branch_name() function converts names like @{-1} and @{upstream} into branch names. The expanded ref names are not fully qualified, and may be outside of the refs/heads/ namespace (e.g., "@" expands to "HEAD", and "@{upstream}" is likely to be in "refs/remotes/"). This is OK for callers like dwim_ref() which are primarily interested in resolving the resulting name, no matter where it is. But callers like "git branch" treat the result as a branch name in refs/heads/. When we expand to a ref outside that namespace, the results are very confusing (e.g., "git branch @" tries to create refs/heads/HEAD, which is nonsense). Callers can't know from the returned string how the expansion happened (e.g., did the user really ask for a branch named "HEAD", or did we do a bogus expansion?). One fix would be to return some out-parameters describing the types of expansion that occurred. This has the benefit that the caller can generate precise error messages ("I understood @{upstream} to mean origin/master, but that is a remote tracking branch, so you cannot create it as a local name"). However, out-parameters make the function interface somewhat cumbersome. Instead, let's do the opposite: let the caller tell us which elements to expand. That's easier to pass in, and none of the callers give more precise error messages than "@{upstream} isn't a valid branch name" anyway (which should be sufficient). The strbuf_branchname() function needs a similar parameter, as most of the callers access interpret_branch_name() through it. We can break the callers down into two groups: 1. Callers that are happy with any kind of ref in the result. We pass "0" here, so they continue to work without restrictions. This includes merge_name(), the reflog handling in add_pending_object_with_path(), and substitute_branch_name(). This last is what powers dwim_ref(). 2. Callers that have funny corner cases (mostly in git-branch and git-checkout). These need to make use of the new parameter, but I've left them as "0" in this patch, and will address them individually in follow-on patches. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-02 09:23:01 +01:00
#define INTERPRET_BRANCH_LOCAL (1<<0)
#define INTERPRET_BRANCH_REMOTE (1<<1)
#define INTERPRET_BRANCH_HEAD (1<<2)
int repo_interpret_branch_name(struct repository *r,
const char *str, int len,
struct strbuf *buf,
unsigned allowed);
#define interpret_branch_name(str, len, buf, allowed) \
repo_interpret_branch_name(the_repository, str, len, buf, allowed)
int validate_headref(const char *ref);
int base_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
int df_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
int name_compare(const char *name1, size_t len1, const char *name2, size_t len2);
int cache_name_stage_compare(const char *name1, int len1, int stage1, const char *name2, int len2, int stage2);
void *read_object_with_reference(const struct object_id *oid,
const char *required_type,
unsigned long *size,
struct object_id *oid_ret);
struct object *repo_peel_to_type(struct repository *r,
const char *name, int namelen,
struct object *o, enum object_type);
#define peel_to_type(name, namelen, obj, type) \
repo_peel_to_type(the_repository, name, namelen, obj, type)
enum date_mode_type {
DATE_NORMAL = 0,
DATE_HUMAN,
DATE_RELATIVE,
DATE_SHORT,
DATE_ISO8601,
DATE_ISO8601_STRICT,
DATE_RFC2822,
DATE_STRFTIME,
DATE_RAW,
DATE_UNIX
};
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
struct date_mode {
enum date_mode_type type;
const char *strftime_fmt;
int local;
};
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
/*
* Convenience helper for passing a constant type, like:
*
* show_date(t, tz, DATE_MODE(NORMAL));
*/
#define DATE_MODE(t) date_mode_from_type(DATE_##t)
struct date_mode *date_mode_from_type(enum date_mode_type type);
const char *show_date(timestamp_t time, int timezone, const struct date_mode *mode);
void show_date_relative(timestamp_t time, const struct timeval *now,
struct strbuf *timebuf);
void show_date_human(timestamp_t time, int tz, const struct timeval *now,
struct strbuf *timebuf);
int parse_date(const char *date, struct strbuf *out);
int parse_date_basic(const char *date, timestamp_t *timestamp, int *offset);
int parse_expiry_date(const char *date, timestamp_t *timestamp);
void datestamp(struct strbuf *out);
#define approxidate(s) approxidate_careful((s), NULL)
timestamp_t approxidate_careful(const char *, int *);
timestamp_t approxidate_relative(const char *date, const struct timeval *now);
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
void parse_date_format(const char *format, struct date_mode *mode);
int date_overflows(timestamp_t date);
#define IDENT_STRICT 1
#define IDENT_NO_DATE 2
#define IDENT_NO_NAME 4
enum want_ident {
WANT_BLANK_IDENT,
WANT_AUTHOR_IDENT,
WANT_COMMITTER_IDENT
};
const char *git_author_info(int);
const char *git_committer_info(int);
const char *fmt_ident(const char *name, const char *email,
enum want_ident whose_ident,
const char *date_str, int);
const char *fmt_name(enum want_ident);
const char *ident_default_name(void);
const char *ident_default_email(void);
const char *git_editor(void);
const char *git_sequence_editor(void);
const char *git_pager(int stdout_is_tty);
int is_terminal_dumb(void);
int git_ident_config(const char *, const char *, void *);
/*
* Prepare an ident to fall back on if the user didn't configure it.
*/
void prepare_fallback_ident(const char *name, const char *email);
void reset_ident_date(void);
struct ident_split {
const char *name_begin;
const char *name_end;
const char *mail_begin;
const char *mail_end;
const char *date_begin;
const char *date_end;
const char *tz_begin;
const char *tz_end;
};
/*
* Signals an success with 0, but time part of the result may be NULL
* if the input lacks timestamp and zone
*/
int split_ident_line(struct ident_split *, const char *, int);
/*
* Like show_date, but pull the timestamp and tz parameters from
* the ident_split. It will also sanity-check the values and produce
* a well-known sentinel date if they appear bogus.
*/
convert "enum date_mode" into a struct In preparation for adding date modes that may carry extra information beyond the mode itself, this patch converts the date_mode enum into a struct. Most of the conversion is fairly straightforward; we pass the struct as a pointer and dereference the type field where necessary. Locations that declare a date_mode can use a "{}" constructor. However, the tricky case is where we use the enum labels as constants, like: show_date(t, tz, DATE_NORMAL); Ideally we could say: show_date(t, tz, &{ DATE_NORMAL }); but of course C does not allow that. Likewise, we cannot cast the constant to a struct, because we need to pass an actual address. Our options are basically: 1. Manually add a "struct date_mode d = { DATE_NORMAL }" definition to each caller, and pass "&d". This makes the callers uglier, because they sometimes do not even have their own scope (e.g., they are inside a switch statement). 2. Provide a pre-made global "date_normal" struct that can be passed by address. We'd also need "date_rfc2822", "date_iso8601", and so forth. But at least the ugliness is defined in one place. 3. Provide a wrapper that generates the correct struct on the fly. The big downside is that we end up pointing to a single global, which makes our wrapper non-reentrant. But show_date is already not reentrant, so it does not matter. This patch implements 3, along with a minor macro to keep the size of the callers sane. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-25 18:55:02 +02:00
const char *show_ident_date(const struct ident_split *id,
const struct date_mode *mode);
/*
* Compare split idents for equality or strict ordering. Note that we
* compare only the ident part of the line, ignoring any timestamp.
*
* Because there are two fields, we must choose one as the primary key; we
* currently arbitrarily pick the email.
*/
int ident_cmp(const struct ident_split *, const struct ident_split *);
struct checkout {
struct index_state *istate;
const char *base_dir;
int base_dir_len;
struct delayed_checkout *delayed_checkout;
unsigned force:1,
quiet:1,
not_new:1,
clone:1,
refresh_cache:1;
};
#define CHECKOUT_INIT { NULL, "" }
#define TEMPORARY_FILENAME_LENGTH 25
int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *topath, int *nr_checkouts);
void enable_delayed_checkout(struct checkout *state);
int finish_delayed_checkout(struct checkout *state, int *nr_checkouts);
/*
* Unlink the last component and schedule the leading directories for
* removal, such that empty directories get removed.
*/
void unlink_entry(const struct cache_entry *ce);
struct cache_def {
struct strbuf path;
int flags;
int track_flags;
int prefix_len_stat_func;
};
#define CACHE_DEF_INIT { STRBUF_INIT, 0, 0, 0 }
static inline void cache_def_clear(struct cache_def *cache)
{
strbuf_release(&cache->path);
}
int has_symlink_leading_path(const char *name, int len);
int threaded_has_symlink_leading_path(struct cache_def *, const char *, int);
int check_leading_path(const char *name, int len);
int has_dirs_only_path(const char *name, int len, int prefix_len);
void schedule_dir_for_removal(const char *name, int len);
void remove_scheduled_dirs(void);
struct pack_window {
struct pack_window *next;
unsigned char *base;
off_t offset;
size_t len;
unsigned int last_used;
unsigned int inuse_cnt;
};
struct pack_entry {
off_t offset;
struct packed_git *p;
};
/*
* Create a temporary file rooted in the object database directory, or
* die on failure. The filename is taken from "pattern", which should have the
* usual "XXXXXX" trailer, and the resulting filename is written into the
* "template" buffer. Returns the open descriptor.
*/
int odb_mkstemp(struct strbuf *temp_filename, const char *pattern);
/*
* Create a pack .keep file named "name" (which should generally be the output
* of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
* error.
*/
int odb_pack_keep(const char *name);
sha1_file: support lazily fetching missing objects Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 16:27:14 +01:00
/*
* Set this to 0 to prevent oid_object_info_extended() from fetching missing
sha1_file: support lazily fetching missing objects Teach sha1_file to fetch objects from the remote configured in extensions.partialclone whenever an object is requested but missing. The fetching of objects can be suppressed through a global variable. This is used by fsck and index-pack. However, by default, such fetching is not suppressed. This is meant as a temporary measure to ensure that all Git commands work in such a situation. Future patches will update some commands to either tolerate missing objects (without fetching them) or be more efficient in fetching them. In order to determine the code changes in sha1_file.c necessary, I investigated the following: (1) functions in sha1_file.c that take in a hash, without the user regarding how the object is stored (loose or packed) (2) functions in packfile.c (because I need to check callers that know about the loose/packed distinction and operate on both differently, and ensure that they can handle the concept of objects that are neither loose nor packed) (1) is handled by the modification to sha1_object_info_extended(). For (2), I looked at for_each_packed_object and others. For for_each_packed_object, the callers either already work or are fixed in this patch: - reachable - only to find recent objects - builtin/fsck - already knows about missing objects - builtin/cat-file - warning message added in this commit Callers of the other functions do not need to be changed: - parse_pack_index - http - indirectly from http_get_info_packs - find_pack_entry_one - this searches a single pack that is provided as an argument; the caller already knows (through other means) that the sought object is in a specific pack - find_sha1_pack - fast-import - appears to be an optimization to not store a file if it is already in a pack - http-walker - to search through a struct alt_base - http-push - to search through remote packs - has_sha1_pack - builtin/fsck - already knows about promisor objects - builtin/count-objects - informational purposes only (check if loose object is also packed) - builtin/prune-packed - check if object to be pruned is packed (if not, don't prune it) - revision - used to exclude packed objects if requested by user - diff - just for optimization Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 16:27:14 +01:00
* blobs. This has a difference only if extensions.partialClone is set.
*
* Its default value is 1.
*/
extern int fetch_if_missing;
/* Dumb servers support */
int update_server_info(int);
const char *get_log_output_encoding(void);
const char *get_commit_output_encoding(void);
/*
* This is a hack for test programs like test-dump-untracked-cache to
* ensure that they do not modify the untracked cache when reading it.
* Do not use it otherwise!
*/
extern int ignore_untracked_cache_config;
add `config_set` API for caching config-like files Currently `git_config()` uses a callback mechanism and file rereads for config values. Due to this approach, it is not uncommon for the config files to be parsed several times during the run of a git program, with different callbacks picking out different variables useful to themselves. Add a `config_set`, that can be used to construct an in-memory cache for config-like files that the caller specifies (i.e., files like `.gitmodules`, `~/.gitconfig` etc.). Add two external functions `git_configset_get_value` and `git_configset_get_value_multi` for querying from the config sets. `git_configset_get_value` follows `last one wins` semantic (i.e. if there are multiple matches for the queried key in the files of the configset the value returned will be the last entry in `value_list`). `git_configset_get_value_multi` returns a list of values sorted in order of increasing priority (i.e. last match will be at the end of the list). Add type specific query functions like `git_configset_get_bool` and similar. Add a default `config_set`, `the_config_set` to cache all key-value pairs read from usual config files (repo specific .git/config, user wide ~/.gitconfig, XDG config and the global /etc/gitconfig). `the_config_set` is populated using `git_config()`. Add two external functions `git_config_get_value` and `git_config_get_value_multi` for querying in a non-callback manner from `the_config_set`. Also, add type specific query functions that are implemented as a thin wrapper around the `config_set` API. Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr> Signed-off-by: Tanay Abhra <tanayabh@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-28 12:10:38 +02:00
int committer_ident_sufficiently_given(void);
int author_ident_sufficiently_given(void);
extern const char *git_commit_encoding;
extern const char *git_log_output_encoding;
extern const char *git_mailmap_file;
extern const char *git_mailmap_blob;
/* IO helper functions */
void maybe_flush_or_die(FILE *, const char *);
__attribute__((format (printf, 2, 3)))
void fprintf_or_die(FILE *, const char *fmt, ...);
#define COPY_READ_ERROR (-2)
#define COPY_WRITE_ERROR (-3)
int copy_fd(int ifd, int ofd);
int copy_file(const char *dst, const char *src, int mode);
int copy_file_with_time(const char *dst, const char *src, int mode);
void write_or_die(int fd, const void *buf, size_t count);
void fsync_or_die(int fd, const char *);
ssize_t read_in_full(int fd, void *buf, size_t count);
ssize_t write_in_full(int fd, const void *buf, size_t count);
ssize_t pread_in_full(int fd, void *buf, size_t count, off_t offset);
static inline ssize_t write_str_in_full(int fd, const char *str)
{
return write_in_full(fd, str, strlen(str));
}
/**
* Open (and truncate) the file at path, write the contents of buf to it,
* and close it. Dies if any errors are encountered.
*/
void write_file_buf(const char *path, const char *buf, size_t len);
/**
* Like write_file_buf(), but format the contents into a buffer first.
* Additionally, write_file() will append a newline if one is not already
* present, making it convenient to write text files:
*
* write_file(path, "counter: %d", ctr);
*/
__attribute__((format (printf, 2, 3)))
void write_file(const char *path, const char *fmt, ...);
/* pager.c */
void setup_pager(void);
int pager_in_use(void);
extern int pager_use_color;
int term_columns(void);
int decimal_width(uintmax_t);
int check_pager_config(const char *cmd);
void prepare_pager_args(struct child_process *, const char *pager);
extern const char *editor_program;
extern const char *askpass_program;
extern const char *excludes_file;
binary patch. This adds "binary patch" to the diff output and teaches apply what to do with them. On the diff generation side, traditionally, we said "Binary files differ\n" without giving anything other than the preimage and postimage object name on the index line. This was good enough for applying a patch generated from your own repository (very useful while rebasing), because the postimage would be available in such a case. However, this was not useful when the recipient of such a patch via e-mail were to apply it, even if the preimage was available. This patch allows the diff to generate "binary" patch when operating under --full-index option. The binary patch follows the usual extended git diff headers, and looks like this: "GIT binary patch\n" <length byte><data>"\n" ... "\n" Each line is prefixed with a "length-byte", whose value is upper or lowercase alphabet that encodes number of bytes that the data on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ..., 'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of 5-byte sequence, each of which encodes up to 4 bytes in base85 encoding. Because 52 / 4 * 5 = 65 and we have the length byte, an output line is capped to 66 characters. The payload is the same diff-delta as we use in the packfiles. On the consumption side, git-apply now can decode and apply the binary patch when --allow-binary-replacement is given, the diff was generated with --full-index, and the receiving repository has the preimage blob, which is the same condition as it always required when accepting an "Binary files differ\n" patch. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 01:51:44 +02:00
/* base85 */
int decode_85(char *dst, const char *line, int linelen);
void encode_85(char *buf, const unsigned char *data, int bytes);
binary patch. This adds "binary patch" to the diff output and teaches apply what to do with them. On the diff generation side, traditionally, we said "Binary files differ\n" without giving anything other than the preimage and postimage object name on the index line. This was good enough for applying a patch generated from your own repository (very useful while rebasing), because the postimage would be available in such a case. However, this was not useful when the recipient of such a patch via e-mail were to apply it, even if the preimage was available. This patch allows the diff to generate "binary" patch when operating under --full-index option. The binary patch follows the usual extended git diff headers, and looks like this: "GIT binary patch\n" <length byte><data>"\n" ... "\n" Each line is prefixed with a "length-byte", whose value is upper or lowercase alphabet that encodes number of bytes that the data on the line decodes to (1..52 -- 'A' means 1, 'B' means 2, ..., 'Z' means 26, 'a' means 27, ...). <data> is 1 or more groups of 5-byte sequence, each of which encodes up to 4 bytes in base85 encoding. Because 52 / 4 * 5 = 65 and we have the length byte, an output line is capped to 66 characters. The payload is the same diff-delta as we use in the packfiles. On the consumption side, git-apply now can decode and apply the binary patch when --allow-binary-replacement is given, the diff was generated with --full-index, and the receiving repository has the preimage blob, which is the same condition as it always required when accepting an "Binary files differ\n" patch. Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 01:51:44 +02:00
/* pkt-line.c */
void packet_trace_identity(const char *prog);
/* add */
/*
* return 0 if success, 1 - if addition of a file failed and
* ADD_FILES_IGNORE_ERRORS was specified in flags
*/
int add_files_to_cache(const char *prefix, const struct pathspec *pathspec, int flags);
/* diff.c */
extern int diff_auto_refresh_index;
/* match-trees.c */
void shift_tree(const struct object_id *, const struct object_id *, struct object_id *, int);
void shift_tree_by(const struct object_id *, const struct object_id *, struct object_id *, const char *);
/*
* whitespace rules.
* used by both diff and apply
* last two digits are tab width
*/
#define WS_BLANK_AT_EOL 0100
#define WS_SPACE_BEFORE_TAB 0200
#define WS_INDENT_WITH_NON_TAB 0400
#define WS_CR_AT_EOL 01000
#define WS_BLANK_AT_EOF 02000
#define WS_TAB_IN_INDENT 04000
#define WS_TRAILING_SPACE (WS_BLANK_AT_EOL|WS_BLANK_AT_EOF)
#define WS_DEFAULT_RULE (WS_TRAILING_SPACE|WS_SPACE_BEFORE_TAB|8)
#define WS_TAB_WIDTH_MASK 077
/* All WS_* -- when extended, adapt diff.c emit_symbol */
#define WS_RULE_MASK 07777
extern unsigned whitespace_rule_cfg;
unsigned whitespace_rule(struct index_state *, const char *);
unsigned parse_whitespace_rule(const char *);
unsigned ws_check(const char *line, int len, unsigned ws_rule);
void ws_check_emit(const char *line, int len, unsigned ws_rule, FILE *stream, const char *set, const char *reset, const char *ws);
char *whitespace_error_string(unsigned ws);
void ws_fix_copy(struct strbuf *, const char *, int, unsigned, int *);
int ws_blank_line(const char *line, int len, unsigned ws_rule);
#define ws_tab_width(rule) ((rule) & WS_TAB_WIDTH_MASK)
/* ls-files */
void overlay_tree_on_index(struct index_state *istate,
const char *tree_name, const char *prefix);
setup: make startup_info available everywhere Commit a60645f (setup: remember whether repository was found, 2010-08-05) introduced the startup_info structure, which records some parts of the setup_git_directory() process (notably, whether we actually found a repository or not). One of the uses of this data is for functions to behave appropriately based on whether we are in a repo. But the startup_info struct is just a pointer to storage provided by the main program, and the only program that sets it up is the git.c wrapper. Thus builtins have access to startup_info, but externally linked programs do not. Worse, library code which is accessible from both has to be careful about accessing startup_info. This can be used to trigger a die("BUG") via get_sha1(): $ git fast-import <<-\EOF tag foo from HEAD:./whatever EOF fatal: BUG: startup_info struct is not initialized. Obviously that's fairly nonsensical input to feed to fast-import, but we should never hit a die("BUG"). And there may be other ways to trigger it if other non-builtins resolve sha1s. So let's point the storage for startup_info to a static variable in setup.c, making it available to all users of the library code. We _could_ turn startup_info into a regular extern struct, but doing so would mean tweaking all of the existing use sites. So let's leave the pointer indirection in place. We can, however, drop any checks for NULL, as they will always be false (and likewise, we can drop the test covering this case, which was a rather artificial situation using one of the test-* programs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-05 23:10:27 +01:00
/* setup.c */
struct startup_info {
int have_repository;
const char *prefix;
};
extern struct startup_info *startup_info;
/* merge.c */
struct commit_list;
int try_merge_command(struct repository *r,
const char *strategy, size_t xopts_nr,
const char **xopts, struct commit_list *common,
const char *head_arg, struct commit_list *remotes);
int checkout_fast_forward(struct repository *r,
const struct object_id *from,
const struct object_id *to,
int overwrite_ignore);
int sane_execvp(const char *file, char *const argv[]);
/*
* A struct to encapsulate the concept of whether a file has changed
* since we last checked it. This uses criteria similar to those used
* for the index.
*/
struct stat_validity {
struct stat_data *sd;
};
void stat_validity_clear(struct stat_validity *sv);
/*
* Returns 1 if the path is a regular file (or a symlink to a regular
* file) and matches the saved stat_validity, 0 otherwise. A missing
* or inaccessible file is considered a match if the struct was just
* initialized, or if the previous update found an inaccessible file.
*/
int stat_validity_check(struct stat_validity *sv, const char *path);
/*
* Update the stat_validity from a file opened at descriptor fd. If
* the file is missing, inaccessible, or not a regular file, then
* future calls to stat_validity_check will match iff one of those
* conditions continues to be true.
*/
void stat_validity_update(struct stat_validity *sv, int fd);
int versioncmp(const char *s1, const char *s2);
void sleep_millisec(int millisec);
/*
* Create a directory and (if share is nonzero) adjust its permissions
* according to the shared_repository setting. Only use this for
* directories under $GIT_DIR. Don't use it for working tree
* directories.
*/
void safe_create_dir(const char *dir, int share);
/*
* Should we print an ellipsis after an abbreviated SHA-1 value
* when doing diff-raw output or indicating a detached HEAD?
*/
int print_sha1_ellipsis(void);
/* Return 1 if the file is empty or does not exists, 0 otherwise. */
int is_empty_or_missing_file(const char *filename);
#endif /* CACHE_H */