git-commit-vandalism

Author	SHA1	Message	Date
Jeff King	d4316604f8	pack-objects: default to writing bitmap hash-cache Enabling pack.writebitmaphashcache should always be a performance win. It costs only 4 bytes per object on disk, and the timings in `ae4f07fbcc` (pack-bitmap: implement optional name_hash cache, 2013-12-21) show it improving fetch and partial-bitmap clone times by 40-50%. The only reason we didn't enable it by default at the time is that early versions of JGit's bitmap reader complained about the presence of optional header bits it didn't understand. But that was changed in JGit's d2fa3987a (Use bitcheck to check for presence of OPT_FULL option, 2013-10-30), which made it into JGit v3.5.0 in late 2014. So let's turn this option on by default. It's backwards-compatible with all versions of Git, and if you are also using JGit on the same repository, you'd only run into problems using a version that's almost 5 years old. We'll drop the manual setting from all of our test scripts, including perf tests. This isn't strictly necessary, but it has two advantages: 1. If the hash-cache ever stops being enabled by default, our perf regression tests will notice. 2. We can use the modified perf tests to show off the behavior of an otherwise unconfigured repo, as shown below. These are the results of a few of a perf tests against linux.git that showed interesting results. You can see the expected speedup in 5310.4, which was noted in `ae4f07fbcc`. Curiously, 5310.8 did not improve (and actually got slower), despite seeing the opposite in `ae4f07fbcc`. I don't have an explanation for that. The tests from p5311 did not exist back then, but do show improvements (a smaller pack due to better deltas, which we found in less time). Test HEAD^ HEAD ------------------------------------------------------------------------------------- 5310.4: simulated fetch 7.39(22.70+0.25) 5.64(11.43+0.22) -23.7% 5310.8: clone (partial bitmap) 18.45(24.83+1.19) 19.94(28.40+1.36) +8.1% 5311.31: server (128 days) 0.41(1.13+0.05) 0.34(0.72+0.02) -17.1% 5311.32: size (128 days) 7.4M 7.0M -4.8% 5311.33: client (128 days) 1.33(1.49+0.06) 1.29(1.37+0.12) -3.0% Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-03-18 14:11:15 +09:00
Derrick Stolee	ae417807b0	trace2:data: pack-objects: add trace2 regions When studying the performance of 'git push' we would like to know how much time is spent at various parts of the command. One area that could cause performance trouble is 'git pack-objects'. Add trace2 regions around the three main actions taken in this command: 1. Enumerate objects. 2. Prepare pack. 3. Write pack-file. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-22 15:28:21 -08:00
Junio C Hamano	3a14fdec88	Merge branch 'sl/const' Code cleanup. * sl/const: various: tighten constness of some local variables	2019-02-06 22:05:28 -08:00
Junio C Hamano	cba595ab1a	Merge branch 'jk/loose-object-cache-oid' Code clean-up. * jk/loose-object-cache-oid: prefer "hash mismatch" to "sha1 mismatch" sha1-file: avoid "sha1 file" for generic use in messages sha1-file: prefer "loose object file" to "sha1 file" in messages sha1-file: drop has_sha1_file() convert has_sha1_file() callers to has_object_file() sha1-file: convert pass-through functions to object_id sha1-file: modernize loose header/stream functions sha1-file: modernize loose object file functions http: use struct object_id instead of bare sha1 update comment references to sha1_object_info() sha1-file: fix outdated sha1 comment references	2019-02-06 22:05:27 -08:00
Junio C Hamano	5fda343321	Merge branch 'ds/push-sparse-tree-walk' "git pack-objects" learned another algorithm to compute the set of objects to send, that trades the resulting packfile off to save traversal cost to favor small pushes. * ds/push-sparse-tree-walk: pack-objects: create GIT_TEST_PACK_SPARSE pack-objects: create pack.useSparse setting revision: implement sparse algorithm list-objects: consume sparse tree walk revision: add mark_tree_uninteresting_sparse	2019-02-06 22:05:25 -08:00
Junio C Hamano	7589e63648	Merge branch 'nd/the-index-final' The assumption to work on the single "in-core index" instance has been reduced from the library-ish part of the codebase. * nd/the-index-final: cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch read-cache.c: remove the_* from index_has_changes() merge-recursive.c: remove implicit dependency on the_repository merge-recursive.c: remove implicit dependency on the_index sha1-name.c: remove implicit dependency on the_index read-cache.c: replace update_index_if_able with repo_& read-cache.c: kill read_index() checkout: avoid the_index when possible repository.c: replace hold_locked_index() with repo_hold_locked_index() notes-utils.c: remove the_repository references grep: use grep_opt->repo instead of explict repo argument	2019-02-06 22:05:23 -08:00
Junio C Hamano	d243a323a5	Merge branch 'ph/pack-objects-mutex-fix' "git pack-objects" incorrectly used uninitialized mutex, which has been corrected. * ph/pack-objects-mutex-fix: pack-objects: merge read_lock and lock in packing_data struct pack-objects: move read mutex to packing_data struct	2019-02-05 14:26:16 -08:00
Shahzad Lone	33de80b1d5	various: tighten constness of some local variables Signed-off-by: Shahzad Lone <shahzadlone@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-02-04 09:57:10 -08:00
Junio C Hamano	371820d5f1	Merge branch 'bc/tree-walk-oid' The code to walk tree objects has been taught that we may be working with object names that are not computed with SHA-1. * bc/tree-walk-oid: cache: make oidcpy always copy GIT_MAX_RAWSZ bytes tree-walk: store object_id in a separate member match-trees: use hashcpy to splice trees match-trees: compute buffer offset correctly when splicing tree-walk: copy object ID before use	2019-01-29 12:47:56 -08:00
Patrick Hogg	edb673cf10	pack-objects: merge read_lock and lock in packing_data struct Rename the packing_data lock to obd_lock and upgrade it to a recursive mutex to make it suitable for current read_lock usages. Additionally remove the superfluous #ifndef NO_PTHREADS guard around mutex initialization in prepare_packing_data as the mutex functions themselves are already protected. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-28 11:22:12 -08:00
Patrick Hogg	459307b139	pack-objects: move read mutex to packing_data struct `ac77d0c37` ("pack-objects: shrink size field in struct object_entry", 2018-04-14) added an extra usage of read_lock/read_unlock in the newly introduced oe_get_size_slow for thread safety in parallel calls to try_delta(). Unfortunately oe_get_size_slow is also used in serial code, some of which is called before the first invocation of ll_find_deltas. As such the read mutex is not guaranteed to be initialized. Resolve this by moving the read mutex to packing_data and initializing it in prepare_packing_data which is initialized in cmd_pack_objects. Signed-off-by: Patrick Hogg <phogg@novamoon.net> Reviewed-by: Duy Nguyen <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-28 11:22:06 -08:00
Nguyễn Thái Ngọc Duy	f8adbec9fe	cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch By default, index compat macros are off from now on, because they could hide the_index dependency. Only those in builtin can use it. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-24 11:55:06 -08:00
Derrick Stolee	99dbbfa8dd	pack-objects: create GIT_TEST_PACK_SPARSE Create a test variable GIT_TEST_PACK_SPARSE to enable the sparse object walk algorithm by default during the test suite. Enabling this variable ensures coverage in many interesting cases, such as shallow clones, partial clones, and missing objects. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-17 13:44:44 -08:00
Derrick Stolee	3d036eb0d2	pack-objects: create pack.useSparse setting The '--sparse' flag in 'git pack-objects' changes the algorithm used to enumerate objects to one that is faster for individual users pushing new objects that change only a small cone of the working directory. The sparse algorithm is not recommended for a server, which likely sends new objects that appear across the entire working directory. Create a 'pack.useSparse' setting that enables this new algorithm. This allows 'git push' to use this algorithm without passing a '--sparse' flag all the way through four levels of run_command() calls. If the '--no-sparse' flag is set, then this config setting is overridden. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-17 13:44:43 -08:00
Derrick Stolee	4f6d26b167	list-objects: consume sparse tree walk When creating a pack-file using 'git pack-objects --revs' we provide a list of interesting and uninteresting commits. For example, a push operation would make the local topic branch be interesting and the known remote refs as uninteresting. We want to discover the set of new objects to send to the server as a thin pack. We walk these commits until we discover a frontier of commits such that every commit walk starting at interesting commits ends in a root commit or unintersting commit. We then need to discover which non-commit objects are reachable from uninteresting commits. This commit walk is not changing during this series. The mark_edges_uninteresting() method in list-objects.c iterates on the commit list and does the following: * If the commit is UNINTERSTING, then mark its root tree and every object it can reach as UNINTERESTING. * If the commit is interesting, then mark the root tree of every UNINTERSTING parent (and all objects that tree can reach) as UNINTERSTING. At the very end, we repeat the process on every commit directly given to the revision walk from stdin. This helps ensure we properly cover shallow commits that otherwise were not included in the frontier. The logic to recursively follow trees is in the mark_tree_uninteresting() method in revision.c. The algorithm avoids duplicate work by not recursing into trees that are already marked UNINTERSTING. Add a new 'sparse' option to the mark_edges_uninteresting() method that performs this logic in a slightly different way. As we iterate over the commits, we add all of the root trees to an oidset. Then, call mark_trees_uninteresting_sparse() on that oidset. Note that we include interesting trees in this process. The current implementation of mark_trees_unintersting_sparse() will walk the same trees as the old logic, but this will be replaced in a later change. Add a '--sparse' flag in 'git pack-objects' to call this new logic. Add a new test script t/t5322-pack-objects-sparse.sh that tests this option. The tests currently demonstrate that the resulting object list is the same as the old algorithm. This includes a case where both algorithms pack an object that is not needed by a remote due to limits on the explored set of trees. When the sparse algorithm is changed in a later commit, we will add a test that demonstrates a change of behavior in some cases. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-17 13:44:39 -08:00
brian m. carlson	ea82b2a085	tree-walk: store object_id in a separate member When parsing a tree, we read the object ID directly out of the tree buffer. This is normally fine, but such an object ID cannot be used with oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1, there may not be that many bytes to copy. Instead, store the object ID in a separate struct member. Since we can no longer efficiently compute the path length, store that information as well in struct name_entry. Ensure we only copy the object ID into the new buffer if the path length is nonzero, as some callers will pass us an empty path with no object ID following it, and we will not want to read past the end of the buffer. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-15 09:57:41 -08:00
Junio C Hamano	6e5be1f2d5	Merge branch 'md/exclude-promisor-objects-fix-cleanup' Code clean-up. * md/exclude-promisor-objects-fix-cleanup: revision.c: put promisor option in specialized struct	2019-01-14 15:29:31 -08:00
Jeff King	c93206b412	update comment references to sha1_object_info() Commit `abef9020e3` (sha1_file: convert sha1_object_info* to object_id, 2018-03-12) renamed the function to oid_object_info(), but missed some comments which mention it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-01-08 09:40:19 -08:00
Junio C Hamano	cde555480b	Merge branch 'nd/the-index' More codepaths become aware of working with in-core repository instance other than the default "the_repository". * nd/the-index: (22 commits) rebase-interactive.c: remove the_repository references rerere.c: remove the_repository references pack-*.c: remove the_repository references pack-check.c: remove the_repository references notes-cache.c: remove the_repository references line-log.c: remove the_repository reference diff-lib.c: remove the_repository references delta-islands.c: remove the_repository references cache-tree.c: remove the_repository references bundle.c: remove the_repository references branch.c: remove the_repository reference bisect.c: remove the_repository reference blame.c: remove implicit dependency the_repository sequencer.c: remove implicit dependency on the_repository sequencer.c: remove implicit dependency on the_index transport.c: remove implicit dependency on the_index notes-merge.c: remove implicit dependency the_repository notes-merge.c: remove implicit dependency on the_index list-objects.c: reduce the_repository references list-objects-filter.c: remove implicit dependency on the_index ...	2019-01-04 13:33:33 -08:00
Matthew DeVore	bbcde41a70	revision.c: put promisor option in specialized struct Put the allow_exclude_promisor_objects flag in setup_revision_opt. When it was in rev_info, it was unclear when it was used, since rev_info is passed to functions that don't use the flag. This resulted in unnecessary setting of the flag in prune.c, so fix that as well. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-12-06 14:52:56 +09:00
Junio C Hamano	7fab474656	Merge branch 'cc/delta-islands' A few issues in the implementation of "delta-islands" feature has been corrected. * cc/delta-islands: pack-objects: fix off-by-one in delta-island tree-depth computation pack-objects: zero-initialize tree_depth/layer arrays pack-objects: fix tree_depth and layer invariants	2018-11-21 20:39:02 +09:00
Jeff King	3949053617	pack-objects: fix off-by-one in delta-island tree-depth computation When delta-islands are in use, we need to record the deepest path at which we find each tree and blob. Our loop to do so counts slashes, so "foo" is depth 0, "foo/bar" is depth 1, and so on. However, this neglects root trees, which are represented by the empty string. Those also have depth 0, but are at a layer above "foo". Thus, "foo" should be 1, "foo/bar" at 2, and so on. We use this depth to topo-sort the trees in resolve_tree_islands(). As a result, we may fail to visit a root tree before the sub-trees it contains, and therefore not correctly pass down the island marks. That in turn could lead to missing some delta opportunities (objects are in the same island, but we didn't realize it) or creating unwanted cross-island deltas (one object is in an island another isn't, but we don't realize). In practice, it seems to have only a small effect. Some experiments on the real-world git/git fork network at GitHub showed an improvement of only 0.14% in the resulting clone size. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-21 13:50:28 +09:00
Junio C Hamano	f5f0f68d61	Merge branch 'tb/print-size-t-with-uintmax-format' Code preparation to replace ulong vars with size_t vars where appropriate. * tb/print-size-t-with-uintmax-format: Upcast size_t variables to uintmax_t when printing	2018-11-19 16:24:41 +09:00
Junio C Hamano	137399934d	Merge branch 'ds/push-squelch-ambig-warning' "git push" used to check ambiguities between object-names and refnames while processing the list of refs' old and new values, which was unnecessary (as it knew that it is feeding raw object names). This has been optimized out. * ds/push-squelch-ambig-warning: pack-objects: ignore ambiguous object warnings	2018-11-19 16:24:40 +09:00
Junio C Hamano	ab96f28ba4	Merge branch 'jk/unused-parameter-fixes' Various functions have been audited for "-Wunused-parameter" warnings and bugs in them got fixed. * jk/unused-parameter-fixes: midx: double-check large object write loop assert NOARG/NONEG behavior of parse-options callbacks parse-options: drop OPT_DATE() apply: return -1 from option callback instead of calling exit(1) cat-file: report an error on multiple --batch options tag: mark "--message" option with NONEG show-branch: mark --reflog option as NONEG format-patch: mark "--no-numbered" option with NONEG status: mark --find-renames option with NONEG cat-file: mark batch options with NONEG pack-objects: mark index-version option as NONEG ls-files: mark exclude options as NONEG am: handle --no-patch-format option apply: mark include/exclude options as NONEG	2018-11-18 18:23:53 +09:00
Junio C Hamano	26b80a841a	Merge branch 'nd/pthreads' The codebase has been cleaned up to reduce "#ifndef NO_PTHREADS". * nd/pthreads: Clean up pthread_create() error handling read-cache.c: initialize copy_len to shut up gcc 8 read-cache.c: reduce branching based on HAVE_THREADS read-cache.c: remove #ifdef NO_PTHREADS pack-objects: remove #ifdef NO_PTHREADS preload-index.c: remove #ifdef NO_PTHREADS grep: clean up num_threads handling grep: remove #ifdef NO_PTHREADS attr.c: remove #ifdef NO_PTHREADS name-hash.c: remove #ifdef NO_PTHREADS index-pack: remove #ifdef NO_PTHREADS send-pack.c: move async's #ifdef NO_PTHREADS back to run-command.c run-command.h: include thread-utils.h instead of pthread.h thread-utils: macros to unconditionally compile pthreads API	2018-11-18 18:23:52 +09:00
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 16:43:52 +09:00
Nguyễn Thái Ngọc Duy	7c14112741	pack-*.c: remove the_repository references Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:50:06 +09:00
Nguyễn Thái Ngọc Duy	385cb64ff3	delta-islands.c: remove the_repository references Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 14:50:06 +09:00
Derrick Stolee	a4544b311e	pack-objects: ignore ambiguous object warnings A git push process runs several processes during its run, but one includes git send-pack which calls git pack-objects and passes the known have/wants into stdin using object ids. However, the default setting for core.warnAmbiguousRefs requires git pack-objects to check for ref names matching the ref_rev_parse_rules array in refs.c. This means that every object is triggering at least six "file exists?" queries. When there are a lot of refs, this can add up significantly! I observed a simple push spending three seconds checking these paths. The fix here is similar to `4c30d50` "rev-list: disable object/refname ambiguity check with --stdin". While the get_object_list() method reads the objects from stdin, turn warn_on_object_refname_ambiguity flag (which is usually true) to false. Just for code hygiene, save away the original at the beginning and restore it once we are done. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-07 10:49:58 +09:00
Junio C Hamano	a29b8bcf62	Merge branch 'md/exclude-promisor-objects-fix' Operations on promisor objects make sense in the context of only a small subset of the commands that internally use the revisions machinery, but the "--exclude-promisor-objects" option were taken and led to nonsense results by commands like "log", to which it didn't make much sense. This has been corrected. * md/exclude-promisor-objects-fix: exclude-promisor-objects: declare when option is allowed Documentation/git-log.txt: do not show --exclude-promisor-objects	2018-11-06 15:50:21 +09:00
Jeff King	517fe807d6	assert NOARG/NONEG behavior of parse-options callbacks When we define a parse-options callback, the flags we put in the option struct must match what the callback expects. For example, a callback which does not handle the "unset" parameter should only be used with PARSE_OPT_NONEG. But since the callback and the option struct are not defined next to each other, it's easy to get this wrong (as earlier patches in this series show). Fortunately, the compiler can help us here: compiling with -Wunused-parameters can show us which callbacks ignore their "unset" parameters (and likewise, ones that ignore "arg" expect to be triggered with PARSE_OPT_NOARG). But after we've inspected a callback and determined that all of its callers use the right flags, what do we do next? We'd like to silence the compiler warning, but do so in a way that will catch any wrong calls in the future. We can do that by actually checking those variables and asserting that they match our expectations. Because this is such a common pattern, we'll introduce some helper macros. The resulting messages aren't as descriptive as we could make them, but the file/line information from BUG() is enough to identify the problem (and anyway, the point is that these should never be seen). Each of the annotated callbacks in this patch triggers -Wunused-parameters, and was manually inspected to make sure all callers use the correct options (so none of these BUGs should be triggerable). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-06 12:56:29 +09:00
Jeff King	f53c163bcd	pack-objects: mark index-version option as NONEG Running "git pack-objects --no-index-version" will segfault, since the callback is not prepared to handle the "unset" flag. In theory this might be used to counteract an earlier "--index-version", or override a pack.indexversion config setting. But the semantics aren't immediately obvious, and it's unlikely anybody wants this. Let's just disable the broken option for now. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-06 12:55:35 +09:00
Nguyễn Thái Ngọc Duy	9c897c5c2a	pack-objects: remove #ifdef NO_PTHREADS Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-05 13:42:11 +09:00
Junio C Hamano	620b00e167	Merge branch 'js/pack-objects-mutex-init-fix' A mutex used in "git pack-objects" were not correctly initialized and this caused "git repack" to dump core on Windows. * js/pack-objects-mutex-init-fix: pack-objects (mingw): initialize `packing_data` mutex in the correct spot pack-objects (mingw): demonstrate a segmentation fault with large deltas pack-objects: fix typo 'detla' -> 'delta'	2018-10-30 15:43:43 +09:00
Matthew DeVore	669b1d2aae	exclude-promisor-objects: declare when option is allowed The --exclude-promisor-objects option causes some funny behavior in at least two commands: log and blame. It causes a BUG crash: $ git log --exclude-promisor-objects BUG: revision.c:2143: exclude_promisor_objects can only be used when fetch_if_missing is 0 Aborted [134] Fix this such that the option is treated like any other unknown option. The commands that must support it are limited, so declare in those commands that the flag is supported. In particular: pack-objects prune rev-list The commands were found by searching for logic which parses --exclude-promisor-objects outside of revision.c. Extra logic outside of revision.c is needed because fetch_if_missing must be turned on before revision.c sees the option or it will BUG-crash. The above list is supported by the fact that no other command is introspectively invoked by another command passing --exclude-promisor-object. Signed-off-by: Matthew DeVore <matvore@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-23 13:52:57 +09:00
Johannes Schindelin	34204c8166	pack-objects (mingw): initialize `packing_data` mutex in the correct spot In `9ac3f0e5b3` (pack-objects: fix performance issues on packing large deltas, 2018-07-22), a mutex was introduced that is used to guard the call to set the delta size. This commit even added code to initialize it, but at an incorrect spot: in `init_threaded_search()`, while the call to `oe_set_delta_size()` (and hence to `packing_data_lock()`) can happen in the call chain `check_object()` <- `get_object_details()` <- `prepare_pack()` <- `cmd_pack_objects()`, which is long before the `prepare_pack()` function calls `ll_find_deltas()` (which initializes the threaded search). Another tell-tale that the mutex was initialized in an incorrect spot is that the function to initialize it lives in builtin/, while the code that uses the mutex is defined in a libgit.a header file. Let's use a more appropriate function: `prepare_packing_data()`, which not only lives in libgit.a, but has to be called before the `packing_data` struct is used that contains that mutex. This fixes https://github.com/git-for-windows/git/issues/1839. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-10-19 14:28:44 +09:00
Junio C Hamano	11877b9ebe	Merge branch 'nd/the-index' Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...	2018-10-19 13:34:02 +09:00
Junio C Hamano	73b9c6f583	Merge branch 'jk/delta-islands-with-bitmap-reuse-delta-fix' Fix interactions between two recent topics. * jk/delta-islands-with-bitmap-reuse-delta-fix: pack-objects: handle island check for "external" delta base	2018-10-16 16:16:00 +09:00
Junio C Hamano	10de0f802d	Merge branch 'tb/void-check-attr' Code clean-up. * tb/void-check-attr: Make git_check_attr() a void function	2018-09-24 10:30:45 -07:00
Nguyễn Thái Ngọc Duy	2abf350385	revision.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:51:19 -07:00
Jeff King	2fa233a554	pack-objects: handle island check for "external" delta base Two recent topics, jk/pack-delta-reuse-with-bitmap and cc/delta-islands, can have a funny interaction. When checking if we can reuse an on-disk delta, the first topic allows base_entry to be NULL when we find an object that's not in the packing list. But the latter topic introduces a call to in_same_island(), which needs to look at base_entry->idx.oid. When these two features are used together, we might try to dereference a NULL base_entry. In practice, this doesn't really happen. We'd generally only use delta islands when packing to disk, since the whole point is to optimize the pack for serving fetches later. And the new delta-reuse code relies on having used reachability bitmaps to determine the set of objects, which we would typically only do when serving an actual fetch. However, it is technically possible to combine these features. And even without doing so, building with "SANITIZE=address,undefined" will cause t5310.46 to complain. Even though that test does not have delta islands enabled, we still take the address of the NULL entry to pass to in_same_island(). That function then promptly returns without dereferencing the value when it sees that islands are not enabled, but it's enough to trigger a sanitizer error. The solution is straight-forward: when both features are used together, we should pass the oid of the found base to in_same_island(). This is tricky to do inside a single "if" statement. And after the merge in `f3504ea3dd` (Merge branch 'cc/delta-islands', 2018-09-17), that "if" condition is already getting pretty unwieldy. So this patch moves the logic into a helper function, where we can easily use multiple return paths. The result is a bit longer, but the logic should be much easier to follow. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-19 12:03:35 -07:00
Junio C Hamano	769af0fd9e	Merge branch 'jk/cocci' spatch transformation to replace boolean uses of !hashcmp() to newly introduced oideq() is added, and applied, to regain performance lost due to support of multiple hash algorithms. * jk/cocci: show_dirstat: simplify same-content check read-cache: use oideq() in ce_compare functions convert hashmap comparison functions to oideq() convert "hashcmp() != 0" to "!hasheq()" convert "oidcmp() != 0" to "!oideq()" convert "hashcmp() == 0" to hasheq() convert "oidcmp() == 0" to oideq() introduce hasheq() and oideq() coccinelle: use <...> for function exclusion	2018-09-17 13:53:57 -07:00
Junio C Hamano	f3504ea3dd	Merge branch 'cc/delta-islands' Lift code from GitHub to restrict delta computation so that an object that exists in one fork is not made into a delta against another object that does not appear in the same forked repository. * cc/delta-islands: pack-objects: move 'layer' into 'struct packing_data' pack-objects: move tree_depth into 'struct packing_data' t5320: tests for delta islands repack: add delta-islands support pack-objects: add delta-islands support pack-objects: refactor code into compute_layer_order() Add delta-islands.{c,h}	2018-09-17 13:53:55 -07:00
Junio C Hamano	3ebdef2e1b	Merge branch 'jk/pack-delta-reuse-with-bitmap' When creating a thin pack, which allows objects to be made into a delta against another object that is not in the resulting pack but is known to be present on the receiving end, the code learned to take advantage of the reachability bitmap; this allows the server to send a delta against a base beyond the "boundary" commit. * jk/pack-delta-reuse-with-bitmap: pack-objects: reuse on-disk deltas for thin "have" objects pack-bitmap: save "have" bitmap from walk t/perf: add perf tests for fetches from a bitmapped server t/perf: add infrastructure for measuring sizes t/perf: factor out percent calculations t/perf: factor boilerplate out of test_perf	2018-09-17 13:53:53 -07:00
Junio C Hamano	49f210fd52	Merge branch 'ds/multi-pack-index' When there are too many packfiles in a repository (which is not recommended), looking up an object in these would require consulting many pack .idx files; a new mechanism to have a single file that consolidates all of these .idx files is introduced. * ds/multi-pack-index: (32 commits) pack-objects: consider packs in multi-pack-index midx: test a few commands that use get_all_packs treewide: use get_all_packs packfile: add all_packs list midx: fix bug that skips midx with alternates midx: stop reporting garbage midx: mark bad packed objects multi-pack-index: store local property multi-pack-index: provide more helpful usage info midx: clear midx on repack packfile: skip loading index if in multi-pack-index midx: prevent duplicate packfile loads midx: use midx in approximate_object_count midx: use existing midx when writing new one midx: use midx in abbreviation calculations midx: read objects from multi-pack-index config: create core.multiPackIndex setting midx: write object offsets midx: write object id fanout chunk midx: write object ids in a chunk ...	2018-09-17 13:53:50 -07:00
Torsten Bögershausen	d64324cb60	Make git_check_attr() a void function git_check_attr() returns always 0. Remove all the error handling code of the callers, which is never executed. Change git_check_attr() to be a void function. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-12 15:15:34 -07:00
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-29 11:32:49 -07:00
Junio C Hamano	29d9e3e2c4	Merge branch 'nd/pack-deltify-regression-fix' In a recent update in 2.18 era, "git pack-objects" started producing a larger than necessary packfiles by missing opportunities to use large deltas. * nd/pack-deltify-regression-fix: pack-objects: fix performance issues on packing large deltas	2018-08-22 11:17:05 -07:00
Jeff King	6a1e32d532	pack-objects: reuse on-disk deltas for thin "have" objects When we serve a fetch, we pass the "wants" and "haves" from the fetch negotiation to pack-objects. That tells us not only which objects we need to send, but we also use the boundary commits as "preferred bases": their trees and blobs are candidates for delta bases, both for reusing on-disk deltas and for finding new ones. However, this misses some opportunities. Modulo some special cases like shallow or partial clones, we know that every object reachable from the "haves" could be a preferred base. We don't use all of them for two reasons: 1. It's expensive to traverse the whole history and enumerate all of the objects the other side has. 2. The delta search is expensive, so we want to keep the number of candidate bases sane. The boundary commits are the most likely to work. When we have reachability bitmaps, though, reason 1 no longer applies. We can efficiently compute the set of reachable objects on the other side (and in fact already did so as part of the bitmap set-difference to get the list of interesting objects). And using this set conveniently covers the shallow and partial cases, since we have to disable the use of bitmaps for those anyway. The second reason argues against using these bases in the search for new deltas. But there's one case where we can use this information for free: when we have an existing on-disk delta that we're considering reusing, we can do so if we know the other side has the base object. This in fact saves time during the delta search, because it's one less delta we have to compute. And that's exactly what this patch does: when we're considering whether to reuse an on-disk delta, if bitmaps tell us the other side has the object (and we're making a thin-pack), then we reuse it. Here are the results on p5311 using linux.git, which simulates a client fetching after `N` days since their last fetch: Test origin HEAD -------------------------------------------------------------------------- 5311.3: server (1 days) 0.27(0.27+0.04) 0.12(0.09+0.03) -55.6% 5311.4: size (1 days) 0.9M 237.0K -73.7% 5311.5: client (1 days) 0.04(0.05+0.00) 0.10(0.10+0.00) +150.0% 5311.7: server (2 days) 0.34(0.42+0.04) 0.13(0.10+0.03) -61.8% 5311.8: size (2 days) 1.5M 347.7K -76.5% 5311.9: client (2 days) 0.07(0.08+0.00) 0.16(0.15+0.01) +128.6% 5311.11: server (4 days) 0.56(0.77+0.08) 0.13(0.10+0.02) -76.8% 5311.12: size (4 days) 2.8M 566.6K -79.8% 5311.13: client (4 days) 0.13(0.15+0.00) 0.34(0.31+0.02) +161.5% 5311.15: server (8 days) 0.97(1.39+0.11) 0.30(0.25+0.05) -69.1% 5311.16: size (8 days) 4.3M 1.0M -76.0% 5311.17: client (8 days) 0.20(0.22+0.01) 0.53(0.52+0.01) +165.0% 5311.19: server (16 days) 1.52(2.51+0.12) 0.30(0.26+0.03) -80.3% 5311.20: size (16 days) 8.0M 2.0M -74.5% 5311.21: client (16 days) 0.40(0.47+0.03) 1.01(0.98+0.04) +152.5% 5311.23: server (32 days) 2.40(4.44+0.20) 0.31(0.26+0.04) -87.1% 5311.24: size (32 days) 14.1M 4.1M -70.9% 5311.25: client (32 days) 0.70(0.90+0.03) 1.81(1.75+0.06) +158.6% 5311.27: server (64 days) 11.76(26.57+0.29) 0.55(0.50+0.08) -95.3% 5311.28: size (64 days) 89.4M 47.4M -47.0% 5311.29: client (64 days) 5.71(9.31+0.27) 15.20(15.20+0.32) +166.2% 5311.31: server (128 days) 16.15(36.87+0.40) 0.91(0.82+0.14) -94.4% 5311.32: size (128 days) 134.8M 100.4M -25.5% 5311.33: client (128 days) 9.42(16.86+0.49) 25.34(25.80+0.46) +169.0% In all cases we save CPU time on the server (sometimes significant) and the resulting pack is smaller. We do spend more CPU time on the client side, because it has to reconstruct more deltas. But that's the right tradeoff to make, since clients tend to outnumber servers. It just means the thin pack mechanism is doing its job. From the user's perspective, the end-to-end time of the operation will generally be faster. E.g., in the 128-day case, we saved 15s on the server at a cost of 16s on the client. Since the resulting pack is 34MB smaller, this is a net win if the network speed is less than 270Mbit/s. And that's actually the worst case. The 64-day case saves just over 11s at a cost of just under 11s. So it's a slight win at any network speed, and the 40MB saved is pure bonus. That trend continues for the smaller fetches. The implementation itself is mostly straightforward, with the new logic going into check_object(). But there are two tricky bits. The first is that check_object() needs access to the relevant information (the thin flag and bitmap result). We can do this by pushing these into program-lifetime globals. The second is that the rest of the code assumes that any reused delta will point to another "struct object_entry" as its base. But of course the case we are interested in here is the one where don't have such an entry! I looked at a number of options that didn't quite work: - we could use a flag to signal a reused delta, but it's not a single bit. We have to actually store the oid of the base, which is normally done by pointing to the existing object_entry. And we'd have to modify all the code which looks at deltas. - we could add the reused bases to the end of the existing object_entry array. While this does create some extra work as later stages consider the extra entries, it's actually not too bad (we're not sending them, so they don't cost much in the delta search, and at most we'd have 2*N of them). But there's a more subtle problem. Adding to the existing array means we might need to grow it with realloc, which could move the earlier entries around. While many of the references to other entries are done by integer index, some (including ones on the stack) use pointers, which would become invalidated. This isn't insurmountable, but it would require quite a bit of refactoring (and it's hard to know that you've got it all, since it may work _most_ of the time and then fail subtly based on memory allocation patterns). - we could allocate a new one-off entry for the base. In fact, this is what an earlier version of this patch did. However, since the refactoring brought in by `ad635e82d6` (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23), the delta_idx code requires that both entries be in the main packing list. So taking all of those options into account, what I ended up with is a separate list of "external bases" that are not part of the main packing list. Each delta entry that points to an external base has a single-bit flag to do so; we have a little breathing room in the bitfield section of object_entry. This lets us limit the change primarily to the oe_delta() and oe_set_delta_ext() functions. And as a bonus, most of the rest of the code does not consider these dummy entries at all, saving both runtime CPU and code complexity. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-21 12:45:49 -07:00

1 2 3 4 5 ...

333 Commits