git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	6f64eeab60	Merge branch 'es/trace2-log-parent-process-name' trace2 logs learned to show parent process name to see in what context Git was invoked. * es/trace2-log-parent-process-name: tr2: log parent process name tr2: make process info collection platform-generic	2021-08-24 15:32:40 -07:00
Taylor Blau	917a54c017	Documentation: describe MIDX-based bitmaps Update the technical documentation to describe the multi-pack bitmap format. This patch merely introduces the new format, and describes its high-level ideas. Git does not yet know how to read nor write these multi-pack variants, and so the subsequent patches will: - Introduce code to interpret multi-pack bitmaps, according to this document. - Then, introduce code to write multi-pack bitmaps from the 'git multi-pack-index write' sub-command. Finally, the implementation will gain tests in subsequent patches (as opposed to inline with the patch teaching Git how to write multi-pack bitmaps) to avoid a cyclic dependency. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-24 13:21:13 -07:00
Ævar Arnfjörð Bjarmason	98e2d9d6f7	upload-pack: document and rename --advertise-refs The --advertise-refs documentation in git-upload-pack added in `9812f2136b` (upload-pack.c: use parse-options API, 2016-05-31) hasn't been entirely true ever since v2 support was implemented in `e52449b672` (connect: request remote refs using v2, 2018-03-15). Under v2 we don't advertise the refs at all, but rather dump the capabilities header. This option has always been an obscure internal implementation detail, it wasn't even documented for git-receive-pack. Since it has exactly one user let's rename it to --http-backend-info-refs, which is more accurate and points the reader in the right direction. Let's also cross-link this from the protocol v1 and v2 documentation. I'm retaining a hidden --advertise-refs alias in case there's any external users of this, and making both options hidden to the bash completion (as with most other internal-only options). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-05 08:59:37 -07:00
Elijah Newren	b378df72ed	directory-rename-detection.txt: small updates due to merge-ort optimizations In commit `0c4fd732f0` ("Move computation of dir_rename_count from merge-ort to diffcore-rename", 2021-02-27), much of the logic for computing directory renames moved into diffcore-rename. directory-rename-detection.txt had claims that all of that logic was found in merge-recursive. Update the documentation. Acked-by: Derrick Stolee <dstolee@microsoft.com> Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-08-05 08:57:39 -07:00
Emily Shaffer	2f732bf15e	tr2: log parent process name It can be useful to tell who invoked Git - was it invoked manually by a user via CLI or script? By an IDE? In some cases - like 'repo' tool - we can influence the source code and set the GIT_TRACE2_PARENT_SID environment variable from the caller process. In 'repo''s case, that parent SID is manipulated to include the string "repo", which means we can positively identify when Git was invoked by 'repo' tool. However, identifying parents that way requires both that we know which tools invoke Git and that we have the ability to modify the source code of those tools. It cannot scale to keep up with the various IDEs and wrappers which use Git, most of which we don't know about. Learning which tools and wrappers invoke Git, and how, would give us insight to decide where to improve Git's usability and performance. Unfortunately, there's no cross-platform reliable way to gather the name of the parent process. If procfs is present, we can use that; otherwise we will need to discover the name another way. However, the process ID should be sufficient to look up the process name on most platforms, so that code may be shareable. Git for Windows gathers similar information and logs it as a "data_json" event. However, since "data_json" has a variable format, it is difficult to parse effectively in some languages; instead, let's pursue a dedicated "cmd_ancestry" event to record information about the ancestry of the current process and a consistent, parseable way. Git for Windows also gathers information about more than one generation of parent. In Linux further ancestry info can be gathered with procfs, but it's unwieldy to do so. In the interest of later moving Git for Windows ancestry logging to the 'cmd_ancestry' event, and in the interest of later adding more ancestry to the Linux implementation - or of adding this functionality to other platforms which have an easier time walking the process tree - let's make 'cmd_ancestry' accept an array of parentage. Signed-off-by: Emily Shaffer <emilyshaffer@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-07-22 13:35:20 -07:00
Junio C Hamano	a515f26eac	Merge branch 'ar/typofix' Typofixes. * ar/typofix: *: fix typos which duplicate a word	2021-07-08 13:14:59 -07:00
Junio C Hamano	9c7a1fc9b6	Merge branch 'js/trace2-discard-event-docfix' Docfix. * js/trace2-discard-event-docfix: docs: fix api-trace2 doc for "too_many_files" event	2021-07-08 13:14:57 -07:00
Junio C Hamano	40098093c6	Merge branch 'tk/partial-clone-repack-doc' Docfix. * tk/partial-clone-repack-doc: Remove warning that repack only works on non-promisor packfiles	2021-07-08 13:14:56 -07:00
Junio C Hamano	169914ede2	Merge branch 'en/ort-perf-batch-11' Optimize out repeated rename detection in a sequence of mergy operations. * en/ort-perf-batch-11: merge-ort, diffcore-rename: employ cached renames when possible merge-ort: handle interactions of caching and rename/rename(1to1) cases merge-ort: add helper functions for using cached renames merge-ort: preserve cached renames for the appropriate side merge-ort: avoid accidental API mis-use merge-ort: add code to check for whether cached renames can be reused merge-ort: populate caches of rename detection results merge-ort: add data structures for in-memory caching of rename detection t6429: testcases for remembering renames fast-rebase: write conflict state to working tree, index, and HEAD fast-rebase: change assert() to BUG() Documentation/technical: describe remembering renames optimization t6423: rename file within directory that other side renamed	2021-06-14 13:33:27 +09:00
Andrei Rybak	abcb66c614	*: fix typos which duplicate a word Fix typos in documentation, code comments, and RelNotes which repeat various words. In trivial cases, just delete the duplicated word and rewrap text, if needed. Reword the affected sentence in Documentation/RelNotes/1.8.4.txt for it to make sense. Signed-off-by: Andrei Rybak <rybak.a.v@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-14 10:16:06 +09:00
Junio C Hamano	8e1d2fc0cc	Merge branch 'tl/fix-packfile-uri-doc' Doc fix. * tl/fix-packfile-uri-doc: packfile-uri.txt: fix blobPackfileUri description	2021-06-10 12:04:26 +09:00
Josh Steadmon	7ba68e0cf1	docs: fix api-trace2 doc for "too_many_files" event In `87db61a` (trace2: write discard message to sentinel files, 2019-10-04), we added a new "too_many_files" event for when trace2 logging would create too many files in an output directory. Unfortunately, the api-trace2 doc described a "discard" event instead. Fix the doc to use the correct event name. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-04 12:11:16 +09:00
Tao Klerks	ace6d8e3d6	Remove warning that repack only works on non-promisor packfiles The git-repack doc clearly states that it does operate on promisor packfiles (in a separate partition), with "-a" specified. Presumably the statements here are outdated, as they feature from the first doc in 2017 (and the repack support was added in 2018) Signed-off-by: Tao Klerks <tao@klerks.biz> Reviewed-by: Taylor Blau <me@ttaylorr.com> Acked-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-04 09:45:47 +09:00
Teng Long	3127ff90ea	packfile-uri.txt: fix blobPackfileUri description Fix the 'uploadpack.blobPackfileUri' description in packfile-uri.txt and the correct format also can be seen in t5702. Signed-off-by: Teng Long <dyroneteng@gmail.com> Reviewed-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-25 09:31:06 +09:00
Elijah Newren	a22099f552	t6429: testcases for remembering renames We will soon be adding an optimization that caches (in memory only, never written to disk) upstream renames during a sequence of merges such as occurs during a cherry-pick or rebase operation. Add several tests meant to stress such an implementation to ensure it does the right thing, and include a test whose outcome we will later change due to this optimization as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 15:40:39 +09:00
Elijah Newren	bb80333c08	Documentation/technical: describe remembering renames optimization Remembering renames on the upstream side of history in an early merge of a rebase or cherry-pick for re-use in a latter merge of the same operation makes pretty good intuitive sense. However, trying to show that it doesn't cause some subtle behavioral difference or some funny edge or corner case is much more involved. And, in fact, it does introduce a subtle behavioral change. Document all the assumptions, special cases, and logic involved in such an optimization, and describe why this optimization is safe under the current optimizations/features/etc. -- even when the subtle behavioral change is triggered. Part of the point of adding this document that goes over the optimization in such laborious detail, is that it is possible that significant future changes (optimizations or feature changes) could interact with this optimization in interesting ways; this document is here to help folks making big changes sanity check that the assumptions and arguments underlying this optimization are still valid. (As a side note, creating this document forced me to review things in sufficient detail that I found I was not properly caching directory-rename-induced renames, resulting in the code not being aware of those renames and causing unnecessary diffcore_rename_extended() calls in subsequent merges.) A subsequent commit will add several testcases based on this document meant to stress-test the implementation and also document the case with the subtle behavioral change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-20 15:40:39 +09:00
Junio C Hamano	644f4a2046	Merge branch 'jt/push-negotiation' "git push" learns to discover common ancestor with the receiving end over protocol v2. * jt/push-negotiation: send-pack: support push negotiation fetch: teach independent negotiation (no packfile) fetch-pack: refactor command and capability write fetch-pack: refactor add_haves() fetch-pack: refactor process_acks()	2021-05-16 21:05:22 +09:00
Junio C Hamano	eede71149e	Merge branch 'ba/object-info' Over-the-wire protocol learns a new request type to ask for object sizes given a list of object names. * ba/object-info: object-info: support for retrieving object info	2021-05-14 08:26:08 +09:00
Jonathan Tan	9c1e657a8f	fetch: teach independent negotiation (no packfile) Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful. Therefore, make it possible for this negotiation step to be done independently. A subsequent commit will use this for one such application - push negotiation. This feature is for protocol v2 only. (An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.) In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile. This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done". In practice, the client will never say it; instead it will cease requests once it is satisfied. In the client, the main change lies in the transport and transport helper code. fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself. There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover. If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it. But for now, report failure if independent negotiation is requested in these cases. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-05-05 10:41:29 +09:00
Junio C Hamano	a1cac26cc6	Merge branch 'mt/parallel-checkout-part-2' The checkout machinery has been taught to perform the actual write-out of the files in parallel when able. * mt/parallel-checkout-part-2: parallel-checkout: add design documentation parallel-checkout: support progress displaying parallel-checkout: add configuration options parallel-checkout: make it truly parallel unpack-trees: add basic support for parallel checkout	2021-04-30 13:50:26 +09:00
Junio C Hamano	8e97852919	Merge branch 'ds/sparse-index-protections' Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...	2021-04-30 13:50:26 +09:00
Bruno Albuquerque	a2ba162cda	object-info: support for retrieving object info Sometimes it is useful to get information of an object without having to download it completely. Add the "object-info" capability that lets the client ask for object-related information with their full hexadecimal object names. Only sizes are returned for now. Signed-off-by: Bruno Albuquerque <bga@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-20 17:41:13 -07:00
Junio C Hamano	fdef940afe	Merge branch 'ab/usage-error-docs' Documentation updates, with unrelated comment updates, too. * ab/usage-error-docs: api docs: document that BUG() emits a trace2 error event api docs: document BUG() in api-error-handling.txt usage.c: don't copy/paste the same comment three times	2021-04-20 17:23:36 -07:00
Junio C Hamano	196cc525e2	Merge branch 'hn/reftable-tables-doc-update' Doc updte. * hn/reftable-tables-doc-update: reftable: document an alternate cleanup method on Windows	2021-04-20 17:23:35 -07:00
Matheus Tavares	68e66f2987	parallel-checkout: add design documentation Co-authored-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-19 15:05:25 -07:00
Derrick Stolee	839a66349e	sparse-index: API protection strategy Edit and expand the sparse-index design document with the plan for guarding index operations with ensure_full_index(). Notably, the plan has changed to not have an expand_to_path() method in favor of checking for a sparse-directory hit inside of the index_path_pos() API. The changes that follow this one will incrementally add ensure_full_index() guards to iterations over all cache entries. Some iterations over the cache entries are not protected due to a few categories listed in the document. Since these are not being modified, here is a short list of the files and methods that will not receive these guards: Looking for non-zero stage: * builtin/add.c:chmod_pathspec() * builtin/merge.c:count_unmerged_entries() * merge-ort.c:record_conflicted_index_entries() * read-cache.c:unmerged_index() * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining() * revision.c:prepare_show_merge() * sequencer.c:append_conflicts_hint() * wt-status.c:wt_status_collect_changes_initial() Looking for submodules: * builtin/submodule--helper.c:module_list_compute() * submodule.c: several methods * worktree.c:validate_no_submodules() Part of the index API: * name-hash.c: lazy init methods * preload-index.c:preload_thread(), preload_index() * read-cache.c: file format methods Checking for correct order of cache entries: * read-cache.c:check_ce_order() Ignores SKIP_WORKTREE entries or already aware: * unpack-trees.c:mark_new_skip_worktree() * wt-status.c:wt_status_check_sparse_checkout() Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:45:34 -07:00
Ævar Arnfjörð Bjarmason	f6d25d7878	api docs: document that BUG() emits a trace2 error event Correct documentation added in `e544221d97` (trace2: Documentation/technical/api-trace2.txt, 2019-02-22) to state that calling BUG() also emits an "error" event. See `ee4512ed48` (trace2: create new combined trace facility, 2019-02-22) for the initial implementation. The BUG() function did not emit an event then however, that was only changed later in `0a9dde4a04` (usage: trace2 BUG() invocations, 2021-02-05), that commit changed the code, but didn't update any of the docs. Let's also add a cross-reference from api-error-handling.txt. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-13 14:57:13 -07:00
Ævar Arnfjörð Bjarmason	4bf0c6f38f	api docs: document BUG() in api-error-handling.txt When the BUG() function was added in `d8193743e0` (usage.c: add BUG() function, 2017-05-12) these docs added in `1f23cfe0ef` (doc: document error handling functions and conventions, 2014-12-03) were not updated. Let's do that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-13 14:56:58 -07:00
Han-Wen Nienhuys	61a7660516	reftable: document an alternate cleanup method on Windows The new method uses the update_index counter, which isn't susceptible to clock inaccuracies. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-12 14:29:44 -07:00
Junio C Hamano	e6b971fcf5	Merge branch 'tb/reverse-midx' An on-disk reverse-index to map the in-pack location of an object back to its object name across multiple packfiles is introduced. * tb/reverse-midx: midx.c: improve cache locality in midx_pack_order_cmp() pack-revindex: write multi-pack reverse indexes pack-write.c: extract 'write_rev_file_order' pack-revindex: read multi-pack reverse indexes Documentation/technical: describe multi-pack reverse indexes midx: make some functions non-static midx: keep track of the checksum midx: don't free midx_name early midx: allow marking a pack as preferred t/helper/test-read-midx.c: add '--show-objects' builtin/multi-pack-index.c: display usage on unrecognized command builtin/multi-pack-index.c: don't enter bogus cmd_mode builtin/multi-pack-index.c: split sub-commands builtin/multi-pack-index.c: define common usage with a macro builtin/multi-pack-index.c: don't handle 'progress' separately builtin/multi-pack-index.c: inline 'flags' with options	2021-04-08 13:23:25 -07:00
Junio C Hamano	861794b60d	Merge branch 'jh/simple-ipc' A simple IPC interface gets introduced to build services like fsmonitor on top. * jh/simple-ipc: t0052: add simple-ipc tests and t/helper/test-simple-ipc tool simple-ipc: add Unix domain socket implementation unix-stream-server: create unix domain socket under lock unix-socket: disallow chdir() when creating unix domain sockets unix-socket: add backlog size option to unix_stream_listen() unix-socket: eliminate static unix_stream_socket() helper function simple-ipc: add win32 implementation simple-ipc: design documentation for new IPC mechanism pkt-line: add options argument to read_packetized_to_strbuf() pkt-line: add PACKET_READ_GENTLE_ON_READ_ERROR option pkt-line: do not issue flush packets in write_packetized_*() pkt-line: eliminate the need for static buffer in packet_write_gently()	2021-04-02 14:43:14 -07:00
Taylor Blau	b25fd24c00	Documentation/technical: describe multi-pack reverse indexes As a prerequisite to implementing multi-pack bitmaps, motivate and describe the format and ordering of the multi-pack reverse index. The subsequent patch will implement reading this format, and the patch after that will implement writing it while producing a multi-pack index. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-01 13:07:37 -07:00
Taylor Blau	9218c6a40c	midx: allow marking a pack as preferred When multiple packs in the multi-pack index contain the same object, the MIDX machinery must make a choice about which pack it associates with that object. Prior to this patch, the lowest-ordered[1] pack was always selected. Pack selection for duplicate objects is relatively unimportant today, but it will become important for multi-pack bitmaps. This is because we can only invoke the pack-reuse mechanism when all of the bits for reused objects come from the reuse pack (in order to ensure that all reused deltas can find their base objects in the same pack). To encourage the pack selection process to prefer one pack over another (the pack to be preferred is the one a caller would like to later use as a reuse pack), introduce the concept of a "preferred pack". When provided, the MIDX code will always prefer an object found in a preferred pack over any other. No format changes are required to store the preferred pack, since it will be able to be inferred with a corresponding MIDX bitmap, by looking up the pack associated with the object in the first bit position (this ordering is described in detail in a subsequent commit). [1]: the ordering is specified by MIDX internals; for our purposes we can consider the "lowest ordered" pack to be "the one with the most-recent mtime. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-01 13:07:37 -07:00
Derrick Stolee	cd42415fb4	sparse-index: add 'sdir' index extension The index format does not currently allow for sparse directory entries. This violates some expectations that older versions of Git or third-party tools might not understand. We need an indicator inside the index file to warn these tools to not interact with a sparse index unless they are aware of sparse directory entries. Add a new _required_ index extension, 'sdir', that indicates that the index may contain sparse directory entries. This allows us to continue to use the differences in index formats 2, 3, and 4 before we create a new index version 5 in a later change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:46 -07:00
Derrick Stolee	0ad6090bdd	sparse-index: design doc and format update This begins a long effort to update the index format to allow sparse directory entries. This should result in a significant improvement to Git commands when HEAD contains millions of files, but the user has selected many fewer files to keep in their sparse-checkout definition. Currently, the index format is only updated in the presence of extensions.sparseIndex instead of increasing a file format version number. This is temporary, and index v5 is part of the plan for future work in this area. The design document details many of the reasons for embarking on this work, and also the plan for completing it safely. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-30 12:57:44 -07:00
Jeff Hostetler	066d5234d0	simple-ipc: design documentation for new IPC mechanism Brief design documentation for new IPC mechanism allowing foreground Git client to talk with an existing daemon process at a known location using a named pipe or unix domain socket. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-15 14:32:50 -07:00
Junio C Hamano	18aabfaee5	Merge branch 'hn/reftable-tables-doc-update' Documentation update. * hn/reftable-tables-doc-update: doc/reftable: document how to handle windows	2021-03-01 14:02:57 -08:00
Junio C Hamano	660dd97a62	Merge branch 'ds/chunked-file-api' The common code to deal with "chunked file format" that is shared by the multi-pack-index and commit-graph files have been factored out, to help codepaths for both filetypes to become more robust. * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn	2021-03-01 14:02:57 -08:00
Junio C Hamano	09e72204f8	Merge branch 'dl/doc-config-camelcase' A handful of multi-word configuration variable names in documentation that are spelled in all lowercase have been corrected to use the more canonical camelCase. * dl/doc-config-camelcase: index-format doc: camelCase core.excludesFile blame-options.txt: camelcase blame.blankBoundary i18n.txt: camel case and monospace "i18n.commitEncoding"	2021-02-25 16:43:32 -08:00
Junio C Hamano	f47c3328ef	Merge branch 'js/doc-proto-v2-response-end' Docfix. * js/doc-proto-v2-response-end: doc: fix naming of response-end-pkt	2021-02-25 16:43:30 -08:00
Junio C Hamano	11875561bf	Merge branch 'ds/chunked-file-api' into tb/reverse-midx * ds/chunked-file-api: commit-graph.c: display correct number of chunks when writing chunk-format: add technical docs chunk-format: restore duplicate chunk checks midx: use 64-bit multiplication for chunk sizes midx: use chunk-format read API commit-graph: use chunk-format read API chunk-format: create read chunk API midx: use chunk-format API in write_midx_internal() midx: drop chunk progress during write midx: return success/failure in chunk write methods midx: add num_large_offsets to write_midx_context midx: add pack_perm to write_midx_context midx: add entries to write_midx_context midx: use context in write_midx_pack_names() midx: rename pack_info to write_midx_context commit-graph: use chunk-format write API chunk-format: create chunk format write API commit-graph: anonymize data in chunk_write_fn	2021-02-24 15:26:14 -08:00
Junio C Hamano	7dd0eaa39c	index-format doc: camelCase core.excludesFile Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-24 15:21:25 -08:00
Han-Wen Nienhuys	00f68732e5	doc/reftable: document how to handle windows On Windows we can't delete or overwrite files opened by other processes. Here we sketch how to handle this situation. We propose to use a random element in the filename. It's possible to design an alternate solution based on counters, but that would assign semantics to the filenames that complicates implementation. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-23 10:01:21 -08:00
Junio C Hamano	dc24948be9	Merge branch 'ta/hash-function-transition-doc' Update formatting and grammar of the hash transition plan documentation, plus some updates. * ta/hash-function-transition-doc: doc: use https links doc hash-function-transition: move rationale upwards doc hash-function-transition: fix incomplete sentence doc hash-function-transition: use upper case consistently doc hash-function-transition: use SHA-1 and SHA-256 consistently doc hash-function-transition: fix asciidoc output	2021-02-22 16:12:43 -08:00
Derrick Stolee	a43a2e6c2a	chunk-format: add technical docs The chunk-based file format is now an API in the code, but we should also take time to document it as a file format. Specifically, it matches the CHUNK LOOKUP sections of the commit-graph and multi-pack-index files, but there are some commonalities that should be grouped in this document. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-18 13:38:16 -08:00
Junio C Hamano	69571dfe21	Merge branch 'jt/clone-unborn-head' "git clone" tries to locally check out the branch pointed at by HEAD of the remote repository after it is done, but the protocol did not convey the information necessary to do so when copying an empty repository. The protocol v2 learned how to do so. * jt/clone-unborn-head: clone: respect remote unborn HEAD connect, transport: encapsulate arg in struct ls-refs: report unborn targets of symrefs	2021-02-17 17:21:40 -08:00
Junio C Hamano	8b4701ae4f	Merge branch 'ak/corrected-commit-date' The commit-graph learned to use corrected commit dates instead of the generation number to help topological revision traversal. * ak/corrected-commit-date: doc: add corrected commit date info commit-reach: use corrected commit dates in paint_down_to_common() commit-graph: use generation v2 only if entire chain does commit-graph: implement generation data chunk commit-graph: implement corrected commit date commit-graph: return 64-bit generation number commit-graph: add a slab to store topological levels t6600-test-reach: generalize *_three_modes commit-graph: consolidate fill_commit_graph_info revision: parse parent in indegree_walk_step() commit-graph: fix regression when computing Bloom filters	2021-02-17 17:21:40 -08:00
Joey Salazar	9d336655ba	doc: fix naming of response-end-pkt Git Protocol version 2[1] defines 0002 as a Message Packet that indicates the end of a response for stateless connections. Change the naming of the 0002 Packet to 'Response End' to match the parsing introduced in Wireshark's MR !1922 for consistency. A subsequent MR in Wireshark will address additional mismatches. [1] kernel.org/pub/software/scm/git/docs/technical/protocol-v2.html [2] gitlab.com/wireshark/wireshark/-/merge_requests/1922 Signed-off-by: Joey Salazar <jgsal@protonmail.com> Reviewed-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-17 16:30:43 -08:00
Junio C Hamano	3c12d0b885	Merge branch 'tb/pack-revindex-on-disk' Introduce an on-disk file to record revindex for packdata, which traditionally was always created on the fly and only in-core. * tb/pack-revindex-on-disk: t5325: check both on-disk and in-memory reverse index pack-revindex: ensure that on-disk reverse indexes are given precedence t: support GIT_TEST_WRITE_REV_INDEX t: prepare for GIT_TEST_WRITE_REV_INDEX Documentation/config/pack.txt: advertise 'pack.writeReverseIndex' builtin/pack-objects.c: respect 'pack.writeReverseIndex' builtin/index-pack.c: write reverse indexes builtin/index-pack.c: allow stripping arbitrary extensions pack-write.c: prepare to write 'pack-.rev' files packfile: prepare for the existence of '.rev' files	2021-02-12 14:21:04 -08:00
Junio C Hamano	71e83b2e7d	Merge branch 'ma/doc-pack-format-varint-for-sizes' into maint Doc update. * ma/doc-pack-format-varint-for-sizes: pack-format.txt: document sizes at start of delta data	2021-02-08 14:05:54 -08:00
Junio C Hamano	a0a2d75d3b	Merge branch 'ds/cache-tree-basics' Document, clean-up and optimize the code around the cache-tree extension in the index. * ds/cache-tree-basics: cache-tree: speed up consecutive path comparisons cache-tree: use ce_namelen() instead of strlen() index-format: discuss recursion of cache-tree better index-format: update preamble to cache tree extension index-format: use 'cache tree' over 'cached tree' cache-tree: trace regions for prime_cache_tree cache-tree: trace regions for I/O cache-tree: use trace2 in cache_tree_update() unpack-trees: add trace2 regions tree-walk: report recursion counts	2021-02-05 16:40:44 -08:00
Junio C Hamano	93da9662d7	Merge branch 'jt/packfile-as-uri-doc' into maint Doc fix for packfile URI feature. * jt/packfile-as-uri-doc: Doc: clarify contents of packfile sent as URI	2021-02-05 16:31:28 -08:00
Jonathan Tan	59e1205d16	ls-refs: report unborn targets of symrefs When cloning, we choose the default branch based on the remote HEAD. But if there is no remote HEAD reported (which could happen if the target of the remote HEAD is unborn), we'll fall back to using our local init.defaultBranch. Traditionally this hasn't been a big deal, because most repos used "master" as the default. But these days it is likely to cause confusion if the server and client implementations choose different values (e.g., if the remote started with "main", we may choose "master" locally, create commits there, and then the user is surprised when they push to "master" and not "main"). To solve this, the remote needs to communicate the target of the HEAD symref, even if it is unborn, and "git clone" needs to use this information. Currently, symrefs that have unborn targets (such as in this case) are not communicated by the protocol. Teach Git to advertise and support the "unborn" feature in "ls-refs" (by default, this is advertised, but server administrators may turn this off through the lsrefs.unborn config). This feature indicates that "ls-refs" supports the "unborn" argument; when it is specified, "ls-refs" will send the HEAD symref with the name of its unborn target. This change is only for protocol v2. A similar change for protocol v0 would require independent protocol design (there being no analogous position to signal support for "unborn") and client-side plumbing of the data required, so the scope of this patch set is limited to protocol v2. The client side will be updated to use this in a subsequent commit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 13:49:53 -08:00
Thomas Ackermann	6eda9ac9e5	doc: use https links Use only https links for lore.kernel.org. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:10 -08:00
Thomas Ackermann	1d18997007	doc hash-function-transition: move rationale upwards Move rationale for new hash function to beginning of document so that it appears before the concrete move to SHA-256 is described. Remove some of the details about SHA-1 weaknesses and add references to the details on how the new hash function was chosen instead. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:10 -08:00
Thomas Ackermann	cc9f0916bd	doc hash-function-transition: fix incomplete sentence Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	810372f881	doc hash-function-transition: use upper case consistently Use upper case consistently in Document History. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	af9b1e9aba	doc hash-function-transition: use SHA-1 and SHA-256 consistently Use SHA-1 and SHA-256 instead of sha1 and sha256 when referring to the hash type. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Thomas Ackermann	de82095a95	doc hash-function-transition: fix asciidoc output Asciidoc requires lists to start with an empty line and uses different characters for indentation levels ("-", "", "*", ...). For special symbols like a dash "--" has to be used and there is no double arrow "<->", so a left and right arrow "<-->" has to be combined for that. Lastly for verbatim output a newline followed by an indentation has to be used. Fix asciidoc output for lists, special characters and verbatim text while retaining the readabilty of the original text file. Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-02-05 11:57:02 -08:00
Junio C Hamano	d03553ecd1	Merge branch 'jt/packfile-as-uri-doc' Doc fix for packfile URI feature. * jt/packfile-as-uri-doc: Doc: clarify contents of packfile sent as URI	2021-02-03 15:04:49 -08:00
Taylor Blau	2f4ba2a867	packfile: prepare for the existence of '.rev' files Specify the format of the on-disk reverse index 'pack-.rev' file, as well as prepare the code for the existence of such files. The reverse index maps from pack relative positions (i.e., an index into the array of object which is sorted by their offsets within the packfile) to their position within the 'pack-.idx' file. Today, this is done by building up a list of (off_t, uint32_t) tuples for each object (the off_t corresponding to that object's offset, and the uint32_t corresponding to its position in the index). To convert between pack and index position quickly, this array of tuples is radix sorted based on its offset. This has two major drawbacks: First, the in-memory cost scales linearly with the number of objects in a pack. Each 'struct revindex_entry' is sizeof(off_t) + sizeof(uint32_t) + padding bytes for a total of 16. To observe this, force Git to load the reverse index by, for e.g., running 'git cat-file --batch-check="%(objectsize:disk)"'. When asking for a single object in a fresh clone of the kernel, Git needs to allocate 120+ MB of memory in order to hold the reverse index in memory. Second, the cost to sort also scales with the size of the pack. Luckily, this is a linear function since 'load_pack_revindex()' uses a radix sort, but this cost still must be paid once per pack per process. As an example, it takes ~60x longer to print the _size_ of an object as it does to print that entire object's _contents_: Benchmark #1: git.compile cat-file --batch <obj Time (mean ± σ): 3.4 ms ± 0.1 ms [User: 3.3 ms, System: 2.1 ms] Range (min … max): 3.2 ms … 3.7 ms 726 runs Benchmark #2: git.compile cat-file --batch-check="%(objectsize:disk)" <obj Time (mean ± σ): 210.3 ms ± 8.9 ms [User: 188.2 ms, System: 23.2 ms] Range (min … max): 193.7 ms … 224.4 ms 13 runs Instead, avoid computing and sorting the revindex once per process by writing it to a file when the pack itself is generated. The format is relatively straightforward. It contains an array of uint32_t's, the length of which is equal to the number of objects in the pack. The ith entry in this table contains the index position of the ith object in the pack, where "ith object in the pack" is determined by pack offset. One thing that the on-disk format does _not_ contain is the full (up to) eight-byte offset corresponding to each object. This is something that the in-memory revindex contains (it stores an off_t in 'struct revindex_entry' along with the same uint32_t that the on-disk format has). Omit it in the on-disk format, since knowing the index position for some object is sufficient to get a constant-time lookup in the pack-.idx file to ask for an object's offset within the pack. This trades off between the on-disk size of the 'pack-.rev' file for runtime to chase down the offset for some object. Even though the lookup is constant time, the constant is heavier, since it can potentially involve two pointer walks in v2 indexes (one to access the 4-byte offset table, and potentially a second to access the double wide offset table). Consider trying to map an object's pack offset to a relative position within that pack. In a cold-cache scenario, more page faults occur while switching between binary searching through the reverse index and searching through the .idx file for an object's offset. Sure enough, with a cold cache (writing '3' into '/proc/sys/vm/drop_caches' after 'sync'ing), printing out the entire object's contents is still marginally faster than printing its size: Benchmark #1: git.compile cat-file --batch-check="%(objectsize:disk)" <obj >/dev/null Time (mean ± σ): 22.6 ms ± 0.5 ms [User: 2.4 ms, System: 7.9 ms] Range (min … max): 21.4 ms … 23.5 ms 41 runs Benchmark #2: git.compile cat-file --batch <obj >/dev/null Time (mean ± σ): 17.2 ms ± 0.7 ms [User: 2.8 ms, System: 5.5 ms] Range (min … max): 15.6 ms … 18.2 ms 45 runs (Numbers taken in the kernel after cheating and using the next patch to generate a reverse index). There are a couple of approaches to improve cold cache performance not pursued here: - We could include the object offsets in the reverse index format. Predictably, this does result in fewer page faults, but it triples the size of the file, while simultaneously duplicating a ton of data already available in the .idx file. (This was the original way I implemented the format, and it did show `--batch-check='%(objectsize:disk)'` winning out against `--batch`.) On the other hand, this increase in size also results in a large block-cache footprint, which could potentially hurt other workloads. - We could store the mapping from pack to index position in more cache-friendly way, like constructing a binary search tree from the table and writing the values in breadth-first order. This would result in much better locality, but the price you pay is trading O(1) lookup in 'pack_pos_to_index()' for an O(log n) one (since you can no longer directly index the table). So, neither of these approaches are taken here. (Thankfully, the format is versioned, so we are free to pursue these in the future.) But, cold cache performance likely isn't interesting outside of one-off cases like asking for the size of an object directly. In real-world usage, Git is often performing many operations in the revindex (i.e., asking about many objects rather than a single one). The trade-off is worth it, since we will avoid the vast majority of the cost of generating the revindex that the extra pointer chase will look like noise in the following patch's benchmarks. This patch describes the format and prepares callers (like in pack-revindex.c) to be able to read *.rev files once they exist. An implementation of the writer will appear in the next patch, and callers will gradually begin to start using the writer in the patches that follow after that. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-25 18:32:43 -08:00
Jonathan Tan	bfc2a36ff2	Doc: clarify contents of packfile sent as URI Clarify that, when the packfile-uri feature is used, the client should not assume that the extra packfiles downloaded would only contain a single blob, but support packfiles containing multiple objects of all types. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-20 19:06:50 -08:00
Abhishek Kumar	5a3b130cad	doc: add corrected commit date info With generation data chunk and corrected commit dates implemented, let's update the technical documentation for commit-graph. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Reviewed-by: Taylor Blau <me@ttaylorr.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-18 16:21:18 -08:00
Derrick Stolee	4bdde337f4	index-format: discuss recursion of cache-tree better The end of the cache tree index extension format trails off with ellipses ever since `23fcc98` (doc: technical details about the index file format, 2011-03-01). While an intuitive reader could gather what this means, it could be better to use "and so on" instead. Really, this is only justified because I also wanted to point out that the number of subtrees in the index format is used to determine when the recursive depth-first-search stack should be "popped." This should help to add clarity to the format. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:59 -08:00
Derrick Stolee	22ad8600c1	index-format: update preamble to cache tree extension I had difficulty in my efforts to learn about the cache tree extension based on the documentation and code because I had an incorrect assumption about how it behaved. This might be due to some ambiguity in the documentation, so this change modifies the beginning of the cache tree format by expanding the description of the feature. My hope is that this documentation clarifies a few things: 1. There is an in-memory recursive tree structure that is constructed from the extension data. This structure has a few differences, such as where the name is stored. 2. What does it mean for an entry to be invalid? 3. When exactly are "new" trees created? Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:46 -08:00
Derrick Stolee	845d15d4d0	index-format: use 'cache tree' over 'cached tree' The index has a "cache tree" extension. This corresponds to a significant API implemented in cache-tree.[ch]. However, there are a few places that refer to this erroneously as "cached tree". These are rare, but notably the index-format.txt file itself makes this error. The only other reference is in t7104-reset-hard.sh. Reported-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-15 23:04:38 -08:00
Junio C Hamano	16a8055dae	Merge branch 'ma/doc-pack-format-varint-for-sizes' Doc update. * ma/doc-pack-format-varint-for-sizes: pack-format.txt: document sizes at start of delta data	2021-01-15 15:20:29 -08:00
Martin Ågren	7b77f5a13e	pack-format.txt: document sizes at start of delta data We document the delta data as a set of instructions, but forget to document the two sizes that precede those instructions: the size of the base object and the size of the object to be reconstructed. Fix this omission. Rather than cramming all the details about the encoding into the running text, introduce a separate section detailing our "size encoding" and refer to it. Reported-by: Ross Light <ross@zombiezen.com> Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:00:28 -08:00
Thomas Ackermann	7efc378205	doc: fix some typos Signed-off-by: Thomas Ackermann <th.acker@arcor.de> Acked-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 11:27:48 -08:00
Junio C Hamano	7bceb83bfe	Merge branch 'jh/index-v2-doc-on-fsmn' Doc update. * jh/index-v2-doc-on-fsmn: index-format.txt: document v2 format of file system monitor extension	2020-12-17 15:06:42 -08:00
Junio C Hamano	94dc98d1d2	Merge branch 'jb/midx-doc-update' Doc update. * jb/midx-doc-update: docs: multi-pack-index: remove note about future 'verify' work	2020-12-17 15:06:41 -08:00
Jeff Hostetler	5885367e8f	index-format.txt: document v2 format of file system monitor extension Update the documentation of the file system monitor extension to describe version 2. The format was extended to support opaque tokens in: `56c6910028` fsmonitor: change last update timestamp on the index_state to opaque token Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:42:23 -08:00
Johannes Berg	633eebe142	docs: multi-pack-index: remove note about future 'verify' work This was implemented in the 'git multi-pack-index' command and merged in `468b3221` (Merge branch 'ds/multi-pack-verify', 2018-10-10). And there's no 'git midx' command. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-14 08:39:08 -08:00
Junio C Hamano	01b8886a62	Merge branch 'js/trace2-session-id' The transport layer was taught to optionally exchange the session ID assigned by the trace2 subsystem during fetch/push transactions. * js/trace2-session-id: receive-pack: log received client session ID send-pack: advertise session ID in capabilities upload-pack, serve: log received client session ID fetch-pack: advertise session ID in capabilities transport: log received server session ID serve: advertise session ID in v2 capabilities receive-pack: advertise session ID in v0 capabilities upload-pack: advertise session ID in v0 capabilities trace2: add a public function for getting the SID docs: new transfer.advertiseSID option docs: new capability to advertise session IDs	2020-12-08 15:11:20 -08:00
Junio C Hamano	2aeafbc896	Merge branch 'jt/trace-error-on-warning' Like die() and error(), a call to warning() will also trigger a trace2 event. * jt/trace-error-on-warning: usage: add trace2 entry upon warning()	2020-12-08 15:11:17 -08:00
Jonathan Tan	0ee10fd129	usage: add trace2 entry upon warning() Emit a trace2 error event whenever warning() is called, just like when die(), error(), or usage() is called. This helps debugging issues that would trigger warnings but not errors. In particular, this might have helped debugging an issue I encountered with commit graphs at $DAYJOB [1]. There is a tradeoff between including potentially relevant messages and cluttering up the trace output produced. I think that warning() messages should be included in traces, because by its nature, Git is used over multiple invocations of the Git tool, and a failure (currently traced) in a Git invocation might be caused by an unexpected interaction in a previous Git invocation that only has a warning (currently untraced) as a symptom - as is the case in [1]. [1] https://lore.kernel.org/git/20200629220744.1054093-1-jonathantanmy@google.com/ Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-24 17:39:38 -08:00
Josh Steadmon	f5cdbe485f	docs: new capability to advertise session IDs In future patches, we will add the ability for Git servers and clients to advertise unique session IDs via protocol capabilities. This allows for easier debugging when both client and server logs are available. Signed-off-by: Josh Steadmon <steadmon@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 18:26:52 -08:00
Elijah Newren	c64432aacd	t6423: more involved rules for renaming directories into each other Testcases 12b and 12c were both slightly weird; they were marked as having a weird resolution, but with the note that even straightforward simple rules can give weird results when the input is bizarre. However, during optimization work for merge-ort, I discovered a significant speedup that is possible if we add one more fairly straightforward rule: we don't bother doing directory rename detection if there are no new files added to the directory on the other side of the history to be affected by the directory rename. This seems like an obvious and straightforward rule, but there was one funny corner case where directory rename detection could affect only existing files: the funny corner case where two directories are renamed into each other on opposite sides of history. In other words, it only results in a different output for testcases 12b and 12c. Since we already thought testcases 12b and 12c were weird anyway, and because the optimization often has a significant effect on common cases (but is entirely prevented if we can't change how 12b and 12c function), let's add the additional rule and tweak how 12b and 12c work. Split both testcases into two (one where we add no new files, and one where the side that doesn't rename a given directory will add files to it), and mark them with the new expectation. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-16 12:29:28 -07:00
Elijah Newren	8536821d05	t6423: update directory rename detection tests with new rule While investigating the issues highlighted by the testcase in the previous patch, I also found a shortcoming in the directory rename detection rules. Split testcase 6b into two to explain this issue and update directory-rename-detection.txt to remove one of the previous rules that I know believe to be detrimental. Also, update the wording around testcase 8e; while we are not modifying the results of that testcase, we were previously unsure of the appropriate resolution of that test and the new rule makes the previously chosen resolution for that testcase a bit more solid. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-16 12:29:28 -07:00
Elijah Newren	b9718d0cc9	directory-rename-detection.txt: update references to regression tests The regression tests for directory rename detection were renamed from t6043 to t6423 in commit `919df31955` ("Collect merge-related tests to t64xx", 2020-08-10); update this file to match. Also, add a small clarification to nearby text while we're at it. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-16 12:29:27 -07:00
Junio C Hamano	288ed98bf7	Merge branch 'tb/bloom-improvements' "git commit-graph write" learned to limit the number of bloom filters that are computed from scratch with the --max-new-filters option. * tb/bloom-improvements: commit-graph: introduce 'commitGraph.maxNewFilters' builtin/commit-graph.c: introduce '--max-new-filters=<n>' commit-graph: rename 'split_commit_graph_opts' bloom: encode out-of-bounds filters as non-empty bloom/diff: properly short-circuit on max_changes bloom: use provided 'struct bloom_filter_settings' bloom: split 'get_bloom_filter()' in two commit-graph.c: store maximum changed paths commit-graph: respect 'commitGraph.readChangedPaths' t/helper/test-read-graph.c: prepare repo settings commit-graph: pass a 'struct repository *' in more places t4216: use an '&&'-chain commit-graph: introduce 'get_bloom_filter_settings()'	2020-09-29 14:01:20 -07:00
Junio C Hamano	6c430a647c	Merge branch 'jx/proc-receive-hook' "git receive-pack" that accepts requests by "git push" learned to outsource most of the ref updates to the new "proc-receive" hook. * jx/proc-receive-hook: doc: add documentation for the proc-receive hook transport: parse report options for tracking refs t5411: test updates of remote-tracking branches receive-pack: new config receive.procReceiveRefs doc: add document for capability report-status-v2 New capability "report-status-v2" for git-push receive-pack: feed report options to post-receive receive-pack: add new proc-receive hook t5411: add basic test cases for proc-receive hook transport: not report a non-head push as a branch	2020-09-25 15:25:39 -07:00
Junio C Hamano	4d515253af	Merge branch 'cd/commit-graph-doc' Doc update. * cd/commit-graph-doc: commit-graph-format.txt: fix no-parent value	2020-09-22 12:36:30 -07:00
Taylor Blau	59f0d5073f	bloom: encode out-of-bounds filters as non-empty When a changed-path Bloom filter has either zero, or more than a certain number (commonly 512) of entries, the commit-graph machinery encodes it as "missing". More specifically, it sets the indices adjacent in the BIDX chunk as equal to each other to indicate a "length 0" filter; that is, that the filter occupies zero bytes on disk. This has heretofore been fine, since the commit-graph machinery has no need to care about these filters with too few or too many changed paths. Both cases act like no filter has been generated at all, and so there is no need to store them. In a subsequent commit, however, the commit-graph machinery will learn to only compute Bloom filters for some commits in the current commit-graph layer. This is a change from the current implementation which computes Bloom filters for all commits that are in the layer being written. Critically for this patch, only computing some of the Bloom filters means adding a third state for length 0 Bloom filters: zero entries, too many entries, or "hasn't been computed". It will be important for that future patch to distinguish between "not representable" (i.e., zero or too-many changed paths), and "hasn't been computed". In particular, we don't want to waste time recomputing filters that have already been computed. To that end, change how we store Bloom filters in the "computed but not representable" category: - Bloom filters with no entries are stored as a single byte with all bits low (i.e., all queries to that Bloom filter will return "definitely not") - Bloom filters with too many entries are stored as a single byte with all bits set high (i.e., all queries to that Bloom filter will return "maybe"). These rules are sufficient to not incur a behavior change by changing the on-disk representation of these two classes. Likewise, no specification changes are necessary for the commit-graph format, either: - Filters that were previously empty will be recomputed and stored according to the new rules, and - old clients reading filters generated by new clients will interpret the filters correctly and be none the wiser to how they were generated. Clients will invoke the Bloom machinery in more cases than before, but this can be addressed by returning a NULL filter when all bits are set high. This can be addressed in a future patch. Note that this does increase the size of on-disk commit-graphs, but far less than other proposals. In particular, this is generally more efficient than storing a bitmap for which commits haven't computed their Bloom filters. Storing a bitmap incurs a penalty of one bit per commit, whereas storing explicit filters as above incurs a penalty of one byte per too-large or empty commit. In practice, these boundary commits likely occupy a small proportion of the overall number of commits, and so the size penalty is likely smaller than storing a bitmap for all commits. See, for example, these relative proportions of such boundary commits (collected by SZEDER Gábor): \| Percentage of \| commit-graph \| \| \| commits modifying \| file size \| \| ├────────┬──────────────┼───────────────────┤ pct. \| \| 0 path \| >= 512 paths \| before \| after \| change \| ┌────────────────┼────────┼──────────────┼─────────┼─────────┼───────────┤ \| android-base \| 13.20% \| 0.13% \| 37.468M \| 37.534M \| +0.1741 % \| \| cmssw \| 0.15% \| 0.23% \| 17.118M \| 17.119M \| +0.0091 % \| \| cpython \| 3.07% \| 0.01% \| 7.967M \| 7.971M \| +0.0423 % \| \| elasticsearch \| 0.70% \| 1.00% \| 8.833M \| 8.835M \| +0.0128 % \| \| gcc \| 0.00% \| 0.08% \| 16.073M \| 16.074M \| +0.0030 % \| \| gecko-dev \| 0.14% \| 0.64% \| 59.868M \| 59.874M \| +0.0105 % \| \| git \| 0.11% \| 0.02% \| 3.895M \| 3.895M \| +0.0020 % \| \| glibc \| 0.02% \| 0.10% \| 3.555M \| 3.555M \| +0.0021 % \| \| go \| 0.00% \| 0.07% \| 3.186M \| 3.186M \| +0.0018 % \| \| homebrew-cask \| 0.40% \| 0.02% \| 7.035M \| 7.035M \| +0.0065 % \| \| homebrew-core \| 0.01% \| 0.01% \| 11.611M \| 11.611M \| +0.0002 % \| \| jdk \| 0.26% \| 5.64% \| 5.537M \| 5.540M \| +0.0590 % \| \| linux \| 0.01% \| 0.51% \| 63.735M \| 63.740M \| +0.0073 % \| \| llvm-project \| 0.12% \| 0.03% \| 25.515M \| 25.516M \| +0.0050 % \| \| rails \| 0.10% \| 0.10% \| 6.252M \| 6.252M \| +0.0027 % \| \| rust \| 0.07% \| 0.17% \| 9.364M \| 9.364M \| +0.0033 % \| \| tensorflow \| 0.09% \| 1.02% \| 7.009M \| 7.010M \| +0.0158 % \| \| webkit \| 0.05% \| 0.31% \| 17.405M \| 17.406M \| +0.0047 % \| (where the above increase is determined by computing a non-split commit-graph before and after this patch). Given that these projects are all "large" by commit count, the storage cost by writing these filters explicitly is negligible. In the most extreme example, android-base (which has 494,848 commits at the time of writing) would have its commit-graph increase by a modest 68.4 KB. Finally, a test to exercise filters which contain too many changed path entries will be introduced in a subsequent patch. Suggested-by: SZEDER Gábor <szeder.dev@gmail.com> Suggested-by: Jakub Narębski <jnareb@gmail.com> Helped-by: Derrick Stolee <dstolee@microsoft.com> Helped-by: SZEDER Gábor <szeder.dev@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 21:55:50 -07:00
Conor Davis	e40e936551	commit-graph-format.txt: fix no-parent value The correct value from commit-graph.c: #define GRAPH_PARENT_NONE 0x70000000 Signed-off-by: Conor Davis <git@conor.fastmail.fm> Reviewed-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-15 14:34:34 -07:00
Junio C Hamano	b4100f366c	Merge branch 'jt/lazy-fetch' Updates to on-demand fetching code in lazily cloned repositories. * jt/lazy-fetch: fetch: no FETCH_HEAD display if --no-write-fetch-head fetch-pack: remove no_dependents code promisor-remote: lazy-fetch objects in subprocess fetch-pack: do not lazy-fetch during ref iteration fetch: only populate existing_refs if needed fetch: avoid reading submodule config until needed fetch: allow refspecs specified through stdin negotiator/noop: add noop fetch negotiator	2020-09-03 12:37:04 -07:00
Jiang Xin	b913075cb8	doc: add document for capability report-status-v2 Add ABNF notation for capability 'report-status-v2' which extends capability 'report-status' by adding additional option lines. Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-27 12:47:47 -07:00
Junio C Hamano	6f8a2138b9	Merge branch 'ds/sha256-leftover-bits' midx and commit-graph files now use the byte defined in their file format specification for identifying the hash function used for object names. * ds/sha256-leftover-bits: multi-pack-index: use hash version byte commit-graph: use the "hash version" byte t/README: document GIT_TEST_DEFAULT_HASH	2020-08-19 16:14:53 -07:00
Junio C Hamano	74a395c484	Merge branch 'ma/sha-256-docs' Further update of docs to adjust to the recent SHA-256 work. * ma/sha-256-docs: shallow.txt: document SHA-256 shallow format protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 index-format.txt: document SHA-256 index format http-protocol.txt: document SHA-256 "want"/"have" format	2020-08-19 16:14:52 -07:00
Junio C Hamano	336fbd18bb	Merge branch 'bc/sha-256-doc-updates' Further update of docs to adjust to the recent SHA-256 work. * bc/sha-256-doc-updates: docs: fix step in transition plan docs: document SHA-256 pack and indices	2020-08-19 16:14:51 -07:00
Junio C Hamano	ecc796caa2	Merge branch 'jb/commit-graph-doc-fix' Docfix. * jb/commit-graph-doc-fix: docs: commit-graph: fix some whitespace in the diagram	2020-08-19 16:14:49 -07:00
Jonathan Tan	7ca3c0ac37	promisor-remote: lazy-fetch objects in subprocess Teach Git to lazy-fetch missing objects in a subprocess instead of doing it in-process. This allows any fatal errors that occur during the fetch to be isolated and converted into an error return value, instead of causing the current command being run to terminate. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-18 16:46:53 -07:00
Derrick Stolee	d96075428a	multi-pack-index: use hash version byte Similar to the commit-graph format, the multi-pack-index format has a byte in the header intended to track the hash version used to write the file. This allows one to interpret the hash length without having the context of the repository config specifying the hash length. This was not modified as part of the SHA-256 work because the hash length was automatically up-shifted due to that config. Since we have this byte available, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Add a new oid_version() method in midx.c similar to the one in commit-graph.c. This is specifically made separate from that implementation to avoid artificially linking the formats. The test impact requires a few more things than the corresponding change in the commit-graph format. Specifically, 'test-tool read-midx' was not writing anything about this header value to output. Since the value available in 'struct multi_pack_index' is hash_len instead of a version value, we output "20" or "32" instead of "1" or "2". Since we want a user to not have their Git commands fail if their multi-pack-index has the incorrect hash version compared to the repository's hash version, we relax the die() to an error() in load_multi_pack_index(). This has some effect on 'git multi-pack-index verify' as we need to check that a failed parse of a file that exists is actually a verify error. For that test that checks the hash version matches, we change the corrupted byte from "2" to "3" to ensure the test fails for both hash algorithms. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 16:45:20 -07:00
Derrick Stolee	665d70ad03	commit-graph: use the "hash version" byte The commit-graph format reserved a byte among the header of the file to store a "hash version". During the SHA-256 work, this was not modified because file formats are not necessarily intended to work across hash versions. If a repository has SHA-256 as its hash algorithm, it automatically up-shifts the lengths of object names in all necessary formats. However, since we have this byte available for adjusting the version, we can make the file formats more obviously incompatible instead of relying on other context from the repository. Update the oid_version() method in commit-graph.c to add a new value, 2, for sha-256. This automatically writes the new value in a SHA-256 repository _and_ verifies the value is correct. This is a breaking change relative to the current 'master' branch since `092b677` (Merge branch 'bc/sha-256-cvs-svn-updates', 2020-08-13) but it is not breaking relative to any released version of Git. The test impact is relatively minor: the output of 'test-tool read-graph' lists the header information, so those instances of '1' need to be replaced with a variable determined by GIT_TEST_DEFAULT_HASH. A more careful test is added that specifically creates a repository of each type then swaps the commit-graph files. The important value here is that the "git log" command succeeds while writing a message to stderr. Helped-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 16:45:14 -07:00
Martin Ågren	8afa50aabc	shallow.txt: document SHA-256 shallow format Similar to recent commits, document that we list object names rather than SHA-1s. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 10:35:13 -07:00
Martin Ågren	0756e61078	protocol-capabilities.txt: clarify "allow-x-sha1-in-want" re SHA-256 Two of our capabilities contain "sha1" in their names, but that's historical. Clarify that object names are still to be given using whatever object format has been negotiated using the "object-format" capability. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 10:35:12 -07:00
Martin Ågren	123712ba41	index-format.txt: document SHA-256 index format Document that in SHA-1 repositories, we use SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all other uses of "SHA-1" with something more neutral. Avoid referring to "160-bit" hash values. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 10:35:12 -07:00
Martin Ågren	5b6422a616	http-protocol.txt: document SHA-256 "want"/"have" format Document that rather than always naming objects using SHA-1, we should use whatever has been negotiated using the object-format capability. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-17 10:35:12 -07:00
brian m. carlson	2ae12e568b	docs: fix step in transition plan One of the required steps for the objectFormat extension is to implement the loose object index. However, without support for compatObjectFormat, we don't even know if the loose object index is needed, so it makes sense to move that step to the compatObjectFormat section. Do so. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-13 18:09:55 -07:00
brian m. carlson	17420eafa9	docs: document SHA-256 pack and indices Now that we have SHA-256 support for packs and indices, let's document that in SHA-256 repositories, we use SHA-256 instead of SHA-1 for object names and checksums. Instead of duplicating this information throughout the document, let's just document that in SHA-1 repositories, we use SHA-1 for these purposes, and in SHA-256 repositories, we use SHA-256. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-13 18:09:36 -07:00
Johannes Berg	6dfefe70a9	docs: commit-graph: fix some whitespace in the diagram In the merge diagram, some whitespace is missing which makes it a bit confusing, fix that. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-13 11:06:04 -07:00
Junio C Hamano	e0ad9574dd	Merge branch 'bc/sha-256-part-3' The final leg of SHA-256 transition. * bc/sha-256-part-3: (39 commits) t: remove test_oid_init in tests docs: add documentation for extensions.objectFormat ci: run tests with SHA-256 t: make SHA1 prerequisite depend on default hash t: allow testing different hash algorithms via environment t: add test_oid option to select hash algorithm repository: enable SHA-256 support by default setup: add support for reading extensions.objectformat bundle: add new version for use with SHA-256 builtin/verify-pack: implement an --object-format option http-fetch: set up git directory before parsing pack hashes t0410: mark test with SHA1 prerequisite t5308: make test work with SHA-256 t9700: make hash size independent t9500: ensure that algorithm info is preserved in config t9350: make hash size independent t9301: make hash size independent t9300: use $ZERO_OID instead of hard-coded object ID t9300: abstract away SHA-1-specific constants t8011: make hash size independent ...	2020-08-11 18:04:11 -07:00
Junio C Hamano	46b225f153	Merge branch 'jk/strvec' The argv_array API is useful for not just managing argv but any "vector" (NULL-terminated array) of strings, and has seen adoption to a certain degree. It has been renamed to "strvec" to reduce the barrier to adoption. * jk/strvec: strvec: rename struct fields strvec: drop argv_array compatibility layer strvec: update documention to avoid argv_array strvec: fix indentation in renamed calls strvec: convert remaining callers away from argv_array name strvec: convert more callers away from argv_array name strvec: convert builtin/ callers away from argv_array name quote: rename sq_dequote_to_argv_array to mention strvec strvec: rename files from argv-array to strvec argv-array: rename to strvec argv-array: use size_t for count and alloc	2020-08-10 10:23:57 -07:00
Junio C Hamano	de6dda0dc3	Merge branch 'sg/commit-graph-cleanups' into master The changed-path Bloom filter is improved using ideas from an independent implementation. * sg/commit-graph-cleanups: commit-graph: simplify write_commit_graph_file() #2 commit-graph: simplify write_commit_graph_file() #1 commit-graph: simplify parse_commit_graph() #2 commit-graph: simplify parse_commit_graph() #1 commit-graph: clean up #includes diff.h: drop diff_tree_oid() & friends' return value commit-slab: add a function to deep free entries on the slab commit-graph-format.txt: all multi-byte numbers are in network byte order commit-graph: fix parsing the Chunk Lookup table tree-walk.c: don't match submodule entries for 'submod/anything'	2020-07-30 13:20:30 -07:00
brian m. carlson	c5aecfc866	bundle: add new version for use with SHA-256 Currently we detect the hash algorithm in use by the length of the object ID. This is inelegant and prevents us from using a different hash algorithm that is also 256 bits in length. Since we cannot extend the v2 format in a backward-compatible way, let's add a v3 format, which is identical, except for the addition of capabilities, which are prefixed by an at sign. We add "object-format" as the only capability and reject unknown capabilities, since we do not have a network connection and therefore cannot negotiate with the other side. For compatibility, default to the v2 format for SHA-1 and require v3 for SHA-256. In t5510, always use format v3 so we can be sure we produce consistent results across hash algorithms. Since head -n N lists the top N lines instead of the Nth line, let's run our output through sed to normalize it and compare it against a fixed value, which will make sure we get exactly what we're expecting. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-30 09:16:48 -07:00
Jeff King	837dc425cf	strvec: update documention to avoid argv_array There were a few mentions of argv_array in a non-code file which didn't get picked up in the previous commits (note that even comments in code files were already covered because of the mechanical conversion via perl). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-07-28 15:02:18 -07:00
Junio C Hamano	12210859da	Merge branch 'bc/sha-256-part-2' SHA-256 migration work continues. * bc/sha-256-part-2: (44 commits) remote-testgit: adapt for object-format bundle: detect hash algorithm when reading refs t5300: pass --object-format to git index-pack t5704: send object-format capability with SHA-256 t5703: use object-format serve option t5702: offer an object-format capability in the test t/helper: initialize the repository for test-sha1-array remote-curl: avoid truncating refs with ls-remote t1050: pass algorithm to index-pack when outside repo builtin/index-pack: add option to specify hash algorithm remote-curl: detect algorithm for dumb HTTP by size builtin/ls-remote: initialize repository based on fetch t5500: make hash independent serve: advertise object-format capability for protocol v2 connect: parse v2 refs with correct hash algorithm connect: pass full packet reader when parsing v2 refs Documentation/technical: document object-format for protocol v2 t1302: expect repo format version 1 for SHA-256 builtin/show-index: provide options to determine hash algo t5302: modernize test formatting ...	2020-07-06 22:09:13 -07:00
Junio C Hamano	34e849b05a	Merge branch 'jt/cdn-offload' The "fetch/clone" protocol has been updated to allow the server to instruct the clients to grab pre-packaged packfile(s) in addition to the packed object data coming over the wire. * jt/cdn-offload: upload-pack: fix a sparse '0 as NULL pointer' warning upload-pack: send part of packfile response as uri fetch-pack: support more than one pack lockfile upload-pack: refactor reading of pack-objects out Documentation: add Packfile URIs design doc Documentation: order protocol v2 sections http-fetch: support fetching packfiles by URL http-fetch: refactor into function http: refactor finish_http_pack_request() http: use --stdin when indexing dumb HTTP pack	2020-06-25 12:27:47 -07:00
Junio C Hamano	eebb51ba8c	Merge branch 'hn/refs-cleanup' Preliminary clean-ups around refs API, plus file format specification documentation for the reftable backend. * hn/refs-cleanup: reftable: define version 2 of the spec to accomodate SHA256 reftable: clarify how empty tables should be written reftable: file format documentation refs: improve documentation for ref iterator t: use update-ref and show-ref to reading/writing refs refs.h: clarify reflog iteration order	2020-06-12 13:57:13 -07:00
Jonathan Tan	cd8402e0fd	Documentation: add Packfile URIs design doc Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-10 18:06:34 -07:00
Jonathan Tan	fd194dd56a	Documentation: order protocol v2 sections The current C Git implementation expects Git servers to follow a specific order of sections when transmitting protocol v2 responses, but this is not explicit in the documentation. Make the order explicit. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-10 18:06:34 -07:00
Han-Wen Nienhuys	ee9681d949	reftable: define version 2 of the spec to accomodate SHA256 Version appends a hash ID to the file header, making it slightly larger. This commit also changes "SHA-1" into "object ID" in many places. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-09 13:48:36 -07:00
Han-Wen Nienhuys	10f007c370	reftable: clarify how empty tables should be written The format allows for some ambiguity, as a lone footer also starts with a valid file header. However, the current JGit code will barf on this. This commit codifies this behavior into the standard. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-09 13:48:36 -07:00
Jonathan Nieder	35e6c47404	reftable: file format documentation Shawn Pearce explains: Some repositories contain a lot of references (e.g. android at 866k, rails at 31k). The reftable format provides: - Near constant time lookup for any single reference, even when the repository is cold and not in process or kernel cache. - Near constant time verification if a SHA-1 is referred to by at least one reference (for allow-tip-sha1-in-want). - Efficient lookup of an entire namespace, such as `refs/tags/`. - Support atomic push `O(size_of_update)` operations. - Combine reflog storage with ref storage. This file format spec was originally written in July, 2017 by Shawn Pearce. Some refinements since then were made by Shawn and by Han-Wen Nienhuys based on experiences implementing and experimenting with the format. (All of this was in the context of our work at Google and Google is happy to contribute the result to the Git project.) Imported from JGit[1]'s current version (c217d33ff, "Documentation/technical/reftable: improve repo layout", 2020-02-04) of Documentation/technical/reftable.md and converted to asciidoc by running pandoc -t asciidoc -f markdown reftable.md >reftable.txt using pandoc 2.2.1. The result required the following additional minor changes: - removed the [TOC] directive to add a table of contents, since asciidoc does not support it - replaced git-scm.com/docs links with linkgit: directives that link to other pages within Git's documentation [1] https://eclipse.googlesource.com/jgit/jgit Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-09 13:48:17 -07:00
Junio C Hamano	b37fd14beb	Merge branch 'dl/remote-curl-deadlock-fix' On-the-wire protocol v2 easily falls into a deadlock between the remote-curl helper and the fetch-pack process when the server side prematurely throws an error and disconnects. The communication has been updated to make it more robust. * dl/remote-curl-deadlock-fix: stateless-connect: send response end packet pkt-line: define PACKET_READ_RESPONSE_END remote-curl: error on incomplete packet pkt-line: extern packet_length() transport: extract common fetch_pack() call remote-curl: remove label indentation remote-curl: fix typo	2020-06-08 18:06:30 -07:00
SZEDER Gábor	6141cdfdcb	commit-graph-format.txt: all multi-byte numbers are in network byte order The commit-graph format specifies that "All 4-byte numbers are in network order", but the commit-graph contains 8-byte integers as well (file offsets in the Chunk Lookup table), and their byte order is unspecified. Clarify that all multi-byte integers are in network byte order. Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-08 12:28:49 -07:00
Junio C Hamano	70a1e331b0	Merge branch 'jx/pkt-line-doc-count-fix' Docfix. * jx/pkt-line-doc-count-fix: doc: fix wrong 4-byte length of pkt-line message	2020-06-02 13:35:02 -07:00
brian m. carlson	7f46e7ead1	Documentation/technical: document object-format for protocol v2 Document the object-format extension for protocol v2. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-27 10:07:07 -07:00
Denton Liu	b0df0c16ea	stateless-connect: send response end packet Currently, remote-curl acts as a proxy and blindly forwards packets between an HTTP server and fetch-pack. In the case of a stateless RPC connection where the connection is terminated before the transaction is complete, remote-curl will blindly forward the packets before waiting on more input from fetch-pack. Meanwhile, fetch-pack will read the transaction and continue reading, expecting more input to continue the transaction. This results in a deadlock between the two processes. This can be seen in the following command which does not terminate: $ git -c protocol.version=2 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... whereas the v1 version does terminate as expected: $ git -c protocol.version=1 clone https://github.com/git/git.git --shallow-since=20151012 Cloning into 'git'... fatal: the remote end hung up unexpectedly Instead of blindly forwarding packets, make remote-curl insert a response end packet after proxying the responses from the remote server when using stateless_connect(). On the RPC client side, ensure that each response ends as described. A separate control packet is chosen because we need to be able to differentiate between what the remote server sends and remote-curl's control packets. By ensuring in the remote-curl code that a server cannot send response end packets, we prevent a malicious server from being able to perform a denial of service attack in which they spoof a response end packet and cause the described deadlock to happen. Reported-by: Force Charlie <charlieio@outlook.com> Helped-by: Jeff King <peff@peff.net> Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-24 16:26:00 -07:00
Jiuyang Xie	2c31a7aa44	doc: fix wrong 4-byte length of pkt-line message The first four bytes of the line, the pkt-len, indicates the total length of the pkt-line in hexadecimal. Fix wrong pkt-len headers of some pkt-line messages in `http-protocol.txt` and `pack-protocol.txt`. Reviewed-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Jiuyang Xie <jiuyang.xjy@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-21 10:52:01 -07:00
Junio C Hamano	4b1e5e5d8c	Merge branch 'ds/bloom-cleanup' Code cleanup and typofixes * ds/bloom-cleanup: completion: offer '--(no-)patch' among 'git log' options bloom: use num_changes not nr for limit detection bloom: de-duplicate directory entries Documentation: changed-path Bloom filters use byte words bloom: parse commit before computing filters test-bloom: fix usage typo bloom: fix whitespace around tab length	2020-05-14 14:39:44 -07:00
brian m. carlson	b8615c3c63	Documentation: document v1 protocol object-format capability Document a capability that indicates which hash algorithms are in use by both sides of a remote connection. Use the term "object-format", since this is the term used for the repository extension as well. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-13 18:14:08 -07:00
Derrick Stolee	88093289cd	Documentation: changed-path Bloom filters use byte words In Documentation/technical/commit-graph-format.txt, the definition of the BIDX chunk specifies the length is a number of 8-byte words. During development we discovered that using 8-byte words in the Murmur3 hash algorithm causes issues with big-endian versus little- endian machines. Thus, the hash algorithm was adapted to work on a byte-by-byte basis. However, this caused a change in the definition of a "word" in bloom.h. Now, a "word" is a single byte, which allows filters to be as small as two bytes. These length-two filters are demonstrated in t0095-bloom.sh, and a larger filter of length 25 is demonstrated as well. The original point of using 8-byte words was for alignment reasons. It also presented opportunities for extremely sparse Bloom filters when there were a small number of changes at a commit, creating a very low false-positive rate. However, modifying the format at this point is unlikely to be a valuable exercise. Also, this use of single-byte granularity does present opportunities to save space. It is unclear if 8-byte alignment of the filters would present any meaningful performance benefits. Modify the format document to reflect reality. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-11 09:33:56 -07:00
Junio C Hamano	9b6606f43d	Merge branch 'gs/commit-graph-path-filter' Introduce an extension to the commit-graph to make it efficient to check for the paths that were modified at each commit using Bloom filters. * gs/commit-graph-path-filter: bloom: ignore renames when computing changed paths commit-graph: add GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS test flag t4216: add end to end tests for git log with Bloom filters revision.c: add trace2 stats around Bloom filter usage revision.c: use Bloom filters to speed up path based revision walks commit-graph: add --changed-paths option to write subcommand commit-graph: reuse existing Bloom filters during write commit-graph: write Bloom filters to commit graph file commit-graph: examine commits by generation number commit-graph: examine changed-path objects in pack order commit-graph: compute Bloom filters for changed paths diff: halt tree-diff early after max_changes bloom.c: core Bloom filter implementation for changed paths. bloom.c: introduce core Bloom filter constructs bloom.c: add the murmur3 hash implementation commit-graph: define and use MAX_NUM_CHUNKS	2020-05-01 13:39:53 -07:00
Garima Singh	76ffbca71a	commit-graph: write Bloom filters to commit graph file Update the technical documentation for commit-graph-format with the formats for the Bloom filter index (BIDX) and Bloom filter data (BDAT) chunks. Write the computed Bloom filters information to the commit graph file using this format. Helped-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Garima Singh <garima.singh@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-04-06 11:08:37 -07:00
Josh Steadmon	3d3adaad91	trace2: teach Git to log environment variables Via trace2, Git can already log interesting config parameters (see the trace2_cmd_list_config() function). However, this can grant an incomplete picture because many config parameters also allow overrides via environment variables. To allow for more complete logs, we add a new trace2_cmd_list_env_vars() function and supporting implementation, modeled after the pre-existing config param logging implementation. Signed-off-by: Josh Steadmon <steadmon@google.com> Acked-by: Jeff Hostetler <jeffhost@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-03-23 13:14:53 -07:00
Junio C Hamano	0410c2ba31	Merge branch 'jb/multi-pack-index-docfix' Doc fix. * jb/multi-pack-index-docfix: pack-format: correct multi-pack-index description	2020-02-12 12:41:39 -08:00
Junio C Hamano	e99c325bb4	Merge branch 'ms/doc-bundle-format' Technical details of the bundle format has been documented. * ms/doc-bundle-format: doc: describe Git bundle format	2020-02-12 12:41:38 -08:00
Johannes Berg	eb31044ff7	pack-format: correct multi-pack-index description The description of the multi-pack-index contains a small bug, if all offsets are < 2^32 then there will be no LOFF chunk, not only if they're all < 2^31 (since the highest bit is only needed as the "LOFF-escape" when that's actually needed.) Correct this, and clarify that in that case only offsets up to 2^31-1 can be stored in the OOFF chunk. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Acked-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-10 09:01:48 -08:00
Masaya Suzuki	7378ec90e1	doc: describe Git bundle format The bundle format was not documented. Describe the format with ABNF and explain the meaning of each part. Signed-off-by: Masaya Suzuki <masayasuzuki@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-07 12:47:02 -08:00
Junio C Hamano	7e65f8638e	Merge branch 'jb/doc-multi-pack-idx-fix' Typofix. * jb/doc-multi-pack-idx-fix: multi-pack-index: correct configuration in documentation	2020-01-08 12:44:12 -08:00
Johannes Berg	421c0ffb02	multi-pack-index: correct configuration in documentation It's core.multiPackIndex, not pack.multiIndex. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-04 15:02:06 -08:00
Junio C Hamano	26c816a67d	Merge branch 'hw/doc-in-header' * hw/doc-in-header: trace2: move doc to trace2.h submodule-config: move doc to submodule-config.h tree-walk: move doc to tree-walk.h trace: move doc to trace.h run-command: move doc to run-command.h parse-options: add link to doc file in parse-options.h credential: move doc to credential.h argv-array: move doc to argv-array.h cache: move doc to cache.h sigchain: move doc to sigchain.h pathspec: move doc to pathspec.h revision: move doc to revision.h attr: move doc to attr.h refs: move doc to refs.h remote: move doc to remote.h and refspec.h sha1-array: move doc to sha1-array.h merge: move doc to ll-merge.h graph: move doc to graph.h and graph.c dir: move doc to dir.h diff: move doc to diff.h and diffcore.h	2019-12-16 13:08:39 -08:00
Junio C Hamano	56e6c16394	Merge branch 'dl/lore-is-the-archive' Publicize lore.kernel.org mailing list archive and use URLs pointing into it to refer to notable messages in the documentation. * dl/lore-is-the-archive: doc: replace LKML link with lore.kernel.org RelNotes: replace Gmane with real Message-IDs doc: replace MARC links with lore.kernel.org	2019-12-06 15:09:24 -08:00
Junio C Hamano	3b3d9ea6a8	Merge branch 'jk/lore-is-the-archive' Doc update for the mailing list archiving and nntp service. * jk/lore-is-the-archive: doc: replace public-inbox links with lore.kernel.org doc: recommend lore.kernel.org over public-inbox.org	2019-12-06 15:09:23 -08:00
Denton Liu	14b7664df8	doc: replace LKML link with lore.kernel.org Since we're now recommending lore.kernel.org, replace LKML link with lore.kernel.org. Although LKML has been around for a long time, nothing lasts forever (see Gmane). Since LKML uses opaque message identifiers, switching to lore.kernel.org should be a strict improvement since, even if lore.kernel.org goes down, the Message-ID will allow future readers to look up the referenced messages on any other archive. Signed-off-by: Denton Liu <liu.denton@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-04 10:26:52 -08:00
Junio C Hamano	d3096d2ba6	Merge branch 'en/doc-typofix' Docfix. * en/doc-typofix: Fix spelling errors in no-longer-updated-from-upstream modules multimail: fix a few simple spelling errors sha1dc: fix trivial comment spelling error Fix spelling errors in test commands Fix spelling errors in messages shown to users Fix spelling errors in names of tests Fix spelling errors in comments of testcases Fix spelling errors in code comments Fix spelling errors in documentation outside of Documentation/ Documentation: fix a bunch of typos, both old and new	2019-12-01 09:04:35 -08:00
Junio C Hamano	3c90710c0c	Merge branch 'hw/config-doc-in-header' Follow recent push to move API docs from Documentation/ to header files and update config.h * hw/config-doc-in-header: config: move documentation to config.h	2019-12-01 09:04:32 -08:00
Jeff King	3eae30e464	doc: replace public-inbox links with lore.kernel.org Since we're now recommending lore.kernel.org (and because the public-inbox.org domain might eventually go away), let's update our internal references to use it, too. That future-proofs our references, and sets the example we want people to follow. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-30 09:12:26 -08:00
Heba Waly	6c51cb525d	trace2: move doc to trace2.h Move the functions documentation from Documentation/technical/api-trace2.txt to trace2.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Only the functions documentation section is removed from Documentation/technical/api-trace2.txt as the file is full of details that seemed more appropriate to be in a separate doc file as it is, with a link to the doc file added in the trace2.h. Also the functions doc is removed to avoid having redundandt info which will be hard to keep syncronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	d95a77d059	submodule-config: move doc to submodule-config.h Move the documentation from Documentation/technical/api-submodule-config.txt to submodule-config.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Documentation/technical/api-submodule-config.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. The documentation of parse_submodule_config_option() is discarded as the function was removed 2 years ago. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	bbcfa3002a	tree-walk: move doc to tree-walk.h Move the documentation from Documentation/technical/api-tree-walking.txt to tree-walk.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Documentation/technical/api-tree-walking.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	f1ecbe0f53	trace: move doc to trace.h Move the documentation from Documentation/technical/api-trace.txt to trace.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Documentation/technical/api-trace.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	4c4066d95d	run-command: move doc to run-command.h Move the documentation from Documentation/technical/api-run-command.txt to run-command.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Documentation/technical/api-run-command.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	f3b9055624	credential: move doc to credential.h Move the documentation from Documentation/technical/api-credentials.txt to credential.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Documentation/technical/api-credentials.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Documentation/git-credential.txt and Documentation/gitcredentials.txt now link to credential.h instead of Documentation/technical/api-credentials.txt for details about the credetials API. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	971b1f24a2	argv-array: move doc to argv-array.h Move the documentation from Documentation/technical/api-argv-array.txt to argv-array.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-argv-array.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	13aa9c8b70	cache: move doc to cache.h Move the documentation from Documentation/technical/api-allocation-growing.txt to cache.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-allocation-growing.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	c0be43f898	sigchain: move doc to sigchain.h Move the documentation from Documentation/technical/api-sigchain.txt to sigchain.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-sigchain.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:29 +09:00
Heba Waly	19ef3ddd36	pathspec: move doc to pathspec.h Move the documentation from Documentation/technical/api-setup.txt to pathspec.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-setup.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:28 +09:00
Heba Waly	301d595e72	revision: move doc to revision.h Move the documentation from Documentation/technical/api-revision-walking.txt to revision.h as it's easier for the developers to find the usage information beside the code instead of looking for it in another doc file. Also documentation/technical/api-revision-walking.txt is removed because the information it has is now redundant and it'll be hard to keep it up to date and synchronized with the documentation in the header file. Signed-off-by: Heba Waly <heba.waly@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-11-18 15:21:28 +09:00

1 2 3 4 5 ...

944 Commits