git-commit-vandalism

Author	SHA1	Message	Date
brian m. carlson	14228447c9	hash: provide per-algorithm null OIDs Up until recently, object IDs did not have an algorithm member, only a hash. Consequently, it was possible to share one null (all-zeros) object ID among all hash algorithms. Now that we're going to be handling objects from multiple hash algorithms, it's important to make sure that all object IDs have a correct algorithm field. Introduce a per-algorithm null OID, and add it to struct hash_algo. Introduce a wrapper function as well, and use it everywhere we used to use the null_oid constant. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-27 16:31:39 +09:00
ZheNing Hu	e73fe3dd02	builtin/*: update usage format According to the guidelines in parse-options.h, we should not end in a full stop or start with a capital letter. Fix old error and usage messages to match this expectation. Signed-off-by: ZheNing Hu <adlternative@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-06 15:10:49 -08:00
Ævar Arnfjörð Bjarmason	56f56ac50b	style: do not "break" in switch() after "return" Remove this unreachable code. It was found by SunCC, it's found by a non-fatal warning emitted by SunCC. It's one of the things it's more vehement about than GCC & Clang. It complains about a lot of other similarly unreachable code, e.g. a BUG(...) without a "return", and a "return 0" after a long if/else, both of whom have "return" statements. Those are also genuine redundancies to a compiler, but arguably make the code a bit easier to read & less fragile to maintain. These return/break cases are just unnecessary however, and as seen here the surrounding code just did a plain "return" without a "break" already. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-15 16:32:50 -08:00
Junio C Hamano	58138d3f26	Merge branch 'js/default-branch-name-part-2' Update the tests to drop word 'master' from them. * js/default-branch-name-part-2: t9902: avoid using the branch name `master` tests: avoid variations of the `master` branch name t3200: avoid variations of the `master` branch name fast-export: avoid using unnecessary language in a code comment t/test-terminal: avoid non-inclusive language	2020-10-05 14:01:50 -07:00
Junio C Hamano	86cca370e1	Merge branch 'jk/drop-unaligned-loads' Compilation fix around type punning. * jk/drop-unaligned-loads: Revert "fast-export: use local array to store anonymized oid" bswap.h: drop unaligned loads	2020-10-04 12:49:06 -07:00
Jeff King	176380fd11	Revert "fast-export: use local array to store anonymized oid" This reverts commit `f39ad38410`. That commit was trying to silence a type-punning warning on older versions of gcc. However, its analysis was all wrong. I didn't notice that we _were_ in fact type-punning because there are two versions of put_be32(): one that uses casts and unaligned loads, and another that uses bitshifts. I looked at the latter, but on my platform we were defaulting to the former. However, as of the previous commit, we'll always use the bitshift version. So we can drop this hackery to avoid the warning, making the code slightly cleaner. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-24 12:30:11 -07:00
Johannes Schindelin	5a0c32bd4b	fast-export: avoid using unnecessary language in a code comment In an ongoing effort to avoid non-inclusive language, let's avoid using the branch name "master" in a code comment. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-21 15:19:27 -07:00
Jonathan Tan	f24c30e0b6	wt-status: tolerate dangling marks When a user checks out the upstream branch of HEAD, the upstream branch not being a local branch, and then runs "git status", like this: git clone $URL client cd client git checkout @{u} git status no status is printed, but instead an error message: fatal: HEAD does not point to a branch (This error message when running "git branch" persists even after checking out other things - it only stops after checking out a branch.) This is because "git status" reads the reflog when determining the "HEAD detached" message, and thus attempts to DWIM "@{u}", but that doesn't work because HEAD no longer points to a branch. Therefore, when calculating the status of a worktree, tolerate dangling marks. This is done by adding an additional parameter to dwim_ref() and repo_dwim_ref(). Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-02 14:39:25 -07:00
Jeff King	f39ad38410	fast-export: use local array to store anonymized oid Some older versions of gcc complain about this line: builtin/fast-export.c:412:2: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing] put_be32(oid.hash + hashsz - 4, counter++); ^ This seems to be a false positive, as there's no type-punning at all here. oid.hash is an array of unsigned char; when we pass it to a function it decays to a pointer to unsigned char. We do take a void pointer in put_be32(), but it's immediately aliased with another pointer to unsigned char (and clearly the compiler is looking inside the inlined put_be32(), since the warning doesn't happen with -O0). This happens on gcc 4.8 and 4.9, but not later versions (I tested gcc 6, 7, 8, and 9). We can work around it by using a local array instead of an object_id struct. This is a little more intimate with the details of object_id, but for whatever reason doesn't seem to trigger the compiler warning. We can revert this patch once we decide that those gcc versions are too old to care about for a warning like this (gcc 4.8 is the default compiler for Ubuntu Trusty, which is out-of-support but not fully end-of-life'd until April 2022). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-25 14:19:23 -07:00
Jeff King	8a49495583	fast-export: anonymize "master" refname Running "fast-export --anonymize" will leave "refs/heads/master" untouched in the output, for two reasons: - it helped to have some known reference point between the original and anonymized repository - since it's historically the default branch name, it doesn't leak any information Now that we can ask fast-export to retain particular tokens, we have a much better tool for the first one (because it works for any ref, not just master). For the second, the notion of "default branch name" is likely to become configurable soon, at which point the name _does_ leak information. Let's drop this special case in preparation. Note that we have to adjust the test a bit, since it relied on using the name "master" in the anonymized repos. We could just use --anonymize-map=master to keep the same output, but then we wouldn't know if it works because of our hard-coded master or because of the explicit map. So let's flip the test a bit, and confirm that we anonymize "master", but keep "other" in the output. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-25 14:19:23 -07:00
Jeff King	65b5d9fae7	fast-export: allow seeding the anonymized mapping After you anonymize a repository, it can be hard to find which commits correspond between the original and the result, and thus hard to reproduce commands that triggered bugs in the original. Let's make it possible to seed the anonymization map. This lets users either: - mark names to be retained as-is, if they don't consider them secret (in which case their original commands would just work) - map names to new values, which lets them adapt the reproduction recipe to the new names without revealing the originals The implementation is fairly straight-forward. We already store each anonymized token in a hashmap (so that the same token appearing twice is converted to the same result). We can just introduce a new "seed" hashmap which is consulted first. This does make a few more promises to the user about how we'll anonymize things (e.g., token-splitting pathnames). But it's unlikely that we'd want to change those rules, even if the actual anonymization of a single token changes. And it makes things much easier for the user, who can unblind only a directory name without having to specify each path within it. One alternative to this approach would be to anonymize as we see fit, and then dump the whole refname and pathname mappings to a file. This does work, but it's a bit awkward to use (you have to manually dig the items you care about out of the mapping). Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-25 14:19:23 -07:00
Jeff King	d5bf91fde4	fast-export: add a "data" callback parameter to anonymize_str() The anonymize_str() function takes a generator callback, but there's no way to pass extra context to it. Let's add the usual "void *data" parameter to the generator interface and pass it along. This is mildly annoying for existing callers, all of which pass NULL, but is necessary to avoid extra globals in some cases we'll add in a subsequent patch. While we're touching each of these callbacks, we can further observe that none of them use the existing orig/len parameters at all. This makes sense, since the point is for their output to have no discernable basis in the original (my original version had some notion that we might use a one-way function to obfuscate the names, but it was never implemented). So let's drop those extra parameters. If a caller really wants to do something with them, it can pass a struct through the new data parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	6416a865da	fast-export: move global "idents" anonymize hashmap into function All of the other anonymization functions keep their static mappings inside the function to avoid polluting the global namespace. Let's do the same for "idents", as nobody needs it outside of anonymize_ident_line(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	55b01456a9	fast-export: use a flex array to store anonymized entries Now that we're using a separate keydata struct for hash lookups, we have more flexibility in how we allocate anonymized_entry structs. Let's push the "orig" key into a flex member within the struct. That should save us a few bytes of memory per entry (a pointer plus any malloc overhead), and may make lookups a little faster (since it's one less pointer to chase in the comparison function). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	a0f65641df	fast-export: stop storing lengths in anonymized hashmaps Now that the anonymize_str() interface is restricted to NUL-terminated strings, there's no need for us to keep track of the length of each entry in the hashmap. This simplifies the code and saves a bit of memory. Note that we do still need to compare the stored results to partial strings passed in by the callers. We can do that by using hashmap's keydata feature to get the ptr/len pair into the comparison function, and then using strncmp(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	7f40759496	fast-export: tighten anonymize_mem() interface to handle only strings While the anonymize_mem() interface _can_ store arbitrary byte sequences, none of the callers uses this feature (as of the previous commit). We'd like to keep it that way, as we'll be exposing the string-like nature of the anonymization routines to the user. So let's tighten up the interface a bit: - don't treat "len" as an out-parameter from anonymize_mem(); this ensures callers treat the pointer result as a NUL-terminated string - likewise, don't treat "len" as an out-parameter from generator functions - swap out "void " for "char " as appropriate to signal that we don't handle arbitrary memory - rename the function to anonymize_str() This will also open up some optimization opportunities in a future patch. Note that we can't drop the "len" parameter entirely. Some callers do pass in partial strings (e.g., "foo/bar", len=3) to avoid copying, and we need to handle those still. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	750bb32589	fast-export: store anonymized oids as hex strings When fast-export stores anonymized oids, it does so as binary strings. And while the anonymous mapping storage is binary-clean (at least as of the previous commit), this will become awkward when we start exposing more of it to the user. In particular, if we allow a method for retaining token "foo", then users may want to specify a hex oid as such a token. Let's just switch to storing the hex strings. The difference in memory usage is negligible (especially considering how infrequently we'd generally store an oid compared to, say, path components). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Jeff King	b897bf5f37	fast-export: use xmemdupz() for anonymizing oids Our anonymize_mem() function is careful to take a ptr/len pair to allow storing binary tokens like object ids, as well as partial strings (e.g., just "foo" of "foo/bar"). But it duplicates the hash key using xstrdup()! That means that: - for a partial string, we'd store all bytes up to the NUL, even though we'd never look at anything past "len". This didn't produce wrong behavior, but was wasteful. - for a binary oid that doesn't contain a zero byte, we'd copy garbage bytes off the end of the array (though as long as nothing complained about reading uninitialized bytes, further reads would be limited by "len", and we'd produce the correct results) - for a binary oid that does contain a zero byte, we'd copy _fewer_ bytes than intended into the hashmap struct. When we later try to look up a value, we'd access uninitialized memory and potentially falsely claim that a particular oid is not present. The most common reason to store an oid is an anonymized gitlink, but our test case doesn't have any gitlinks at all. So let's add one whose oid contains a NUL and is present at two different paths. ASan catches the memory error, but even without it we can detect the bug because the oid is not anonymized the same way for both paths. And of course the fix is to copy the correct number of bytes. We don't technically need the appended NUL from xmemdupz(), but it doesn't hurt as an extra protection against anybody treating it like a string (plus a future patch will push us more in that direction). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-23 19:56:26 -07:00
Junio C Hamano	78e67cda42	Merge branch 'mt/use-passed-repo-more-in-funcs' Some codepaths were given a repository instance as a parameter to work in the repository, but passed the_repository instance to its callees, which has been cleaned up (somewhat). * mt/use-passed-repo-more-in-funcs: sha1-file: allow check_object_signature() to handle any repo sha1-file: pass git_hash_algo to hash_object_file() sha1-file: pass git_hash_algo to write_object_file_prepare() streaming: allow open_istream() to handle any repo pack-check: use given repo's hash_algo at verify_packfile() cache-tree: use given repo's hash_algo at verify_one() diff: make diff_populate_filespec() honor its repo argument	2020-02-14 12:54:22 -08:00
Junio C Hamano	145136a95a	C: use skip_prefix() to avoid hardcoded string length We often skip an optional prefix in a string with a hardcoded constant, e.g. if (starts_with(string, "prefix")) string += 6; which is less error prone when written skip_prefix(string, "prefix", &string); Note that this changes a few error messages from "git reflog expire --expire=nonsense.timestamp", which used to complain by saying '--expire=nonsense.timestamp' is not a valid timestamp but with this change, we say 'nonsense.timestamp' is not a valid timestamp which is more technically correct (the string with --expire= as a prefix obviously cannot be a valid timestamp, but the error is about the part of the input without that prefix). Helped-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 13:03:45 -08:00
Matheus Tavares	b98d188581	sha1-file: allow check_object_signature() to handle any repo Some callers of check_object_signature() can work on arbitrary repositories, but the repo does not get passed to this function. Instead, the_repository is always used internally. To fix possible inconsistencies, allow the function to receive a struct repository and make those callers pass on the repo being handled. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-31 10:45:39 -08:00
Junio C Hamano	5efabc7ed9	Merge branch 'ew/hashmap' Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use _entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry " hashmap: use _entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry " introduce container_of macro hashmap_put takes "struct hashmap_entry " hashmap_remove takes "const struct hashmap_entry " hashmap_get takes "const struct hashmap_entry " hashmap_add takes "struct hashmap_entry " hashmap_get_next takes "const struct hashmap_entry " hashmap_entry_init takes "struct hashmap_entry " packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent	2019-10-15 13:48:02 +09:00
Eric Wong	404ab78e39	hashmap: remove type arg from hashmap_{get,put,remove}_entry Since these macros already take a `keyvar' pointer of a known type, we can rely on OFFSETOF_VAR to get the correct offset without relying on non-portable `__typeof__' and `offsetof'. Argument order is also rearranged, so `keyvar' and `member' are sequential as they are used as: `keyvar->member' Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:12 +09:00
Eric Wong	939af16eac	hashmap_cmp_fn takes hashmap_entry params Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f23a465132	hashmap_get{,_from_hash} return "struct hashmap_entry *" Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	26b455f21e	hashmap_put takes "struct hashmap_entry " This is less error-prone than "void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b6c5241606	hashmap_get takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	d22245a2e3	hashmap_entry_init takes "struct hashmap_entry " C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry " to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
Elijah Newren	941790d7de	fast-export: handle nested tags Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-04 07:33:21 +09:00
Elijah Newren	a1638cfe12	fast-export: allow user to request tags be marked with --mark-tags Add a new option, --mark-tags, which will output mark identifiers with each tag object. This improves the incremental export story with --export-marks since it will allow us to record that annotated tags have been exported, and it is also needed as a step towards supporting nested tags. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-04 07:33:21 +09:00
Elijah Newren	208d69246e	fast-export: add support for --import-marks-if-exists fast-import has support for both an --import-marks flag and an --import-marks-if-exists flag; the latter of which will not die() if the file does not exist. fast-export only had support for an --import-marks flag; add an --import-marks-if-exists flag for consistency. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-04 07:33:21 +09:00
Elijah Newren	af2abd870b	fast-export: fix exporting a tag and nothing else fast-export allows specifying revision ranges, which can be used to export a tag without exporting the commit it tags. fast-export handled this rather poorly: it would emit a "from :0" directive. Since marks start at 1 and increase, this means it refers to an unknown commit and fast-import will choke on the input. When we are unable to look up a mark for the object being tagged, use a "from $HASH" directive instead to fix this problem. Note that this is quite similar to the behavior fast-export exhibits with commits and parents when --reference-excluded-parents is passed along with an excluded commit range. For tags of excluded commits we do not require the --reference-excluded-parents flag because we always have to tag something. By contrast, when dealing with commits, pruning a parent is always a viable option, so we need the flag to specify that parent pruning is not wanted. (It is slightly weird that --reference-excluded-parents isn't the default with a separate --prune-excluded-parents flag, but backward compatibility concerns resulted in the current defaults.) Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-28 18:54:40 +09:00
Jeff King	d0229abd93	object: convert lookup_object() to use object_id There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 10:18:09 -07:00
Elijah Newren	e80001f8fd	fast-export: do automatic reencoding of commit messages only if requested Automatic re-encoding of commit messages (and dropping of the encoding header) hurts attempts to do reversible history rewrites (e.g. sha1sum <-> sha256sum transitions, some subtree rewrites), and seems inconsistent with the general principle followed elsewhere in fast-export of requiring explicit user requests to modify the output (e.g. --signed-tags=strip, --tag-of-filtered-object=rewrite). Add a --reencode flag that the user can use to specify, and like other fast-export flags, default it to 'abort'. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-14 16:48:56 +09:00
Elijah Newren	57a8be2cb0	fast-export: differentiate between explicitly UTF-8 and implicitly UTF-8 The find_encoding() function returned the encoding used by a commit message, returning a default of git_commit_encoding (usually UTF-8). Although the current code does not differentiate between a commit which explicitly requested UTF-8 and one where we just assume UTF-8 because no encoding is set, it will become important when we try to preserve the encoding header. Since is_encoding_utf8() returns true when passed NULL, we can just return NULL from find_encoding() instead of returning git_commit_encoding. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-14 16:48:56 +09:00
Elijah Newren	ccbfc96dc4	fast-export: avoid stripping encoding header if we cannot reencode When fast-export encounters a commit with an 'encoding' header, it tries to reencode in UTF-8 and then drops the encoding header. However, if it fails to reencode in UTF-8 because e.g. one of the characters in the commit message was invalid in the old encoding, then we need to retain the original encoding or otherwise we lose information needed to understand all the other (valid) characters in the original commit message. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-05-14 16:48:56 +09:00
Junio C Hamano	4d59753227	Merge branch 'en/fast-export-import' Small fixes and features for fast-export and fast-import, mostly on the fast-export side. * en/fast-export-import: fast-export: add a --show-original-ids option to show original names fast-import: remove unmaintained duplicate documentation fast-export: add --reference-excluded-parents option fast-export: ensure we export requested refs fast-export: when using paths, avoid corrupt stream with non-existent mark fast-export: move commit rewriting logic into a function for reuse fast-export: avoid dying when filtering by paths and old tags exist fast-export: use value from correct enum git-fast-export.txt: clarify misleading documentation about rev-list args git-fast-import.txt: fix documentation for --quiet option fast-export: convert sha1 to oid	2019-01-04 13:33:33 -08:00
Elijah Newren	a965bb3116	fast-export: add a --show-original-ids option to show original names Knowing the original names (hashes) of commits can sometimes enable post-filtering that would otherwise be difficult or impossible. In particular, the desire to rewrite commit messages which refer to other prior commits (on top of whatever other filtering is being done) is very difficult without knowing the original names of each commit. In addition, knowing the original names (hashes) of blobs can allow filtering by blob-id without requiring re-hashing the content of the blob, and is thus useful as a small optimization. Once we add original ids for both commits and blobs, we may as well add them for tags too for completeness. Perhaps someone will have a use for them. This commit teaches a new --show-original-ids option to fast-export which will make it add a 'original-oid <hash>' line to blob, commits, and tags. It also teaches fast-import to parse (and ignore) such lines. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:52 +09:00
Elijah Newren	530ca19c02	fast-export: add --reference-excluded-parents option git filter-branch has a nifty feature allowing you to rewrite, e.g. just the last 8 commits of a linear history git filter-branch $OPTIONS HEAD~8..HEAD If you try the same with git fast-export, you instead get a history of only 8 commits, with HEAD~7 being rewritten into a root commit. There are two alternatives: 1) Don't use the negative revision specification, and when you're filtering the output to make modifications to the last 8 commits, just be careful to not modify any earlier commits somehow. 2) First run 'git fast-export --export-marks=somefile HEAD~8', then run 'git fast-export --import-marks=somefile HEAD~8..HEAD'. Both are more error prone than I'd like (the first for obvious reasons; with the second option I have sometimes accidentally included too many revisions in the first command and then found that the corresponding extra revisions were not exported by the second command and thus were not modified as I expected). Also, both are poor from a performance perspective. Add a new --reference-excluded-parents option which will cause fast-export to refer to commits outside the specified rev-list-args range by their sha1sum. Such a stream will only be useful in a repository which already contains the necessary commits (much like the restriction imposed when using --no-data). Note from Peff: I think we might be able to do a little more optimization here. If we're exporting HEAD^..HEAD and there's an object in HEAD^ which is unchanged in HEAD, I think we'd still print it (because it would not be marked SHOWN), but we could omit it (by walking the tree of the boundary commits and marking them shown). I don't think it's a blocker for what you're doing here, but just a possible future optimization. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:52 +09:00
Elijah Newren	fdf31b6369	fast-export: ensure we export requested refs If file paths are specified to fast-export and a ref points to a commit that does not touch any of the relevant paths, then that ref would sometimes fail to be exported. (This depends on whether any ancestors of the commit which do touch the relevant paths would be exported with that same ref name or a different ref name.) To avoid this problem, put all specified refs into extra_refs to start, and then as we export each commit, remove the refname used in the 'commit $REFNAME' directive from extra_refs. Then, in handle_tags_and_duplicates() we know which refs actually do need a manual reset directive in order to be included. This means that we do need some special handling for excluded refs; e.g. if someone runs git fast-export ^master master then they've asked for master to be exported, but they have also asked for the commit which master points to and all of its history to be excluded. That logically means ref deletion. Previously, such refs were just silently omitted from being exported despite having been explicitly requested for export. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:52 +09:00
Elijah Newren	cd13762d8f	fast-export: when using paths, avoid corrupt stream with non-existent mark If file paths are specified to fast-export and multiple refs point to a commit that does not touch any of the relevant file paths, then fast-export can hit problems. fast-export has a list of additional refs that it needs to explicitly set after exporting all blobs and commits, and when it tries to get_object_mark() on the relevant commit, it can get a mark of 0, i.e. "not found", because the commit in question did not touch the relevant paths and thus was not exported. Trying to import a stream with a mark corresponding to an unexported object will cause fast-import to crash. Avoid this problem by taking the commit the ref points to and finding an ancestor of it that was exported, and make the ref point to that commit instead. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:51 +09:00
Elijah Newren	f129c4275c	fast-export: move commit rewriting logic into a function for reuse Logic to replace a filtered commit with an unfiltered ancestor is useful elsewhere; put it into a function we can call. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:51 +09:00
Elijah Newren	1f30c904b3	fast-export: avoid dying when filtering by paths and old tags exist If --tag-of-filtered-object=rewrite is specified along with a set of paths to limit what is exported, then any tags pointing to old commits that do not contain any of those specified paths cause problems. Since the old tagged commit is not exported, fast-export attempts to rewrite such tags to an ancestor commit which was exported. If no such commit exists, then fast-export currently die()s. Five years after the tag rewriting logic was added to fast-export (see commit `2d8ad46919`, "fast-export: Add a --tag-of-filtered-object option for newly dangling tags", 2009-06-25), fast-import gained the ability to delete refs (see commit `4ee1b225b9`, "fast-import: add support to delete refs", 2014-04-20), so now we do have a valid option to rewrite the tag to. Delete these tags instead of dying. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:51 +09:00
Elijah Newren	b93b81e799	fast-export: use value from correct enum ABORT and ERROR happen to have the same value, but come from differnt enums. Use the one from the correct enum, and while at it, rename the values to avoid such problems. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:51 +09:00
Elijah Newren	843b9e6d48	fast-export: convert sha1 to oid Rename anonymize_sha1() to anonymize_oid(() and change its signature, and switch from sha1_to_hex() to oid_to_hex() and from GIT_SHA1_RAWSZ to the_hash_algo->rawsz. Also change a comment and a die string to mention oid instead of sha1. Signed-off-by: Elijah Newren <newren@gmail.com> Acked-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-17 18:43:50 +09:00
Torsten Bögershausen	ca473cef91	Upcast size_t variables to uintmax_t when printing When printing variables which contain a size, today "unsigned long" is used at many places. In order to be able to change the type from "unsigned long" into size_t some day in the future, we need to have a way to print 64 bit variables on a system that has "unsigned long" defined to be 32 bit, like Win64. Upcast all those variables into uintmax_t before they are printed. This is to prepare for a bigger change, when "unsigned long" will be converted into size_t for variables which may be > 4Gib. Signed-off-by: Torsten Bögershausen <tboegi@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-11-12 16:43:52 +09:00
Junio C Hamano	11877b9ebe	Merge branch 'nd/the-index' Various codepaths in the core-ish part learn to work on an arbitrary in-core index structure, not necessarily the default instance "the_index". * nd/the-index: (23 commits) revision.c: reduce implicit dependency the_repository revision.c: remove implicit dependency on the_index ws.c: remove implicit dependency on the_index tree-diff.c: remove implicit dependency on the_index submodule.c: remove implicit dependency on the_index line-range.c: remove implicit dependency on the_index userdiff.c: remove implicit dependency on the_index rerere.c: remove implicit dependency on the_index sha1-file.c: remove implicit dependency on the_index patch-ids.c: remove implicit dependency on the_index merge.c: remove implicit dependency on the_index merge-blobs.c: remove implicit dependency on the_index ll-merge.c: remove implicit dependency on the_index diff-lib.c: remove implicit dependency on the_index read-cache.c: remove implicit dependency on the_index diff.c: remove implicit dependency on the_index grep.c: remove implicit dependency on the_index diff.c: remove the_index dependency in textconv() functions blame.c: rename "repo" argument to "r" combine-diff.c: remove implicit dependency on the_index ...	2018-10-19 13:34:02 +09:00
Nguyễn Thái Ngọc Duy	2abf350385	revision.c: remove implicit dependency on the_index Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-09-21 09:51:19 -07:00
Jeff King	4a7e27e957	convert "oidcmp() == 0" to oideq() Using the more restrictive oideq() should, in the long run, give the compiler more opportunities to optimize these callsites. For now, this conversion should be a complete noop with respect to the generated code. The result is also perhaps a little more readable, as it avoids the "zero is equal" idiom. Since it's so prevalent in C, I think seasoned programmers tend not to even notice it anymore, but it can sometimes make for awkward double negations (e.g., we can drop a few !!oidcmp() instances here). This patch was generated almost entirely by the included coccinelle patch. This mechanical conversion should be completely safe, because we check explicitly for cases where oidcmp() is compared to 0, which is what oideq() is doing under the hood. Note that we don't have to catch "!oidcmp()" separately; coccinelle's standard isomorphisms make sure the two are treated equivalently. I say "almost" because I did hand-edit the coccinelle output to fix up a few style violations (it mostly keeps the original formatting, but sometimes unwraps long lines). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2018-08-29 11:32:49 -07:00
Junio C Hamano	4bea8485e3	Merge branch 'nd/i18n' Many more strings are prepared for l10n. * nd/i18n: (23 commits) transport-helper.c: mark more strings for translation transport.c: mark more strings for translation sha1-file.c: mark more strings for translation sequencer.c: mark more strings for translation replace-object.c: mark more strings for translation refspec.c: mark more strings for translation refs.c: mark more strings for translation pkt-line.c: mark more strings for translation object.c: mark more strings for translation exec-cmd.c: mark more strings for translation environment.c: mark more strings for translation dir.c: mark more strings for translation convert.c: mark more strings for translation connect.c: mark more strings for translation config.c: mark more strings for translation commit-graph.c: mark more strings for translation builtin/replace.c: mark more strings for translation builtin/pack-objects.c: mark more strings for translation builtin/grep.c: mark strings for translation builtin/config.c: mark more strings for translation ...	2018-08-15 15:08:23 -07:00

1 2 3 4

169 Commits