Git with broken hash generation to generate collisions between object IDs. Don't use this!
https://undefinedbehavior.de/posts/commit-vandalism/
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.

In particular, there was a pattern of doing single commits with renames
that looked basically like

 - rename "filename.h" -> "filename_64.h"
 - create new "filename.c" that includes "filename_32.h" or
   "filename_64.h" depending on whether we're 32-bit or 64-bit.

which was preparatory for smushing the two trees together.

Now, there's two issues here:

 - "filename.c" *remained*. Yes, it was a rename, but there was a new file
   created with the old name in the same commit. This was important,
   because we wanted each commit to compile properly, so that it was
   bisectable, so splitting the rename into one commit and the "create
   helper file" into another was *not* an option.

   So we need to break associations where the contents change too much.
   Fine. We have the -B flag for that. When we break things up, then the
   rename detection will be able to figure out whether there are better
   alternatives.

 - "git log --follow" didn't work with -B.

Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following.
So now "git log -B --follow" works fine:

    diff --git a/tree-diff.c b/tree-diff.c
    index 26bdbdd..7c261fd 100644
    --- a/tree-diff.c
    +++ b/tree-diff.c
    @@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
     	diff_opts.detect_rename = DIFF_DETECT_RENAME;
     	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
     	diff_opts.single_follow = opt->paths[0];
    +	diff_opts.break_opt = opt->break_opt;
     	paths[0] = NULL;
     	diff_tree_setup_paths(paths, &diff_opts);
     	if (diff_setup_done(&diff_opts) < 0)

however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!

In particular:

 - it used to do

        if (base_size < MINIMUM_BREAK_SIZE)
                return 0; /* we do not break too small filepair */

   which basically says "don't bother to break small files". But that
   "base_size" is the *smaller* of the two sizes, which means that if some
   large file was rewritten into one that just includes another file, we
   would look at the (small) result, and decide that it's smaller than the
   break size, so it cannot be worth it to break it up! Even if the other
   side was ten times bigger and looked *nothing* like the small file!

   That's clearly bogus. I replaced "base_size" with "max_size", so that
   we compare the *bigger* of the filepair with the break size.

 - It calculated a "merge_score", which was the score needed to merge it
   back together if nothing else wanted it. But even if it was *so*
   different that we would never want to merge it back, we wouldn't
   consider it a break! That makes no sense. So I added

        if (*merge_score_p > break_score)
                return 1;

   to make it clear that if we wouldn't want to merge it at the end, it
   was *definitely* a break.
 - It compared the whole "extent of damage", counting all inserts and
   deletes, but it based this score on the "base_size", and generated the
   damage score with

        delta_size = src_removed + literal_added;
        damage_score = delta_size * MAX_SCORE / base_size;

   but that makes no sense either, since quite often, this will result in
   a number that is *bigger* than MAX_SCORE! Why? Because base_size is
   (again) the smaller of the two files we compare, and when you start out
   from a small file and add a lot (or start out from a large file and
   remove a lot), the base_size is going to be much smaller than the
   damage!

   Again, the fix was to replace "base_size" with "max_size", at which
   point the damage actually becomes a sane percentage of the whole.

With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now

    git log -B --follow arch/x86/kernel/vmlinux_64.lds.S

actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history

    git log --stat -B -C

on my kernel tree with the old code and the new code.

There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates).
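The three fixes described above can be sketched in a few lines of C. This is a simplified illustration and not the code in diffcore-break.c: the function names, the `SKETCH_` constants and their values, and the idea of passing `src_copied` and `literal_added` in as arguments (rather than computing them from the file contents) are all assumptions made for this example. Any further heuristics the real function may apply are left out.

```c
#include <stddef.h>

/* Illustrative constants. The values are assumptions for this sketch;
 * the commit message above only names MAX_SCORE and MINIMUM_BREAK_SIZE. */
#define SKETCH_MAX_SCORE          60000.0
#define SKETCH_MINIMUM_BREAK_SIZE 400

/* Old behaviour: damage is scaled by the SMALLER of the two sizes, so a
 * big rewrite of a small file (or the reverse) overshoots MAX_SCORE.
 * Both sizes are assumed to be nonzero. */
static double damage_score_old(unsigned long src_size, unsigned long dst_size,
                               unsigned long src_removed,
                               unsigned long literal_added)
{
    unsigned long base_size = src_size < dst_size ? src_size : dst_size;
    unsigned long delta_size = src_removed + literal_added;
    return delta_size * SKETCH_MAX_SCORE / base_size;
}

/* New behaviour: damage is scaled by the LARGER of the two sizes. */
static double damage_score_new(unsigned long src_size, unsigned long dst_size,
                               unsigned long src_removed,
                               unsigned long literal_added)
{
    unsigned long max_size = src_size > dst_size ? src_size : dst_size;
    unsigned long delta_size = src_removed + literal_added;
    return delta_size * SKETCH_MAX_SCORE / max_size;
}

/* Returns 1 if the filepair should be broken into delete + create.
 * src_copied:    bytes of the source that survive in the destination
 * literal_added: bytes of the destination that are new material */
static int should_break_sketch(unsigned long src_size, unsigned long dst_size,
                               unsigned long src_copied,
                               unsigned long literal_added,
                               int break_score, int *merge_score_p)
{
    unsigned long max_size = src_size > dst_size ? src_size : dst_size;
    unsigned long src_removed;

    *merge_score_p = 0;

    /* Fix 1: test the bigger side against the minimum break size. */
    if (src_size == 0 || max_size < SKETCH_MINIMUM_BREAK_SIZE)
        return 0;

    if (src_copied > src_size)
        src_copied = src_size;
    src_removed = src_size - src_copied;

    /* Fix 2: if we would never merge the pair back, it is a break. */
    *merge_score_p = (int)(src_removed * SKETCH_MAX_SCORE / src_size);
    if (*merge_score_p > break_score)
        return 1;

    /* Fix 3: measure the damage against max_size, not base_size. */
    return damage_score_new(src_size, dst_size, src_removed,
                            literal_added) >= break_score;
}
```

As a worked case, take a 4000-byte file rewritten into a 100-byte stub that shares nothing with it (`src_removed = 4000`, `literal_added = 100`). The old formula yields 4100 * 60000 / 100 = 2460000, which is 41 times the assumed MAX_SCORE, and the old size check would have rejected the pair outright because the smaller side is under 400 bytes. The new formula yields 4100 * 60000 / 4000 = 61500. That can still edge past MAX_SCORE because removals and additions are both counted, but it stays within a factor of two of it.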
Here's a few examples of the "--stat" output:

 - This:

        include/asm-x86/Kbuild        |    2 -
        include/asm-x86/debugreg.h    |   79 +++++++++++++++++++++++++++++++++++------
        include/asm-x86/debugreg_32.h |   64 ---------------------------------
        include/asm-x86/debugreg_64.h |   65 ---------------------------------
        4 files changed, 68 insertions(+), 142 deletions(-)

   Becomes:

        include/asm-x86/Kbuild                        |    2 -
        include/asm-x86/{debugreg_64.h => debugreg.h} |    9 +++-
        include/asm-x86/debugreg_32.h                 |   64 -------------------------
        3 files changed, 7 insertions(+), 68 deletions(-)

 - This:

        include/asm-x86/bug.h    |   41 +++++++++++++++++++++++++++++++++++++++--
        include/asm-x86/bug_32.h |   37 -------------------------------------
        include/asm-x86/bug_64.h |   34 ----------------------------------
        3 files changed, 39 insertions(+), 73 deletions(-)

   Becomes

        include/asm-x86/{bug_64.h => bug.h} |   20 +++++++++++++-----
        include/asm-x86/bug_32.h            |   37 -----------------------------------
        2 files changed, 14 insertions(+), 43 deletions(-)

Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertising: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.

So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>

||
---|---|---|
arm | ||
compat | ||
contrib | ||
Documentation | ||
git-gui | ||
gitweb | ||
mozilla-sha1 | ||
perl | ||
ppc | ||
t | ||
templates | ||
xdiff | ||
.gitignore | ||
.mailmap | ||
alloc.c | ||
archive-tar.c | ||
archive-zip.c | ||
archive.h | ||
attr.c | ||
attr.h | ||
base85.c | ||
blob.c | ||
blob.h | ||
builtin-add.c | ||
builtin-annotate.c | ||
builtin-apply.c | ||
builtin-archive.c | ||
builtin-blame.c | ||
builtin-branch.c | ||
builtin-bundle.c | ||
builtin-cat-file.c | ||
builtin-check-attr.c | ||
builtin-check-ref-format.c | ||
builtin-checkout-index.c | ||
builtin-commit-tree.c | ||
builtin-config.c | ||
builtin-count-objects.c | ||
builtin-describe.c | ||
builtin-diff-files.c | ||
builtin-diff-index.c | ||
builtin-diff-tree.c | ||
builtin-diff.c | ||
builtin-fetch--tool.c | ||
builtin-fmt-merge-msg.c | ||
builtin-for-each-ref.c | ||
builtin-fsck.c | ||
builtin-gc.c | ||
builtin-grep.c | ||
builtin-init-db.c | ||
builtin-log.c | ||
builtin-ls-files.c | ||
builtin-ls-tree.c | ||
builtin-mailinfo.c | ||
builtin-mailsplit.c | ||
builtin-merge-base.c | ||
builtin-merge-file.c | ||
builtin-mv.c | ||
builtin-name-rev.c | ||
builtin-pack-objects.c | ||
builtin-pack-refs.c | ||
builtin-prune-packed.c | ||
builtin-prune.c | ||
builtin-push.c | ||
builtin-read-tree.c | ||
builtin-reflog.c | ||
builtin-rerere.c | ||
builtin-rev-list.c | ||
builtin-rev-parse.c | ||
builtin-revert.c | ||
builtin-rm.c | ||
builtin-runstatus.c | ||
builtin-shortlog.c | ||
builtin-show-branch.c | ||
builtin-show-ref.c | ||
builtin-stripspace.c | ||
builtin-symbolic-ref.c | ||
builtin-tag.c | ||
builtin-tar-tree.c | ||
builtin-unpack-objects.c | ||
builtin-update-index.c | ||
builtin-update-ref.c | ||
builtin-upload-archive.c | ||
builtin-verify-pack.c | ||
builtin-verify-tag.c | ||
builtin-write-tree.c | ||
builtin.h | ||
cache-tree.c | ||
cache-tree.h | ||
cache.h | ||
check-builtins.sh | ||
check-racy.c | ||
color.c | ||
color.h | ||
combine-diff.c | ||
commit.c | ||
commit.h | ||
config.c | ||
config.mak.in | ||
configure.ac | ||
connect.c | ||
convert-objects.c | ||
convert.c | ||
copy.c | ||
COPYING | ||
csum-file.c | ||
csum-file.h | ||
ctype.c | ||
daemon.c | ||
date.c | ||
decorate.c | ||
decorate.h | ||
delta.h | ||
diff-delta.c | ||
diff-lib.c | ||
diff.c | ||
diff.h | ||
diffcore-break.c | ||
diffcore-delta.c | ||
diffcore-order.c | ||
diffcore-pickaxe.c | ||
diffcore-rename.c | ||
diffcore.h | ||
dir.c | ||
dir.h | ||
dump-cache-tree.c | ||
entry.c | ||
environment.c | ||
exec_cmd.c | ||
exec_cmd.h | ||
fast-import.c | ||
fetch-pack.c | ||
fetch.c | ||
fetch.h | ||
fixup-builtins | ||
generate-cmdlist.sh | ||
git-add--interactive.perl | ||
git-am.sh | ||
git-archimport.perl | ||
git-bisect.sh | ||
git-checkout.sh | ||
git-clean.sh | ||
git-clone.sh | ||
git-commit.sh | ||
git-compat-util.h | ||
git-cvsexportcommit.perl | ||
git-cvsimport.perl | ||
git-cvsserver.perl | ||
git-fetch.sh | ||
git-filter-branch.sh | ||
git-instaweb.sh | ||
git-lost-found.sh | ||
git-ls-remote.sh | ||
git-merge-octopus.sh | ||
git-merge-one-file.sh | ||
git-merge-ours.sh | ||
git-merge-resolve.sh | ||
git-merge-stupid.sh | ||
git-merge.sh | ||
git-mergetool.sh | ||
git-parse-remote.sh | ||
git-pull.sh | ||
git-quiltimport.sh | ||
git-rebase--interactive.sh | ||
git-rebase.sh | ||
git-relink.perl | ||
git-remote.perl | ||
git-repack.sh | ||
git-request-pull.sh | ||
git-reset.sh | ||
git-send-email.perl | ||
git-sh-setup.sh | ||
git-stash.sh | ||
git-submodule.sh | ||
git-svn.perl | ||
git-svnimport.perl | ||
GIT-VERSION-GEN | ||
git.c | ||
git.spec.in | ||
gitk | ||
grep.c | ||
grep.h | ||
hash-object.c | ||
help.c | ||
http-fetch.c | ||
http-push.c | ||
http.c | ||
http.h | ||
ident.c | ||
imap-send.c | ||
index-pack.c | ||
INSTALL | ||
interpolate.c | ||
interpolate.h | ||
list-objects.c | ||
list-objects.h | ||
local-fetch.c | ||
lockfile.c | ||
log-tree.c | ||
log-tree.h | ||
mailmap.c | ||
mailmap.h | ||
Makefile | ||
match-trees.c | ||
merge-file.c | ||
merge-index.c | ||
merge-recursive.c | ||
merge-tree.c | ||
mktag.c | ||
mktree.c | ||
object-refs.c | ||
object.c | ||
object.h | ||
pack-check.c | ||
pack-redundant.c | ||
pack-write.c | ||
pack.h | ||
pager.c | ||
patch-delta.c | ||
patch-id.c | ||
patch-ids.c | ||
patch-ids.h | ||
path-list.c | ||
path-list.h | ||
path.c | ||
peek-remote.c | ||
pkt-line.c | ||
pkt-line.h | ||
progress.c | ||
progress.h | ||
quote.c | ||
quote.h | ||
reachable.c | ||
reachable.h | ||
read-cache.c | ||
README | ||
receive-pack.c | ||
reflog-walk.c | ||
reflog-walk.h | ||
refs.c | ||
refs.h | ||
RelNotes | ||
remote.c | ||
remote.h | ||
revision.c | ||
revision.h | ||
rsh.c | ||
rsh.h | ||
run-command.c | ||
run-command.h | ||
send-pack.c | ||
server-info.c | ||
setup.c | ||
sha1_file.c | ||
sha1_name.c | ||
shallow.c | ||
shell.c | ||
show-index.c | ||
sideband.c | ||
sideband.h | ||
ssh-fetch.c | ||
ssh-pull.c | ||
ssh-push.c | ||
ssh-upload.c | ||
strbuf.c | ||
strbuf.h | ||
symlinks.c | ||
tag.c | ||
tag.h | ||
tar.h | ||
test-absolute-path.c | ||
test-chmtime.c | ||
test-date.c | ||
test-delta.c | ||
test-genrandom.c | ||
test-match-trees.c | ||
test-sha1.c | ||
test-sha1.sh | ||
trace.c | ||
tree-diff.c | ||
tree-walk.c | ||
tree-walk.h | ||
tree.c | ||
tree.h | ||
unpack-file.c | ||
unpack-trees.c | ||
unpack-trees.h | ||
update-server-info.c | ||
upload-pack.c | ||
usage.c | ||
utf8.c | ||
utf8.h | ||
var.c | ||
write_or_die.c | ||
wt-status.c | ||
wt-status.h | ||
xdiff-interface.c | ||
xdiff-interface.h |
////////////////////////////////////////////////////////////////

GIT - the stupid content tracker

////////////////////////////////////////////////////////////////

"git" can mean anything, depending on your mood.

 - random three-letter combination that is pronounceable, and not
   actually used by any common UNIX command. The fact that it is a
   mispronunciation of "get" may or may not be relevant.
 - stupid. contemptible and despicable. simple. Take your pick from the
   dictionary of slang.
 - "global information tracker": you're in a good mood, and it actually
   works for you. Angels sing, and a light suddenly fills the room.
 - "goddamn idiotic truckload of sh*t": when it breaks

Git is a fast, scalable, distributed revision control system with an
unusually rich command set that provides both high-level operations
and full access to internals.

Git is an Open Source project covered by the GNU General Public License.
It was originally written by Linus Torvalds with help of a group of
hackers around the net. It is currently maintained by Junio C Hamano.

Please read the file INSTALL for installation instructions.
See Documentation/tutorial.txt to get started, then see
Documentation/everyday.txt for a useful minimum set of commands,
and "man git-commandname" for documentation of each command.
CVS users may also want to read Documentation/cvs-migration.txt.

Many Git online resources are accessible from http://git.or.cz/
including full documentation and Git related tools.

The user discussion and development of Git take place on the Git
mailing list -- everyone is welcome to post bug reports, feature
requests, comments and patches to git@vger.kernel.org. To subscribe
to the list, send an email with just "subscribe git" in the body to
majordomo@vger.kernel.org. The mailing list archives are available at
http://marc.theaimsgroup.com/?l=git and other archival sites.

The messages titled "A note from the maintainer", "What's in
git.git (stable)" and "What's cooking in git.git (topics)" and
the discussion following them on the mailing list give a good
reference for project status, development direction and
remaining tasks.