Git with broken hash generation to generate collisions between object IDs. Don't use this!
https://undefinedbehavior.de/posts/commit-vandalism/
2548183bad
When core.ignorecase is turned on and there are stale index entries, "git commit" can sometimes report directories as untracked, even though they contain tracked files. You can see an example of this with: # make a case-insensitive repo git init repo && cd repo && git config core.ignorecase true && # with some tracked files in a subdir mkdir subdir && > subdir/one && > subdir/two && git add . && git commit -m base && # now make the index entries stale touch subdir/* && # and then ask commit to update those entries and show # us the status template git commit -a which will report "subdir/" as untracked, even though it clearly contains two tracked files. What is happening in the commit program is this: 1. We load the index, and for each entry, insert it into the index's name_hash. In addition, if ignorecase is turned on, we make an entry in the name_hash for the directory (e.g., "contrib/"), which uses the following code from 5102c61's hash_index_entry_directories: hash = hash_name(ce->name, ptr - ce->name); if (!lookup_hash(hash, &istate->name_hash)) { pos = insert_hash(hash, &istate->name_hash); if (pos) { ce->next = *pos; *pos = ce; } } Note that we only add the directory entry if there is not already an entry. 2. We run add_files_to_cache, which gets updated information for each cache entry. It helpfully inserts this information into the cache, which calls replace_index_entry. This in turn calls remove_name_hash() on the old entry, and add_name_hash() on the new one. But remove_name_hash doesn't actually remove from the hash, it only marks it as "no longer interesting" (from cache.h): /* * We don't actually *remove* it, we can just mark it invalid so that * we won't find it in lookups. * * Not only would we have to search the lists (simple enough), but * we'd also have to rehash other hash buckets in case this makes the * hash bucket empty (common). So it's much better to just mark * it. */ static inline void remove_name_hash(struct cache_entry *ce) { ce->ce_flags |= CE_UNHASHED; } This is OK in the specific-file case, since the entries in the hash form a linked list, and we can just skip the "not here anymore" entries during lookup. But for the directory hash entry, we will _not_ write a new entry, because there is already one there: the old one that is actually no longer interesting! 3. While traversing the directories, we end up in the directory_exists_in_index_icase function to see if a directory is interesting. This in turn checks index_name_exists, which will look up the directory in the index's name_hash. We see the old, deleted record, and assume there is nothing interesting. The directory gets marked as untracked, even though there are index entries in it. The problem is in the code I showed above: hash = hash_name(ce->name, ptr - ce->name); if (!lookup_hash(hash, &istate->name_hash)) { pos = insert_hash(hash, &istate->name_hash); if (pos) { ce->next = *pos; *pos = ce; } } Having a single cache entry that represents the directory is not enough; that entry may go away if the index is changed. It may be tempting to say that the problem is in our removal method; if we removed the entry entirely instead of simply marking it as "not here anymore", then we would know we need to insert a new entry. But that only covers this particular case of remove-replace. In the more general case, consider something like this: 1. We add "foo/bar" and "foo/baz" to the index. Each gets their own entry in name_hash, plus we make a "foo/" entry that points to "foo/bar". 2. We remove the "foo/bar" entry from the index, and from the name_hash. 3. We ask if "foo/" exists, and see no entry, even though "foo/baz" exists. So we need that directory entry to have the list of _all_ cache entries that indicate that the directory is tracked. So that implies making a linked list as we do for other entries, like: hash = hash_name(ce->name, ptr - ce->name); pos = insert_hash(hash, &istate->name_hash); if (pos) { ce->next = *pos; *pos = ce; } But that's not right either. In fact, it shows a second bug in the current code, which is that the "ce->next" pointer is supposed to be linking entries for a specific filename entry, but here we are overwriting it for the directory entry. So the same cache entry ends up in two linked lists, but they share the same "next" pointer. As it turns out, this second bug can't be triggered in the current code. The "if (pos)" conditional is totally dead code; pos will only be non-NULL if there was an existing hash entry, and we already checked that there wasn't one through our call to lookup_hash. But fixing the first bug means taking out that call to lookup_hash, which is going to activate the buggy dead code, and we'll end up splicing the two linked lists together. So we need to have a separate next pointer for the list in the directory bucket, and we need to traverse that list in index_name_exists when we are looking up a directory. This bloats "struct cache_entry" by a few bytes. Which is annoying, because it's only necessary when core.ignorecase is enabled. There's not an easy way around it, short of separating out the "next" pointers from cache_entry entirely (i.e., having a separate "cache_entry_list" struct that gets stored in the name_hash). In practice, it probably doesn't matter; we have thousands of cache entries, compared to the millions of objects (where adding 4 bytes to the struct actually does impact performance). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> |
||
---|---|---|
block-sha1 | ||
builtin | ||
compat | ||
contrib | ||
Documentation | ||
git_remote_helpers | ||
git-gui | ||
gitk-git | ||
gitweb | ||
perl | ||
po | ||
ppc | ||
t | ||
templates | ||
vcs-svn | ||
xdiff | ||
.gitattributes | ||
.gitignore | ||
.mailmap | ||
abspath.c | ||
aclocal.m4 | ||
advice.c | ||
advice.h | ||
alias.c | ||
alloc.c | ||
archive-tar.c | ||
archive-zip.c | ||
archive.c | ||
archive.h | ||
attr.c | ||
attr.h | ||
base85.c | ||
bisect.c | ||
bisect.h | ||
blob.c | ||
blob.h | ||
branch.c | ||
branch.h | ||
builtin.h | ||
bundle.c | ||
bundle.h | ||
cache-tree.c | ||
cache-tree.h | ||
cache.h | ||
check_bindir | ||
check-builtins.sh | ||
check-racy.c | ||
color.c | ||
color.h | ||
combine-diff.c | ||
command-list.txt | ||
commit.c | ||
commit.h | ||
config.c | ||
config.mak.in | ||
configure.ac | ||
connect.c | ||
convert.c | ||
copy.c | ||
COPYING | ||
csum-file.c | ||
csum-file.h | ||
ctype.c | ||
daemon.c | ||
date.c | ||
decorate.c | ||
decorate.h | ||
delta.h | ||
diff-delta.c | ||
diff-lib.c | ||
diff-no-index.c | ||
diff.c | ||
diff.h | ||
diffcore-break.c | ||
diffcore-delta.c | ||
diffcore-order.c | ||
diffcore-pickaxe.c | ||
diffcore-rename.c | ||
diffcore.h | ||
dir.c | ||
dir.h | ||
editor.c | ||
entry.c | ||
environment.c | ||
exec_cmd.c | ||
exec_cmd.h | ||
fast-import.c | ||
fetch-pack.h | ||
fixup-builtins | ||
fsck.c | ||
fsck.h | ||
generate-cmdlist.sh | ||
gettext.c | ||
gettext.h | ||
git-add--interactive.perl | ||
git-am.sh | ||
git-archimport.perl | ||
git-bisect.sh | ||
git-compat-util.h | ||
git-cvsexportcommit.perl | ||
git-cvsimport.perl | ||
git-cvsserver.perl | ||
git-difftool--helper.sh | ||
git-difftool.perl | ||
git-filter-branch.sh | ||
git-instaweb.sh | ||
git-lost-found.sh | ||
git-merge-octopus.sh | ||
git-merge-one-file.sh | ||
git-merge-resolve.sh | ||
git-mergetool--lib.sh | ||
git-mergetool.sh | ||
git-parse-remote.sh | ||
git-pull.sh | ||
git-quiltimport.sh | ||
git-rebase--am.sh | ||
git-rebase--interactive.sh | ||
git-rebase--merge.sh | ||
git-rebase.sh | ||
git-relink.perl | ||
git-remote-testgit.py | ||
git-repack.sh | ||
git-request-pull.sh | ||
git-send-email.perl | ||
git-sh-i18n.sh | ||
git-sh-setup.sh | ||
git-stash.sh | ||
git-submodule.sh | ||
git-svn.perl | ||
GIT-VERSION-GEN | ||
git-web--browse.sh | ||
git.c | ||
git.spec.in | ||
graph.c | ||
graph.h | ||
grep.c | ||
grep.h | ||
hash.c | ||
hash.h | ||
help.c | ||
help.h | ||
hex.c | ||
http-backend.c | ||
http-fetch.c | ||
http-push.c | ||
http-walker.c | ||
http.c | ||
http.h | ||
ident.c | ||
imap-send.c | ||
INSTALL | ||
levenshtein.c | ||
levenshtein.h | ||
LGPL-2.1 | ||
list-objects.c | ||
list-objects.h | ||
ll-merge.c | ||
ll-merge.h | ||
lockfile.c | ||
log-tree.c | ||
log-tree.h | ||
mailmap.c | ||
mailmap.h | ||
Makefile | ||
match-trees.c | ||
merge-file.c | ||
merge-file.h | ||
merge-recursive.c | ||
merge-recursive.h | ||
name-hash.c | ||
notes-cache.c | ||
notes-cache.h | ||
notes-merge.c | ||
notes-merge.h | ||
notes.c | ||
notes.h | ||
object.c | ||
object.h | ||
pack-check.c | ||
pack-refs.c | ||
pack-refs.h | ||
pack-revindex.c | ||
pack-revindex.h | ||
pack-write.c | ||
pack.h | ||
pager.c | ||
parse-options.c | ||
parse-options.h | ||
patch-delta.c | ||
patch-ids.c | ||
patch-ids.h | ||
path.c | ||
pkt-line.c | ||
pkt-line.h | ||
preload-index.c | ||
pretty.c | ||
progress.c | ||
progress.h | ||
quote.c | ||
quote.h | ||
reachable.c | ||
reachable.h | ||
read-cache.c | ||
README | ||
reflog-walk.c | ||
reflog-walk.h | ||
refs.c | ||
refs.h | ||
RelNotes | ||
remote-curl.c | ||
remote.c | ||
remote.h | ||
replace_object.c | ||
rerere.c | ||
rerere.h | ||
resolve-undo.c | ||
resolve-undo.h | ||
revision.c | ||
revision.h | ||
run-command.c | ||
run-command.h | ||
send-pack.h | ||
server-info.c | ||
setup.c | ||
sh-i18n--envsubst.c | ||
sha1_file.c | ||
sha1_name.c | ||
sha1-array.c | ||
sha1-array.h | ||
sha1-lookup.c | ||
sha1-lookup.h | ||
shallow.c | ||
shell.c | ||
shortlog.h | ||
show-index.c | ||
sideband.c | ||
sideband.h | ||
sigchain.c | ||
sigchain.h | ||
strbuf.c | ||
strbuf.h | ||
string-list.c | ||
string-list.h | ||
submodule.c | ||
submodule.h | ||
symlinks.c | ||
tag.c | ||
tag.h | ||
tar.h | ||
test-chmtime.c | ||
test-ctype.c | ||
test-date.c | ||
test-delta.c | ||
test-dump-cache-tree.c | ||
test-genrandom.c | ||
test-index-version.c | ||
test-line-buffer.c | ||
test-match-trees.c | ||
test-mktemp.c | ||
test-obj-pool.c | ||
test-parse-options.c | ||
test-path-utils.c | ||
test-run-command.c | ||
test-sha1.c | ||
test-sha1.sh | ||
test-sigchain.c | ||
test-string-pool.c | ||
test-subprocess.c | ||
test-svn-fe.c | ||
test-treap.c | ||
thread-utils.c | ||
thread-utils.h | ||
trace.c | ||
transport-helper.c | ||
transport.c | ||
transport.h | ||
tree-diff.c | ||
tree-walk.c | ||
tree-walk.h | ||
tree.c | ||
tree.h | ||
unimplemented.sh | ||
unpack-trees.c | ||
unpack-trees.h | ||
upload-pack.c | ||
url.c | ||
url.h | ||
usage.c | ||
userdiff.c | ||
userdiff.h | ||
utf8.c | ||
utf8.h | ||
walker.c | ||
walker.h | ||
wrap-for-bin.sh | ||
wrapper.c | ||
write_or_die.c | ||
ws.c | ||
wt-status.c | ||
wt-status.h | ||
xdiff-interface.c | ||
xdiff-interface.h | ||
zlib.c |
//////////////////////////////////////////////////////////////// GIT - the stupid content tracker //////////////////////////////////////////////////////////////// "git" can mean anything, depending on your mood. - random three-letter combination that is pronounceable, and not actually used by any common UNIX command. The fact that it is a mispronunciation of "get" may or may not be relevant. - stupid. contemptible and despicable. simple. Take your pick from the dictionary of slang. - "global information tracker": you're in a good mood, and it actually works for you. Angels sing, and a light suddenly fills the room. - "goddamn idiotic truckload of sh*t": when it breaks Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals. Git is an Open Source project covered by the GNU General Public License. It was originally written by Linus Torvalds with help of a group of hackers around the net. It is currently maintained by Junio C Hamano. Please read the file INSTALL for installation instructions. See Documentation/gittutorial.txt to get started, then see Documentation/everyday.txt for a useful minimum set of commands, and Documentation/git-commandname.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with "man gittutorial" or "git help tutorial", and the documentation of each command with "man git-commandname" or "git help commandname". CVS users may also want to read Documentation/gitcvs-migration.txt ("man gitcvs-migration" or "git help cvs-migration" if git is installed). Many Git online resources are accessible from http://git-scm.com/ including full documentation and Git related tools. The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org. To subscribe to the list, send an email with just "subscribe git" in the body to majordomo@vger.kernel.org. The mailing list archives are available at http://marc.theaimsgroup.com/?l=git and other archival sites. The messages titled "A note from the maintainer", "What's in git.git (stable)" and "What's cooking in git.git (topics)" and the discussion following them on the mailing list give a good reference for project status, development direction and remaining tasks.