* maint-2.23: (44 commits)
Git 2.23.1
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
...
* maint-2.22: (43 commits)
Git 2.22.2
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
...
* maint-2.21: (42 commits)
Git 2.21.1
mingw: sh arguments need quoting in more circumstances
mingw: fix quoting of empty arguments for `sh`
mingw: use MSYS2 quoting even when spawning shell scripts
mingw: detect when MSYS2's sh is to be spawned more robustly
t7415: drop v2.20.x-specific work-around
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
...
* maint-2.20: (36 commits)
Git 2.20.2
t7415: adjust test for dubiously-nested submodule gitdirs for v2.20.x
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
...
* maint-2.19: (34 commits)
Git 2.19.3
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
...
* maint-2.18: (33 commits)
Git 2.18.2
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
...
* maint-2.17: (32 commits)
Git 2.17.3
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
...
* maint-2.16: (31 commits)
Git 2.16.6
test-drop-caches: use `has_dos_drive_prefix()`
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
...
* maint-2.15: (29 commits)
Git 2.15.4
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
...
* maint-2.14: (28 commits)
Git 2.14.6
mingw: handle `subst`-ed "DOS drives"
mingw: refuse to access paths with trailing spaces or periods
mingw: refuse to access paths with illegal characters
unpack-trees: let merged_entry() pass through do_add_entry()'s errors
quote-stress-test: offer to test quoting arguments for MSYS2 sh
t6130/t9350: prepare for stringent Win32 path validation
quote-stress-test: allow skipping some trials
quote-stress-test: accept arguments to test via the command-line
tests: add a helper to stress test argument quoting
mingw: fix quoting of arguments
Disallow dubiously-nested submodule git directories
protect_ntfs: turn on NTFS protection by default
path: also guard `.gitmodules` against NTFS Alternate Data Streams
is_ntfs_dotgit(): speed it up
mingw: disallow backslash characters in tree objects' file names
path: safeguard `.git` against NTFS Alternate Streams Accesses
clone --recurse-submodules: prevent name squatting on Windows
is_ntfs_dotgit(): only verify the leading segment
test-path-utils: offer to run a protectNTFS/protectHFS benchmark
...
As with export-marks in the previous commit, import-marks can access the
filesystem. This is significantly less dangerous than export-marks
because it only involves reading from arbitrary paths, rather than
writing them. However, it could still be surprising and have security
implications (e.g., exfiltrating data from a service that accepts
fast-import streams).
Let's lump it (and its "if-exists" counterpart) in with export-marks,
and enable the in-stream version only if --allow-unsafe-features is set.
Signed-off-by: Jeff King <peff@peff.net>
The fast-import stream command "feature export-marks=<path>" lets the
stream write marks to an arbitrary path. This may be surprising if you
are running fast-import against an untrusted input (which otherwise
cannot do anything except update Git objects and refs).
Let's disallow the use of this feature by default, and provide a
command-line option to re-enable it (you can always just use the
command-line --export-marks as well, but the in-stream version provides
an easy way for exporters to control the process).
This is a backwards-incompatible change, since the default is flipping
to the new, safer behavior. However, since the main users of the
in-stream versions would be import/export-based remote helpers, and
since we trust remote helpers already (which are already running
arbitrary code), we'll pass the new option by default when reading a
remote helper's stream. This should minimize the impact.
Note that the implementation isn't totally simple, as we have to work
around the fact that fast-import doesn't parse its command-line options
until after it has read any "feature" lines from the stream. This is how
it lets command-line options override in-stream. But in our case, it's
important to parse the new --allow-unsafe-features first.
There are three options for resolving this:
1. Do a separate "early" pass over the options. This is easy for us to
do because there are no command-line options that allow the
"unstuck" form (so there's no chance of us mistaking an argument
for an option), though it does introduce a risk of incorrect
parsing later (e.g,. if we convert to parse-options).
2. Move the option parsing phase back to the start of the program, but
teach the stream-reading code never to override an existing value.
This is tricky, because stream "feature" lines override each other
(meaning we'd have to start tracking the source for every option).
3. Accept that we might parse a "feature export-marks" line that is
forbidden, as long we don't _act_ on it until after we've parsed
the command line options.
This would, in fact, work with the current code, but only because
the previous patch fixed the export-marks parser to avoid touching
the filesystem.
So while it works, it does carry risk of somebody getting it wrong
in the future in a rather subtle and unsafe way.
I've gone with option (1) here as simple, safe, and unlikely to cause
regressions.
This fixes CVE-2019-1348.
Signed-off-by: Jeff King <peff@peff.net>
When we parse the --export-marks option, we don't immediately open the
file, but we do create any leading directories. This can be especially
confusing when a command-line option overrides an in-stream one, in
which case we'd create the leading directory for the in-stream file,
even though we never actually write the file.
Let's instead create the directories just before opening the file, which
means we'll create only useful directories. Note that this could change
the handling of relative paths if we chdir() in between, but we don't
actually do so; the only permanent chdir is from setup_git_directory()
which runs before either code path (potentially we should take the
pre-setup dir into account to avoid surprising the user, but that's an
orthogonal change).
The test just adapts the existing "override" test to use paths with
leading directories. This checks both that the correct directory is
created (which worked before but was not tested), and that the
overridden one is not (our new fix here).
While we're here, let's also check the error result of
safe_create_leading_directories(). We'd presumably notice any failure
immediately after when we try to open the file itself, but we can give a
more specific error message in this case.
Signed-off-by: Jeff King <peff@peff.net>
When asked to import marks from "subdir/file.marks", we create the
leading directory "subdir" if it doesn't exist. This makes no sense for
importing marks, where we only ever open the path for reading.
Most of the time this would be a noop, since if the marks file exists,
then the leading directories exist, too. But if it doesn't (e.g.,
because --import-marks-if-exists was used), then we'd create the useless
directory.
This dates back to 580d5f83e7 (fast-import: always create marks_file
directories, 2010-03-29). Even then it was useless, so it seems to have
been added in error alongside the --export-marks case (which _is_
helpful).
Signed-off-by: Jeff King <peff@peff.net>
We parse options like "--max-pack-size=" using skip_prefix(), which
makes sense to get at the bytes after the "=". However, we also parse
"--quiet" and "--stats" with skip_prefix(), which allows things like
"--quiet-nonsense" to behave like "--quiet".
This was a mistaken conversion in 0f6927c229 (fast-import: put option
parsing code in separate functions, 2009-12-04). Let's tighten this to
an exact match, which was the original intent.
Signed-off-by: Jeff King <peff@peff.net>
Updates to fast-import/export.
* en/fast-imexport-nested-tags:
fast-export: handle nested tags
t9350: add tests for tags of things other than a commit
fast-export: allow user to request tags be marked with --mark-tags
fast-export: add support for --import-marks-if-exists
fast-import: add support for new 'alias' command
fast-import: allow tags to be identified by mark labels
fast-import: fix handling of deleted tags
fast-export: fix exporting a tag and nothing else
fast-export and fast-import have nice --import-marks flags which allow
for incremental migrations. However, if there is a mark in
fast-export's file of marks without a corresponding mark in the one for
fast-import, then we run the risk that fast-export tries to send new
objects relative to the mark it knows which fast-import does not,
causing fast-import to fail.
This arises in practice when there is a filter of some sort running
between the fast-export and fast-import processes which prunes some
commits programmatically. Provide such a filter with the ability to
alias pruned commits to their most recent non-pruned ancestor.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Mark identifiers are used in fast-export and fast-import to provide a
label to refer to earlier content. Blobs are given labels because they
need to be referenced in the commits where they first appear with a
given filename, and commits are given labels because they can be the
parents of other commits. Tags were never given labels, probably
because they were viewed as unnecessary, but that presents two problems:
1. It leaves us without a way of referring to previous tags if we
want to create a tag of a tag (or higher nestings).
2. It leaves us with no way of recording that a tag has already been
imported when using --export-marks and --import-marks.
Fix these problems by allowing an optional mark label for tags.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
If our input stream includes a tag which is later deleted, we were not
properly deleting it. We did have a step which would delete it, but we
left a tag in the tag list noting that it needed to be updated, and the
updating of annotated tags occurred AFTER ref deletion. So, when we
record that a tag needs to be deleted, also remove it from the list of
annotated tags to update.
While this has likely been something that has not happened in practice,
it will come up more in order to support nested tags. For nested tags,
we either need to give temporary names to the intermediate tags and then
delete them, or else we need to use the final name for the intermediate
tags. If we use the final name for the intermediate tags, then in order
to keep the sanity check that someone doesn't try to update the same tag
twice, we need to delete the ref after creating the intermediate tag.
So, either way nested tags imply the need to delete temporary inner tag
references.
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Fast-import's read_next_command() has somewhat odd memory ownership
semantics for the command_buf strbuf. After reading a command, we copy
the strbuf's pointer (without duplicating the string) into our cmd_hist
array of recent commands. And then when we're about to read a new
command, we clear the strbuf by calling strbuf_detach(), dropping
ownership from the strbuf (leaving the cmd_hist reference as the
remaining owner).
This has a few surprising implications:
- if the strbuf hasn't been copied into cmd_hist (e.g., because we
haven't ready any commands yet), then the strbuf_detach() will leak
the resulting string
- any modification to command_buf risks invalidating the pointer held
by cmd_hist. There doesn't seem to be any way to trigger this
currently (since we tend to modify it only by detaching and reading
in a new value), but it's subtly dangerous.
- any pointers into an input string will remain valid as long as
cmd_hist points to them. So in general, you can point into
command_buf.buf and call read_next_command() up to 100 times before
your string is cycled out and freed, leaving you with a dangling
pointer. This makes it easy to miss bugs during testing, as they
might trigger only for a sufficiently large commit (e.g., the bug
fixed in the previous commit).
Instead, let's make a new string to copy the command into the history
array, rather than having dual ownership with the old. Then we can drop
the strbuf_detach() calls entirely, and just reuse the same buffer
within command_buf over and over. We'd normally have to strbuf_reset()
it before using it again, but in both cases here we're using
strbuf_getline(), which does it automatically for us.
This fixes the leak, and it means that even a single call to
read_next_command() will invalidate any held pointers, making it easier
to find bugs. In fact, we can drop the extra input lines added to the
test case by the previous commit, as the unfixed bug would now trigger
just from reading the commit message, even without any modified files in
the commit.
Reported-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We read each line of the fast-import stream into the command_buf strbuf.
When reading a commit, we parse a line like "encoding foo" by storing a
pointer to "foo", but not making a copy. We may then read an unbounded
number of other lines (e.g., one for each modified file in the commit),
each of which writes into command_buf.
This works out in practice for small cases, because we hand off
ownership of the heap buffer from command_buf to the cmd_hist array, and
read new commands into a fresh heap buffer. And thus the pointer to
"foo" remains valid as long as there aren't so many intermediate lines
that we end up dropping the original "encoding" line from the history.
But as the test modification shows, if we go over our default of 100
lines, we end up with our encoding string pointing into freed heap
memory. This seems to fail reliably by writing garbage into the output,
but running under ASan definitely detects this as a use-after-free.
We can fix it by duplicating the encoding value, just as we do for other
parsed lines (e.g., an author line ends up in parse_ident, which copies
it to a new string).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The tree-walk API learned to pass an in-core repository
instance throughout more codepaths.
* nd/tree-walk-with-repo:
t7814: do not generate same commits in different repos
Use the right 'struct repository' instead of the_repository
match-trees.c: remove the_repo from shift_tree*()
tree-walk.c: remove the_repo from get_tree_entry_follow_symlinks()
tree-walk.c: remove the_repo from get_tree_entry()
tree-walk.c: remove the_repo from fill_tree_descriptor()
sha1-file.c: remove the_repo from read_object_with_reference()
Convert calls of memcpy(3) to use COPY_ARRAY, which shortens and
simplifies the code a bit.
Patch generated by Coccinelle and contrib/coccinelle/array.cocci.
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since git supports commit messages with an encoding other than UTF-8,
allow fast-import to import such commits. This may be useful for folks
who do not want to reencode commit messages from an external system, and
may also be useful to achieve reversible history rewrites (e.g. sha1sum
<-> sha256sum transitions or subtree work) with git repositories that
have used specialized encodings in their commit history.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Conversion from unsigned char[20] to struct object_id continues.
* bc/hash-transition-16: (35 commits)
gitweb: make hash size independent
Git.pm: make hash size independent
read-cache: read data in a hash-independent way
dir: make untracked cache extension hash size independent
builtin/difftool: use parse_oid_hex
refspec: make hash size independent
archive: convert struct archiver_args to object_id
builtin/get-tar-commit-id: make hash size independent
get-tar-commit-id: parse comment record
hash: add a function to lookup hash algorithm by length
remote-curl: make hash size independent
http: replace sha1_to_hex
http: compute hash of downloaded objects using the_hash_algo
http: replace hard-coded constant with the_hash_algo
http-walker: replace sha1_to_hex
http-push: remove remaining uses of sha1_to_hex
http-backend: allow 64-character hex names
http-push: convert to use the_hash_algo
builtin/pull: make hash-size independent
builtin/am: make hash size independent
...
When get-mark was introduced in commit 28c7b1f7b7 ("fast-import: add a
get-mark command", 2015-07-01), it followed the precedent of the
cat-blob command to be allowed on any line other than in the middle of a
data directive; see commit 777f80d742 ("fast-import: Allow cat-blob
requests at arbitrary points in stream", 2010-11-28). It was useful to
allow cat-blob directives in the middle of a commit to get more data
that would be used in writing the current commit object. get-mark is
not similarly useful since fast-import can already use either object id
or mark. Further, trying to allow this command anywhere caused parsing
bugs. Fix the parsing problems by only allowing get-mark commands to
appear when other commands have completed.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In commit 777f80d742 ("fast-import: Allow cat-blob requests at
arbitrary points in stream", 2010-11-28), fast-import started allowing
cat-blob commands to appear on the start of any line except in the
middle of a "data" command. It could be in the middle of various
directives that were part of a tag command, or in the middle of
checkpoints or progresses (each of which allow an optional second empty
newline), or even immediately after the mark command of a blob before
the data directive appeared (raising the question of what if it used the
mark for the blob that just barely appeared in the stream that we do not
yet have the data for). None of these locations make any sense as
places to put cat-blob requests.
The purpose of this change as stated in that commit message was to
[save] frontends from having to loop over everything they want to
commit in the next commit and cat-ing the necessary objects in
advance.
However, that can be achieved by simply allowing cat-blob requests to
appear whenever a filemodify directive is allowed. Further, it avoids
setting a bad precedent for other commands to follow (e.g. get-mark); a
precedent which caused parsing problems in corner cases.
Technically, inline filemodify directives add a slight wrinkle in that
frontends might want to have cat-blob directives appear after the start
of the filemodify and before the data directive contained within it. I
think it would have been better to disallow such a case (it would be
trivial to use cat-blob before the filemodify instead), but since there
is evidence this was used, for backwards compatibility let's support
that case too.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This is not a very important change, and one that I expect to have no
performance impact whatsoever, but reading the code bothered me. The
parsing of command types in cmd_main() mostly runs in order of most
common to least common commands; sure, it's hard to say for sure what
the most common are without some type of study, but it seems fairly
clear to mark the original four ("blob", "commit", "tag", "reset") as
the most prominent. Indeed, the parsing for most other commands were
added to later in the list. However, when "ls" was added, it was stuck
near the top of the list, with no rationale for that particular
location. Move it down to later to appease my Tourette's-like internal
twitching that its former location was causing.
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace the uses of sha1_to_hex in this function with hash_to_hex to
allow the use of SHA-256 as well. Rename a variable since it is no
longer limited to SHA-1.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Replace several uses of GIT_SHA1_HEXSZ and 40-based constants with
references to the_hash_algo. Update the note handling code here to
compute path sizes based on GIT_MAX_RAWSZ as well.
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This member is used to represent the pack checksum of the pack in
question. Expand this member to be GIT_MAX_RAWSZ bytes in length so it
works with longer hashes and rename it to be "hash" instead of "sha1".
This transformation was made with a change to the definition and the
following semantic patch:
@@
struct packed_git *E1;
@@
- E1->sha1
+ E1->hash
@@
struct packed_git E1;
@@
- E1.sha1
+ E1.hash
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Small fixes and features for fast-export and fast-import, mostly on
the fast-export side.
* en/fast-export-import:
fast-export: add a --show-original-ids option to show original names
fast-import: remove unmaintained duplicate documentation
fast-export: add --reference-excluded-parents option
fast-export: ensure we export requested refs
fast-export: when using paths, avoid corrupt stream with non-existent mark
fast-export: move commit rewriting logic into a function for reuse
fast-export: avoid dying when filtering by paths and old tags exist
fast-export: use value from correct enum
git-fast-export.txt: clarify misleading documentation about rev-list args
git-fast-import.txt: fix documentation for --quiet option
fast-export: convert sha1 to oid
Knowing the original names (hashes) of commits can sometimes enable
post-filtering that would otherwise be difficult or impossible. In
particular, the desire to rewrite commit messages which refer to other
prior commits (on top of whatever other filtering is being done) is
very difficult without knowing the original names of each commit.
In addition, knowing the original names (hashes) of blobs can allow
filtering by blob-id without requiring re-hashing the content of the
blob, and is thus useful as a small optimization.
Once we add original ids for both commits and blobs, we may as well
add them for tags too for completeness. Perhaps someone will have a
use for them.
This commit teaches a new --show-original-ids option to fast-export
which will make it add a 'original-oid <hash>' line to blob, commits,
and tags. It also teaches fast-import to parse (and ignore) such
lines.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
fast-import.c has started with a comment for nine and a half years
re-directing the reader to Documentation/git-fast-import.txt for
maintained documentation. Instead of leaving the unmaintained
documentation in place, just excise it.
Signed-off-by: Elijah Newren <newren@gmail.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.
Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.
Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
spatch transformation to replace boolean uses of !hashcmp() to
newly introduced oideq() is added, and applied, to regain
performance lost due to support of multiple hash algorithms.
* jk/cocci:
show_dirstat: simplify same-content check
read-cache: use oideq() in ce_compare functions
convert hashmap comparison functions to oideq()
convert "hashcmp() != 0" to "!hasheq()"
convert "oidcmp() != 0" to "!oideq()"
convert "hashcmp() == 0" to hasheq()
convert "oidcmp() == 0" to oideq()
introduce hasheq() and oideq()
coccinelle: use <...> for function exclusion
The code for computing history reachability has been shuffled,
obtained a bunch of new tests to cover them, and then being
improved.
* ds/reachable:
commit-reach: correct accidental #include of C file
commit-reach: use can_all_from_reach
commit-reach: make can_all_from_reach... linear
commit-reach: replace ref_newer logic
test-reach: test commit_contains
test-reach: test can_all_from_reach_with_flags
test-reach: test reduce_heads
test-reach: test get_merge_bases_many
test-reach: test is_descendant_of
test-reach: test in_merge_bases
test-reach: create new test tool for ref_newer
commit-reach: move can_all_from_reach_with_flags
upload-pack: generalize commit date cutoff
upload-pack: refactor ok_to_give_up()
upload-pack: make reachable() more generic
commit-reach: move commit_contains from ref-filter
commit-reach: move ref_newer from remote.c
commit.h: remove method declarations
commit-reach: move walk methods from commit.c
This is the flip side of the previous two patches: checking
for a non-zero oidcmp() can be more strictly expressed as
inequality. Like those patches, we write "!= 0" in the
coccinelle transformation, which covers by isomorphism the
more common:
if (oidcmp(E1, E2))
As with the previous two patches, this patch can be achieved
almost entirely by running "make coccicheck"; the only
differences are manual line-wrap fixes to match the original
code.
There is one thing to note for anybody replicating this,
though: coccinelle 1.0.4 seems to miss the case in
builtin/tag.c, even though it's basically the same as all
the others. Running with 1.0.7 does catch this, so
presumably it's just a coccinelle bug that was fixed in the
interim.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.
The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).
This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.
I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
These methods are now declared in commit-reach.h. Remove them from
commit.h and add new include statements in all files that require these
declarations.
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We know diff_delta() returns NULL, saying "no good delta exists for
it", when fed an empty data. Check the length of the data in the
caller to avoid such a call.
This incidentally reduces the number of attempted deltification we
see in the final statistics.
Signed-off-by: Mike Hommey <mh@glandium.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a repository argument to allow callers of
lookup_commit_reference_gently to be more specific about which
repository to handle. This is a small mechanical change; it doesn't
change the implementation to handle repositories other than
the_repository yet.
As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Code clean-up to adjust to a more recent lockfile API convention that
allows lockfile instances kept on the stack.
* ma/lockfile-cleanup:
lock_file: move static locks into functions
lock_file: make function-local locks non-static
refs.c: do not die if locking fails in `delete_pseudoref()`
refs.c: do not die if locking fails in `write_pseudoref()`
t/helper/test-write-cache: clean up lock-handling
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).
* sb/oid-object-info:
cache.h: allow oid_object_info to handle arbitrary repositories
packfile: add repository argument to cache_or_unpack_entry
packfile: add repository argument to unpack_entry
packfile: add repository argument to read_object
packfile: add repository argument to packed_object_info
packfile: add repository argument to packed_to_object_type
packfile: add repository argument to retry_bad_packed_offset
cache.h: add repository argument to oid_object_info
cache.h: add repository argument to oid_object_info_extended
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)
These `struct lock_file`s are local to their respective functions and we
can drop their staticness.
For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.
As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.
Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.
* ds/commit-graph:
commit-graph: implement "--append" option
commit-graph: build graph from starting commits
commit-graph: read only from specific pack-indexes
commit: integrate commit graph with commit parsing
commit-graph: close under reachability
commit-graph: add core.commitGraph setting
commit-graph: implement git commit-graph read
commit-graph: implement git-commit-graph write
commit-graph: implement write_commit_graph()
commit-graph: create git-commit-graph builtin
graph: add commit graph design document
commit-graph: add format document
csum-file: refactor finalize_hashfile() method
csum-file: rename hashclose() to finalize_hashfile()