Commit Graph

33 Commits

Author SHA1 Message Date
Junio C Hamano
f2e2136ad7 Merge branch 'md/test-cleanup'
Various test scripts have been updated for style and also correct
handling of exit status of various commands.

* md/test-cleanup:
  tests: order arguments to git-rev-list properly
  t9109: don't swallow Git errors upstream of pipes
  tests: don't swallow Git errors upstream of pipes
  t/*: fix ordering of expected/observed arguments
  tests: standardize pipe placement
  Documentation: add shell guidelines
  t/README: reformat Do, Don't, Keep in mind lists
2018-10-16 16:16:01 +09:00
Matthew DeVore
bdbc17e86a tests: standardize pipe placement
Instead of using a line-continuation and pipe on the second line, take
advantage of the shell's implicit line continuation after a pipe
character.  So for example, instead of

	some long line \
		| next line

use

	some long line |
	next line

And add a blank line before and after the pipe where it aids readability
(it usually does).

This better matches the coding style documented in
Documentation/CodingGuidelines and used in shell scripts elsewhere in
the tree.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:51:18 +09:00
brian m. carlson
e95f53137d t1006: make hash size independent
Compute the size of the tree and commit objects we're creating by
checking for the size of an object ID and computing the resulting sizes
accordingly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-17 08:10:32 -07:00
Jeff King
0750bb5b51 cat-file: support "unordered" output for --batch-all-objects
If you're going to access the contents of every object in a
packfile, it's generally much more efficient to do so in
pack order, rather than in hash order. That increases the
locality of access within the packfile, which in turn is
friendlier to the delta base cache, since the packfile puts
related deltas next to each other. By contrast, hash order
is effectively random, since the sha1 has no discernible
relationship to the content.

This patch introduces an "--unordered" option to cat-file
which iterates over packs in pack-order under the hood. You
can see the results when dumping all of the file content:

  $ time ./git cat-file --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m44.491s
  user	0m42.902s
  sys	0m5.230s

  $ time ./git cat-file --unordered \
                        --batch-all-objects --buffer --batch | wc -c
  6883195596

  real	0m6.075s
  user	0m4.774s
  sys	0m3.548s

Same output, different order, way faster. The same speed-up
applies even if you end up accessing the object content in a
different process, like:

  git cat-file --batch-all-objects --buffer --batch-check |
  grep blob |
  git cat-file --batch='%(objectname) %(rest)' |
  wc -c

Adding "--unordered" to the first command drops the runtime
in git.git from 24s to 3.5s.

  Side note: there are actually further speedups available
  for doing it all in-process now. Since we are outputting
  the object content during the actual pack iteration, we
  know where to find the object and could skip the extra
  lookup done by oid_object_info(). This patch stops short
  of that optimization since the underlying API isn't ready
  for us to make those sorts of direct requests.

So if --unordered is so much better, why not make it the
default? Two reasons:

  1. We've promised in the documentation that --batch-all-objects
     outputs in hash order. Since cat-file is plumbing,
     people may be relying on that default, and we can't
     change it.

  2. It's actually _slower_ for some cases. We have to
     compute the pack revindex to walk in pack order. And
     our de-duplication step uses an oidset, rather than a
     sort-and-dedup, which can end up being more expensive.
     If we're just accessing the type and size of each
     object, for example, like:

       git cat-file --batch-all-objects --buffer --batch-check

     my best-of-five warm cache timings go from 900ms to
     1100ms using --unordered. Though it's possible in a
     cold-cache or under memory pressure that we could do
     better, since we'd have better locality within the
     packfile.

And one final question: why is it "--unordered" and not
"--pack-order"? The answer is again two-fold:

  1. "pack order" isn't a well-defined thing across the
     whole set of objects. We're hitting loose objects, as
     well as objects in multiple packs, and the only
     ordering we're promising is _within_ a single pack. The
     rest is apparently random.

  2. The point here is optimization. So we don't want to
     promise any particular ordering, but only to say that
     we will choose an ordering which is likely to be
     efficient for accessing the object content. That leaves
     the door open for further changes in the future without
     having to add another compatibility option.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:31 -07:00
Jeff King
aa2f5ef500 t1006: test cat-file --batch-all-objects with duplicates
The test for --batch-all-objects in t1006 covers a variety
of object storage situations, but one thing it doesn't cover
is that we avoid mentioning duplicate objects. We won't have
any because running "git repack -ad" will have packed them
all and deleted the loose ones.

This does work (because we sort and de-dup the output list),
but it's good to include it in our test. And doubly so for
when we add an unordered mode which has to de-dup in a
different way.

Note that we cannot just re-create one of the objects, as
Git will omit the write of an object that is already
present. However, we can create a new pack with one of the
objects, which forces the duplication.

One alternative would be to just use "git repack -a" instead
of "-ad". But then _every_ object would be duplicated as
loose and packed, and we might miss a bug that omits packed
objects (because we'd show their loose counterparts).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 13:48:29 -07:00
brian m. carlson
8125a58b91 t: switch $_z40 to $ZERO_OID
Switch all uses of $_z40 to $ZERO_OID so that they work correctly with
larger hashes.  This commit was created by using the following sed
command to modify all files in the t directory except t/test-lib.sh:

  sed -i 's/\$_z40/$ZERO_OID/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-14 11:02:00 +09:00
Nguyễn Thái Ngọc Duy
c680668d1a t/helper: merge test-genrandom into test-tool
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-27 08:45:47 -07:00
Ville Skyttä
2e3a16b279 Spelling fixes
<BAD>                     <CORRECTED>
    accidently                accidentally
    commited                  committed
    dependancy                dependency
    emtpy                     empty
    existance                 existence
    explicitely               explicitly
    git-upload-achive         git-upload-archive
    hierachy                  hierarchy
    indegee                   indegree
    intial                    initial
    mulitple                  multiple
    non-existant              non-existent
    precendence.              precedence.
    priviledged               privileged
    programatically           programmatically
    psuedo-binary             pseudo-binary
    soemwhere                 somewhere
    successfull               successful
    transfering               transferring
    uncommited                uncommitted
    unkown                    unknown
    usefull                   useful
    writting                  writing

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:35:42 -07:00
Jeff King
3115ee45c8 cat-file: sort and de-dup output of --batch-all-objects
The sorting we could probably live without, but printing
duplicates is just a hassle for the user, who must then
de-dup themselves (or risk a wrong answer if they are doing
something like counting objects with a particular property).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-26 09:24:42 -07:00
Jeff King
6a951937ae cat-file: add --batch-all-objects option
It can sometimes be useful to examine all objects in the
repository. Normally this is done with "git rev-list --all
--objects", but:

  1. That shows only reachable objects. You may want to look
     at all available objects.

  2. It's slow. We actually open each object to walk the
     graph. If your operation is OK with seeing unreachable
     objects, it's an order of magnitude faster to just
     enumerate the loose directories and pack indices.

You can do this yourself using "ls" and "git show-index",
but it's non-obvious.  This patch adds an option to
"cat-file --batch-check" to operate on all available
objects (rather than reading names from stdin).

This is based on a proposal by Charles Bailey to provide a
separate "git list-all-objects" command. That is more
orthogonal, as it splits enumerating the objects from
getting information about them. However, in practice you
will either:

  a. Feed the list of objects directly into cat-file anyway,
     so you can find out information about them. Keeping it
     in a single process is more efficient.

  b. Ask the listing process to start telling you more
     information about the objects, in which case you will
     reinvent cat-file's batch-check formatter.

Adding a cat-file option is simple and efficient. And if you
really do want just the object names, you can always do:

  git cat-file --batch-check='%(objectname)' --batch-all-objects

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 14:55:52 -07:00
Junio C Hamano
67f0b6f3b2 Merge branch 'dt/cat-file-follow-symlinks'
"git cat-file --batch(-check)" learned the "--follow-symlinks"
option that follows an in-tree symbolic link when asked about an
object via extended SHA-1 syntax, e.g. HEAD:RelNotes that points at
Documentation/RelNotes/2.5.0.txt.  With the new option, the command
behaves as if HEAD:Documentation/RelNotes/2.5.0.txt was given as
input instead.

* dt/cat-file-follow-symlinks:
  cat-file: add --follow-symlinks to --batch
  sha1_name: get_sha1_with_context learns to follow symlinks
  tree-walk: learn get_tree_entry_follow_symlinks
2015-06-01 12:45:16 -07:00
David Turner
122d53464b cat-file: add --follow-symlinks to --batch
This wires the in-repo-symlink following code through to the cat-file
builtin.  In the event of an out-of-repo link, cat-file will print
the link in a new format.

Signed-off-by: David Turner <dturner@twopensource.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-20 13:46:21 -07:00
Karthik Nayak
3e370f9faf t1006: add tests for git cat-file --allow-unknown-type
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-06 13:47:18 -07:00
Jeff King
99094a7ad4 t: fix trivial &&-chain breakage
These are tests which are missing a link in their &&-chain,
but during a setup phase. We may fail to notice failure in
commands that build the test environment, but these are
typically not expected to fail at all (but it's still good
to double-check that our test environment is what we
expect).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-20 10:20:14 -07:00
Junio C Hamano
b2132068c6 Merge branch 'jk/oi-delta-base'
Teach "cat-file --batch" to show delta-base object name for a
packed object that is represented as a delta.

* jk/oi-delta-base:
  cat-file: provide %(deltabase) batch format
  sha1_object_info_extended: provide delta base sha1s
2014-01-10 10:33:11 -08:00
Junio C Hamano
604ada435b Merge branch 'jk/cat-file-regression-fix'
"git cat-file --batch=", an admittedly useless command, did not
behave very well.

* jk/cat-file-regression-fix:
  cat-file: handle --batch format with missing type/size
  cat-file: pass expand_data to print_object_or_die
2013-12-27 14:58:11 -08:00
Jeff King
65ea9c3c3d cat-file: provide %(deltabase) batch format
It can be useful for debugging or analysis to see which
objects are stored as delta bases on top of others. This
information is available by running `git verify-pack`, but
that is extremely expensive (and is harder than necessary to
parse).

Instead, let's make it available as a cat-file query format,
which makes it fast and simple to get the bases for a subset
of the objects.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-26 11:54:26 -08:00
Jeff King
6554dfa97a cat-file: handle --batch format with missing type/size
Commit 98e2092 taught cat-file to stream blobs with --batch,
which requires that we look up the object type before
loading it into memory.  As a result, we now print the
object header from information in sha1_object_info, and the
actual contents from the read_sha1_file. We double-check
that the information we printed in the header matches the
content we are about to show.

Later, commit 93d2a60 allowed custom header lines for
--batch, and commit 5b08640 made type lookups optional. As a
result, specifying a header line without the type or size
means that we will not look up those items at all.

This causes our double-checking to erroneously die with an
error; we think the type or size has changed, when in fact
it was simply left at "0".

For the size, we can fix this by only doing the consistency
double-check when we have retrieved the size via
sha1_object_info. In the case that we have not retrieved the
value, that means we also did not print it, so there is
nothing for us to check that we are consistent with.

We could do the same for the type. However, besides our
consistency check, we also care about the type in deciding
whether to stream or not. So instead of handling the case
where we do not know the type, this patch instead makes sure
that we always trigger a type lookup when we are printing,
so that even a format without the type will stream as we
would in the normal case.

Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-12 11:31:25 -08:00
Junio C Hamano
c17fa972d3 Merge branch 'sb/sha1-loose-object-info-check-existence'
"git cat-file --batch-check=ok" did not check the existence of the
named object.

* sb/sha1-loose-object-info-check-existence:
  sha1_loose_object_info(): do not return success on missing object
2013-12-05 13:00:12 -08:00
Junio C Hamano
4ef8d1dd03 sha1_loose_object_info(): do not return success on missing object
Since 052fe5ea (sha1_loose_object_info: make type lookup optional,
2013-07-12), sha1_loose_object_info() returns happily without
checking if the object in question exists, which is not what the the
caller sha1_object_info_extended() expects; the caller does not even
bother checking the existence of the object itself.

Noticed-by: Sven Brauch <svenbrauch@googlemail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-06 11:03:33 -08:00
Jeff King
97be04077f cat-file: only split on whitespace when %(rest) is used
Commit c334b87b (cat-file: split --batch input lines on whitespace,
2013-07-11) taught `cat-file --batch-check` to split input lines on
the first whitespace, and stash everything after the first token
into the %(rest) output format element.  It claimed:

   Object names cannot contain spaces, so any input with
   spaces would have resulted in a "missing" line.

But that is not correct.  Refs, object sha1s, and various peeling
suffixes cannot contain spaces, but some object names can. In
particular:

  1. Tree paths like "[<tree>]:path with whitespace"

  2. Reflog specifications like "@{2 days ago}"

  3. Commit searches like "rev^{/grep me}" or ":/grep me"

To remain backwards compatible, we cannot split on whitespace by
default, hence we will ship 1.8.4 with the commit reverted.

Resurrect its attempt but in a weaker form; only do the splitting
when "%(rest)" is used in the output format. Since that element did
not exist at all before c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo ":path with space and other data" |
  git cat-file --batch-check="%(objectname) %(rest)"

as it would split the path and feed only ":path" to get_sha1. But
that command is nonsensical. If you wanted to see "and other data"
in "%(rest)", git cannot possibly know where the filename ends and
the "rest" begins.

It might be more robust to have something like "-z" to separate the
input elements. But this patch is still a reasonable step before
having that.  It makes the easy cases easy; people who do not care
about %(rest) do not have to consider it, and the %(rest) code
handles the spaces and newlines of "rev-list --objects" correctly.

Hard cases remain hard but possible (if you might get whitespace in
your input, you do not get to use %(rest) and must split and join
the output yourself using more flexible tools). And most
importantly, it does not preclude us from having different splitting
rules later if a "-z" (or similar) option is added.  So we can make
the hard cases easier later, if we choose to.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 09:30:48 -07:00
Junio C Hamano
062aeee8aa Revert "cat-file: split --batch input lines on whitespace"
This reverts commit c334b87b30c1464a1ab563fe1fb8de5eaf0e5bac; the
update assumed that people only used the command to read from
"rev-list --objects" output, whose lines begin with a 40-hex object
name followed by a whitespace, but it turns out that scripts feed
random extended SHA-1 expressions (e.g. "HEAD:$pathname") in which
a whitespace has to be kept.
2013-08-02 09:29:30 -07:00
Jeff King
c334b87b30 cat-file: split --batch input lines on whitespace
If we get an input line to --batch or --batch-check that
looks like "HEAD foo bar", we will currently feed the whole
thing to get_sha1(). This means that to use --batch-check
with `rev-list --objects`, one must pre-process the input,
like:

  git rev-list --objects HEAD |
  cut -d' ' -f1 |
  git cat-file --batch-check

Besides being more typing and slightly less efficient to
invoke `cut`, the result loses information: we no longer
know which path each object was found at.

This patch teaches cat-file to split input lines at the
first whitespace. Everything to the left of the whitespace
is considered an object name, and everything to the right is
made available as the %(reset) atom. So you can now do:

  git rev-list --objects HEAD |
  git cat-file --batch-check='%(objectsize) %(rest)'

to collect object sizes at particular paths.

Even if %(rest) is not used, we always do the whitespace
split (which means you can simply eliminate the `cut`
command from the first example above).

This whitespace split is backwards compatible for any
reasonable input. Object names cannot contain spaces, so any
input with spaces would have resulted in a "missing" line.
The only input hurt is if somebody really expected input of
the form "HEAD is a fine-looking ref!" to fail; it will now
parse HEAD, and make "is a fine-looking ref!" available as
%(rest).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:18:42 -07:00
Jeff King
93d2a607ba cat-file: add --batch-check=<format>
The `cat-file --batch-check` command can be used to quickly
get information about a large number of objects. However, it
provides a fixed set of information.

This patch adds an optional <format> option to --batch-check
to allow a caller to specify which items they are interested
in, and in which order to output them. This is not very
exciting for now, since we provide the same limited set that
you could already get. However, it opens the door to adding
new format items in the future without breaking backwards
compatibility (or forcing callers to pay the cost to
calculate uninteresting items).

Since the --batch option shares code with --batch-check, it
receives the same feature, though it is less likely to be of
interest there.

The format atom names are chosen to match their counterparts
in for-each-ref. Though we do not (yet) share any code with
for-each-ref's formatter, this keeps the interface as
consistent as possible, and may help later on if the
implementations are unified.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-12 09:18:12 -07:00
Jeff King
03c893cbf9 t1006: modernize output comparisons
In modern tests, we typically put output into a file and
compare it with test_cmp. This is nicer than just comparing
via "test", and much shorter than comparing via "test" and
printing a custom message.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-11 10:37:14 -07:00
Jeff King
9cfa5126a0 cat-file: print tags raw for "cat-file -p"
When "cat-file -p" prints commits, it shows them in their
raw format, since git's format is already human-readable.
For tags, however, we print the whole thing raw except for
one thing: we convert the timestamp on the tagger line into a
human-readable date.

This dates all the way back to a0f15fa (Pretty-print tagger
dates, 2006-03-01). At that time there was no other way to
pretty-print a tag.  These days, however, neither of those
matters much. The normal way to pretty-print a tag is with
"git show", which is much more flexible than "cat-file -p".

Commit a0f15fa also built "verify-tag --verbose" (and
subsequently "tag -v") around the "cat-file -p" output.
However, that behavior was lost in commit 62e09ce (Make git
tag a builtin, 2007-07-20), and we went back to printing
the raw tag contents. Nobody seems to have noticed the bug
since then (and it is arguably a saner behavior anyway, as
it shows the actual bytes for which we verified the
signature).

Let's drop the tagger-date formatting for "cat-file -p". It
makes us more consistent with cat-file's commit
pretty-printer, and as a bonus, we can drop the hand-rolled
tag parsing code in cat-file (which happened to behave
inconsistently with the tag pretty-printing code elsewhere).

This is a change of output format, so it's possible that
some callers could considered this a regression. However,
the original behavior was arguably a bug (due to the
inconsistency with commits), likely nobody was relying on it
(even we do not use it ourselves these days), and anyone
relying on the "-p" pretty-printer should be able to expect
a change in the output format (i.e., while "cat-file" is
plumbing, the output format of "-p" was never guaranteed to
be stable).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-17 14:48:45 -07:00
Stefano Lattarini
41ccfdd9c9 Correct common spelling mistakes in comments and tests
Most of these were found using Lucas De Marchi's codespell tool.

Signed-off-by: Stefano Lattarini <stefano.lattarini@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-12 13:38:40 -07:00
Lea Wiemann
3c076dbe3c cat-file --batch / --batch-check: do not exit if hashes are missing
Previously, cat-file --batch / --batch-check would silently exit if it
was passed a non-existent SHA1 on stdin.  Now it prints "<SHA1>
missing" as in all other cases (and as advertised in the
documentation).

Note that cat-file --batch-check (but not --batch) will still output
"error: unable to find <SHA1>" on stderr if a non-existent SHA1 is
passed, but this does not affect parsing its stdout.

Also, type <= 0 was previously using the potentially uninitialized
type variable (relying on it being 0); it is now being initialized.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-09 13:46:08 -07:00
Lea Wiemann
4e44ae45fe t1006-cat-file.sh: typo
Previously timestamps were removed unconditionally (though this didn't
seem to break this test).  Now they are only removed if $no_ts is
non-empty.

Signed-off-by: Lea Wiemann <LeWiemann@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-06-09 13:31:23 -07:00
Michele Ballabio
6c41e21d48 change quoting in test t1006-cat-file.sh
Signed-off-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-23 12:08:42 -07:00
Adam Roben
a8128ed628 git-cat-file: Add --batch option
--batch is similar to --batch-check, except that the contents of each object is
also printed. The output's form is:

<sha1> SP <type> SP <size> LF
<contents> LF

Signed-off-by: Adam Roben <aroben@apple.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-05 21:18:37 -07:00
Adam Roben
05d5667fec git-cat-file: Add --batch-check option
This new option allows multiple objects to be specified on stdin. For each
object specified, a line of the following form is printed:

<sha1> SP <type> SP <size> LF

If the object does not exist in the repository, a line of the following form is
printed:

<object> SP missing LF

Signed-off-by: Adam Roben <aroben@apple.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-05 21:17:32 -07:00
Adam Roben
b335d3f121 Add tests for git cat-file
Signed-off-by: Adam Roben <aroben@apple.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-05-05 20:55:52 -07:00