Commit Graph

12160 Commits

Author SHA1 Message Date
Linus Torvalds
17559a643e Do exact rename detection regardless of rename limits
Now that the exact rename detection is linear-time (with a very small
constant factor to boot), there is no longer any reason to limit it by
the number of files involved.

In some trivial testing, I created a repository with a directory that
had a hundred thousand files in it (all with different contents), and
then moved that directory to show the effects of renaming 100,000 files.

With the new code, that resulted in

	[torvalds@woody big-rename]$ time ~/git/git show -C | wc -l
	400006

	real    0m2.071s
	user    0m1.520s
	sys     0m0.576s

ie the code can correctly detect the hundred thousand renames in about 2
seconds (the number "400006" comes from four lines for each rename:

	diff --git a/really-big-dir/file-1-1-1-1-1 b/moved-big-dir/file-1-1-1-1-1
	similarity index 100%
	rename from really-big-dir/file-1-1-1-1-1
	rename to moved-big-dir/file-1-1-1-1-1

and the extra six lines are from a one-liner commit message and all the
commit information and spacing).

Most of those two seconds weren't even really the rename detection; it's
really all the other stuff needed to get there.

With the old code, this wouldn't have been practically possible.  Doing
a pairwise check of the ten billion possible pairs would have been
prohibitively expensive.  In fact, even with the rename limiter in
place, the old code would waste a lot of time just on the diff_filespec
checks, and despite not even trying to find renames, it used to look
like:

	[torvalds@woody big-rename]$ time git show -C | wc -l
	1400006

	real    0m12.337s
	user    0m12.285s
	sys     0m0.192s

ie we used to take 12 seconds for this load and not even do any rename
detection! (The number 1400006 comes from fourteen lines per file moved:
seven lines each for the delete and the create of a one-liner file, and
the same extra six lines of commit information).
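
The two line counts quoted above follow from simple arithmetic. A minimal
sketch in C, assuming exactly six lines of commit header, message and
spacing as stated in the message (the function name expected_show_lines is
made up for illustration and is not part of git):

```c
#include <assert.h>

/*
 * Number of output lines expected from "git show" for a commit that
 * touches n_files files, with a fixed number of diff lines per file
 * plus the six lines of commit information quoted in the message.
 * Hypothetical helper, for illustration only.
 */
long expected_show_lines(long n_files, long lines_per_file)
{
	assert(n_files >= 0 && lines_per_file >= 0);
	return n_files * lines_per_file + 6;
}
```

With 100,000 files, four lines per detected rename gives 400006, and
fourteen lines per delete/create pair gives 1400006.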

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
9027f53cb5 Do linear-time/space rename logic for exact renames
This implements a smarter rename detector for exact renames, which
rather than doing a pairwise comparison (time O(m*n)) will just hash the
files into a hash-table (size O(n+m)), and only do pairwise comparisons
to renames that have the same hash (time O(n+m) except for unrealistic
hash collisions, which we just cull aggressively).
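
The idea can be sketched roughly as follows. This is not the code from the
commit: struct file_entry, content_id (a stand-in for the blob SHA-1) and
find_exact_renames() are assumptions made up for illustration, and the
aggressive culling of collisions is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a file on one side of the diff. */
struct file_entry {
	const char *path;
	unsigned int content_id; /* stand-in for the blob SHA-1 */
	int next;                /* next source in the same bucket, or -1 */
	int used;                /* source already claimed by a destination */
};

/*
 * For each destination, find an unused source with identical content.
 * match[j] receives the source index for dst[j], or -1 if there is none.
 * Returns the number of exact renames found.
 */
int find_exact_renames(struct file_entry *src, int n_src,
		       const struct file_entry *dst, int n_dst, int *match)
{
	int n_buckets = n_src > 0 ? 2 * n_src : 1;
	int *bucket = malloc(n_buckets * sizeof(*bucket));
	int i, j, found = 0;

	assert(bucket != NULL);
	for (i = 0; i < n_buckets; i++)
		bucket[i] = -1;

	/* O(n): chain every source into the bucket chosen by its hash. */
	for (i = 0; i < n_src; i++) {
		unsigned int b = src[i].content_id % n_buckets;

		src[i].used = 0;
		src[i].next = bucket[b];
		bucket[b] = i;
	}

	/* O(m) expected: each destination only walks a single bucket. */
	for (j = 0; j < n_dst; j++) {
		match[j] = -1;
		for (i = bucket[dst[j].content_id % n_buckets]; i >= 0; i = src[i].next) {
			if (!src[i].used && src[i].content_id == dst[j].content_id) {
				src[i].used = 1;
				match[j] = i;
				found++;
				break;
			}
		}
	}
	free(bucket);
	return found;
}
```

The pairwise comparison only happens between entries that share a bucket,
which is what keeps the expected cost proportional to n+m rather than n*m.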

Admittedly the exact rename case is not nearly as interesting as the
generic case, but it's an important case nonetheless. A similar general
approach should work for the generic case too, but even then you do need
to handle the exact renames/copies separately (to avoid the inevitable
added cost factor that comes from the _size_ of the file), so this is
worth doing.

In the expectation that we will indeed do the same hashing trick for the
general rename case, this code uses a generic hash-table implementation
that can be used for other things too.  In fact, we might be able to
consolidate some of our existing hash tables with the new generic code
in hash.[ch].

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
644797119d copy vs rename detection: avoid unnecessary O(n*m) loops
The core rename detection had some rather stupid code to check if a
pathname was used by a later modification or rename, which basically
walked the whole pathname space for all renames for each rename, in
order to tell whether it was a pure rename (no remaining users) or
should be considered a copy (other users of the source file remaining).

That's really silly, since we can just keep a count of users around, and
replace all those complex and expensive loops with just testing that
simple counter (but this all depends on the previous commit that shared
the diff_filespec data structure by using a separate reference count).

Note that the reference count is not the same as the rename count: they
behave otherwise rather similarly, but the reference count is tied to
the allocation (and decremented at de-allocation, so that when it turns
zero we can get rid of the memory), while the rename count is tied to
the renames and is decremented when we find a rename (so that when it
turns zero we know that it was a rename, not a copy).
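
The distinction between the two counters can be sketched like this. The
struct and function names below (struct spec, rename_users, spec_get and
so on) are hypothetical and only illustrate the description above; they
are not the diff_filespec API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of a shared, reference-counted filespec. */
struct spec {
	int count;        /* reference count: tied to the allocation */
	int rename_users; /* rename count: remaining users of this source */
};

struct spec *spec_alloc(int rename_users)
{
	struct spec *s = calloc(1, sizeof(*s));

	assert(s != NULL);
	s->count = 1;
	s->rename_users = rename_users;
	return s;
}

/* Share the spec instead of copying it. */
struct spec *spec_get(struct spec *s)
{
	s->count++;
	return s;
}

/* Drop one reference; returns 1 if the memory was released. */
int spec_put(struct spec *s)
{
	if (--s->count == 0) {
		free(s);
		return 1;
	}
	return 0;
}

/*
 * Record that one user of this source was matched as a rename.
 * Returns 1 when no users remain (a pure rename), 0 when other users
 * of the source are left (so it has to be treated as a copy).
 */
int spec_found_rename(struct spec *s)
{
	assert(s->rename_users > 0);
	return --s->rename_users == 0;
}
```

The point of the sketch is that the two counters reach zero for unrelated
reasons: one when the last holder lets go of the memory, the other when
the last user of the source path has been accounted for.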

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:06 -07:00
Linus Torvalds
9fb88419ba Ref-count the filespecs used by diffcore
Rather than copy the filespecs when introducing new versions of them
(for rename or copy detection), use a refcount and increment the count
when reusing the diff_filespec.

This avoids unnecessary allocations, but the real reason behind this is
a future enhancement: we will want to track shared data across the
copy/rename detection.  In order to efficiently notice when a filespec
is used by a rename, the rename machinery wants to keep track of a
rename usage count which is shared across all different users of the
filespec.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Linus Torvalds
cb1491b6bf Split out "exact content match" phase of rename detection
This makes the exact content match a separate function of its own.
Partly to cut down a bit on the size of the diffcore_rename() function
(which is too complex as it is), and partly because there are smarter
ways to do this than an O(m*n) loop over it all, and that function
should be rewritten to take that into account.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Linus Torvalds
505f297989 Add 'diffcore.h' to LIB_H
The diffcore.h header file is included by more than just the internal
diff generation files, and needs to be part of the proper dependencies.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:18:05 -07:00
Junio C Hamano
d633f702a0 Merge branch 'maint'
* maint:
  Fix generation of perl/perl.mak
  git-remote: fix "Use of uninitialized value in string ne"
2007-10-26 23:17:23 -07:00
Christian Couder
15387e32ff Test suite: reset TERM to its previous value after testing.
Using konsole, I get no colored output at the end of "t7005-editor.sh"
without this patch.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:17:19 -07:00
Junio C Hamano
dc2715554e Merge branch 'ph/color-test'
* ph/color-test:
  Support a --quiet option in the test-suite.
  Add some fancy colors in the test library when terminal supports it.
2007-10-26 23:17:14 -07:00
Jim Meyering
4a21d13db4 hooks-pre-commit: use \t, rather than a literal TAB in regexp
Signed-off-by: Jim Meyering <meyering@redhat.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 23:16:51 -07:00
Alex Riesen
d1a2057560 Fix generation of perl/perl.mak
The code generating perl/Makefile from Makefile.PL was causing trouble
because it didn't consider NO_PERL_MAKEMAKER and ran makemaker
unconditionally, rewriting perl.mak. Makemaker is FUBAR in ActiveState Perl,
and perl/Makefile has a replacement for it.

Besides, a changed Git.pm is *NOT* a reason to rebuild all the perl scripts,
so remove the dependency too.

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 16:44:45 -07:00
Pierre Habouzit
c2e6b6d0d1 fast-import.c: fix regression due to strbuf conversion
Without this strbuf_detach(), a double free occurs later: the command
is in fact stashed, so detaching it here is not a memory leak.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-26 15:28:09 -07:00
Pierre Habouzit
1ece127467 Support a --quiet option in the test-suite.
This shuts down the "*  ok ##: `test description`" messages.

Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:44:14 -07:00
Pierre Habouzit
55db1df0c8 Add some fancy colors in the test library when terminal supports it.
Signed-off-by: Pierre Habouzit <madcoder@debian.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:44:14 -07:00
David Symonds
d3cd249565 gitweb: Use chop_and_escape_str in more places.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:07:05 -07:00
David Symonds
ce58ec9158 gitweb: Refactor abbreviation-with-title-attribute code.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 22:06:57 -07:00
Junio C Hamano
d90a7fda35 Merge branch 'db/fetch-pack'
* db/fetch-pack: (60 commits)
  Define compat version of mkdtemp for systems lacking it
  Avoid scary errors about tagged trees/blobs during git-fetch
  fetch: if not fetching from default remote, ignore default merge
  Support 'push --dry-run' for http transport
  Support 'push --dry-run' for rsync transport
  Fix 'push --all branch...' error handling
  Fix compilation when NO_CURL is defined
  Added a test for fetching remote tags when there are no tags.
  Fix a crash in ls-remote when refspec expands into nothing
  Remove duplicate ref matches in fetch
  Restore default verbosity for http fetches.
  fetch/push: readd rsync support
  Introduce remove_dir_recursively()
  bundle transport: fix an alloc_ref() call
  Allow abbreviations in the first refspec to be merged
  Prevent send-pack from segfaulting when a branch doesn't match
  Cleanup unnecessary break in remote.c
  Cleanup style nit of 'x == NULL' in remote.c
  Fix memory leaks when disconnecting transport instances
  Ensure builtin-fetch honors {fetch,transfer}.unpackLimit
  ...
2007-10-24 21:59:50 -07:00
Miklos Vajna
2db9b49c6c git-send-email: add a new sendemail.to configuration variable
Some projects prefer to receive patches via a given email address.
In these cases, it's handy to configure that address once.

Signed-off-by: Miklos Vajna <vmiklos@frugalware.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 20:13:07 -07:00
Junio C Hamano
59b2023fbb git-remote: fix "Use of uninitialized value in string ne"
martin f krafft <madduck@madduck.net> writes:

> piper:~> git remote show origin
> * remote origin
>   URL: ssh://git.madduck.net/~/git/etc/mailplate.git
> Use of uninitialized value in string ne at /usr/local/stow/git/bin/git-remote line 248.

This is because there might not be branch.<name>.remote defined but
the code unconditionally dereferences $branch->{$name}{'REMOTE'} and
compares with another string.

Tested-by: Martin F Krafft <madduck@madduck.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-24 18:47:50 -07:00
Paul Mackerras
74a40c7110 gitk: Fix a couple more bugs in the path limiting
First, paths ending in a slash were not matching anything.  This fixes
path_filter to handle paths ending in a slash (such entries have to
match a directory, and can't match a file, e.g., foo/bar/ can't match
a plain file called foo/bar).

Secondly, clicking in the file list pane (bottom right) was broken
because $treediffs($ids) contained all the files modified by the
commit, not just those within the file list.  This fixes that too.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-24 10:16:56 +10:00
Junio C Hamano
8d863c98b2 k.org git toppage: Add link to 1.5.3 release notes.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-10-23 12:10:55 -07:00
Paul Mackerras
3de07118f0 Merge branch 'master' into dev
2007-10-23 22:40:50 +10:00
Paul Mackerras
bd8f677e1c gitk: Fix some bugs with path limiting in the diff display
First, we weren't putting "--" between the ids and the paths in the
git diff-tree/diff-index/diff-files command, so if there was a tag
and a file with the same name, we could get an ambiguity in the
command.  This puts the "--" in to make it clear that the paths are
paths.

Secondly, this implements the path limiting for merge diffs as well
as the normal 2-way diffs.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 22:37:23 +10:00
Paul Mackerras
4570b7e9d7 gitk: Use the status window for other functions
This sets the status window when reading commits, searching through
commits, cherry-picking or checking out a head.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 21:19:06 +10:00
Paul Mackerras
a137a90f49 gitk: Integrate the reset progress bar in the main frame
This makes the reset function use a progress bar in the same location
as the progress bars for reading in commits and for finding commits,
instead of a progress bar in a separate detached window.  The progress
bar for resetting is red.

This also puts "Resetting" in the status window while the reset is in
progress.  The setting of the status window is done through an
extension of the interface used for setting the watch cursor.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 21:12:49 +10:00
Paul Mackerras
94503918e4 gitk: Ensure tabstop setting gets restored by Cancel button
We weren't restoring the tabstop setting if the user pressed the
Cancel button in the Edit/Preferences window.  Also improved the
label for the checkbox (made it "Tab spacing" rather than the laconic
"tabstop") and moved it above the "Display nearby tags" checkbox.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 10:33:38 +10:00
Paul Mackerras
7a39a17a87 gitk: Limit diff display to listed paths by default
When the user has specified a list of paths, either on the command line
or when creating a view, gitk currently displays the diffs for all files
that a commit has modified, not just the ones that match the path list.
This is different from other git commands such as git log.  This change
makes gitk behave the same as these other git commands by default, that
is, gitk only displays the diffs for files that match the path list.

There is now a checkbox labelled "Limit diffs to listed paths" in the
Edit/Preferences pane.  If that is unchecked, gitk will display the
diffs for all files as before.

When gitk is run with the --merge flag, it will get the list of unmerged
files at startup, intersect that with the paths listed on the command line
(if any), and use that as the list of paths.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2007-10-23 10:15:11 +10:00
Jari Aalto
b5d21a4b68 On error, do not list all commands, but point to --help option
- Remove the call to list_common_cmds_help()
- Send error message to stderr, not stdout.

Signed-off-by: Jari Aalto <jari.aalto@cante.net>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:57:50 -04:00
David Symonds
e076a0e71f gitweb: Provide title attributes for abbreviated author names.
Signed-off-by: David Symonds <dsymonds@gmail.com>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:54:12 -04:00
Ralf Wildenhues
dd8175f83c git-cherry-pick: improve description of -x.
Reword the first sentence of the description of -x, in order to
make it easier to read and understand.

Signed-off-by: Ralf Wildenhues <Ralf.Wildenhues@gmx.de>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 01:38:19 -04:00
René Scharfe
c32f749fec Correct some sizeof(size_t) != sizeof(unsigned long) typing errors
Fix size_t vs. unsigned long pointer mismatch warnings introduced
with the addition of strbuf_detach().

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-22 00:00:40 -04:00
Shawn O. Pearce
5be507fc95 Use PRIuMAX instead of 'unsigned long long' in show-index
Elsewhere in Git we already use PRIuMAX and cast to uintmax_t when
we need to display a value that is 'very big' and we're not exactly
sure what the largest display size is for this platform.

This particular fix is needed so we can do the incredibly crazy
temporary hack of:

    diff --git a/cache.h b/cache.h
    index e0abcd6..6637fd8 100644
    --- a/cache.h
    +++ b/cache.h
    @@ -6,6 +6,7 @@

     #include SHA1_HEADER
     #include <zlib.h>
    +#define long long long

     #if ZLIB_VERNUM < 0x1200
     #define deflateBound(c,s)  ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)

allowing us to more easily look for locations where we are passing
a pointer to an 8 byte value to a function that expects a 4 byte
value.  This can occur on some platforms where sizeof(long) == 8
and sizeof(size_t) == 4.
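
A minimal sketch of the PRIuMAX idiom described above, using only standard
C99 facilities. The helper name format_big_value is made up for
illustration and is not a function in Git.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/*
 * Format a "very big" unsigned value without assuming how wide
 * 'unsigned long' is on this platform: cast to uintmax_t and let the
 * PRIuMAX macro supply the matching printf conversion.
 * Hypothetical helper; returns the number of characters written.
 */
int format_big_value(char *buf, size_t len, unsigned long value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%" PRIuMAX, (uintmax_t)value);
}
```

Because the argument is always converted to uintmax_t, the format string
and the argument stay in agreement even if the type of the value changes.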

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 02:16:57 -04:00
Shawn O. Pearce
8a37e21dab Merge branch 'maint'
* maint:
  Describe more 1.5.3.5 fixes in release notes
  Fix diffcore-break total breakage
  Fix directory scanner to correctly ignore files without d_type
  Improve receive-pack error message about funny ref creation
  fast-import: Fix argument order to die in file_change_m
  git-gui: Don't display CR within console windows
  git-gui: Handle progress bars from newer gits
  git-gui: Correctly report failures from git-write-tree
  gitk.txt: Fix markup.
  send-pack: respect '+' on wildcard refspecs
  git-gui: accept versions containing text annotations, like 1.5.3.mingw.1
  git-gui: Don't crash when starting gitk from a browser session
  git-gui: Allow gitk to be started on Cygwin with native Tcl/Tk
  git-gui: Ensure .git/info/exclude is honored in Cygwin workdirs
  git-gui: Handle starting on mapped shares under Cygwin
  git-gui: Display message box when we cannot find git in $PATH
  git-gui: Avoid using bold text in entire gui for some fonts
2007-10-21 02:11:45 -04:00
Shawn O. Pearce
2ee52eb17c Describe more 1.5.3.5 fixes in release notes
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 02:04:02 -04:00
Linus Torvalds
6dd4b66fde Fix diffcore-break total breakage
Ok, so on the kernel list, some people noticed that "git log --follow"
doesn't work too well with some files in the x86 merge, because a lot of
files got renamed in very special ways.

In particular, there was a pattern of doing single commits with renames
that looked basically like

 - rename "filename.h" -> "filename_64.h"
 - create new "filename.c" that includes "filename_32.h" or
   "filename_64.h" depending on whether we're 32-bit or 64-bit.

which was preparatory for smushing the two trees together.

Now, there's two issues here:

 - "filename.c" *remained*. Yes, it was a rename, but there was a new file
   created with the old name in the same commit. This was important,
   because we wanted each commit to compile properly, so that it was
   bisectable, so splitting the rename into one commit and the "create
   helper file" into another was *not* an option.

   So we need to break associations where the contents change too much.
   Fine. We have the -B flag for that. When we break things up, then the
   rename detection will be able to figure out whether there are better
   alternatives.

 - "git log --follow" didn't work with -B.

Now, the second case was really simple: we use a different "diffopt"
structure for the rename detection than the basic one (which we use for
showing the diffs). So that second case is trivially fixed by a trivial
one-liner that just copies the break_opt values from the "real" diffopts
to the one used for rename following. So now "git log -B --follow" works
fine:

	diff --git a/tree-diff.c b/tree-diff.c
	index 26bdbdd..7c261fd 100644
	--- a/tree-diff.c
	+++ b/tree-diff.c
	@@ -319,6 +319,7 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
	 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
	 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
	 	diff_opts.single_follow = opt->paths[0];
	+	diff_opts.break_opt = opt->break_opt;
	 	paths[0] = NULL;
	 	diff_tree_setup_paths(paths, &diff_opts);
	 	if (diff_setup_done(&diff_opts) < 0)

however, the end result does *not* work. Because our diffcore-break.c
logic is totally bogus!

In particular:

 - it used to do

	if (base_size < MINIMUM_BREAK_SIZE)
		return 0; /* we do not break too small filepair */

   which basically says "don't bother to break small files". But that
   "base_size" is the *smaller* of the two sizes, which means that if some
   large file was rewritten into one that just includes another file, we
   would look at the (small) result, and decide that it's smaller than the
   break size, so it cannot be worth it to break it up! Even if the other
   side was ten times bigger and looked *nothing* like the small file!

   That's clearly bogus. I replaced "base_size" with "max_size", so that
   we compare the *bigger* of the filepair with the break size.

 - It calculated a "merge_score", which was the score needed to merge it
   back together if nothing else wanted it. But even if it was *so*
   different that we would never want to merge it back, we wouldn't
   consider it a break! That makes no sense. So I added

	if (*merge_score_p > break_score)
		return 1;

   to make it clear that if we wouldn't want to merge it at the end, it
   was *definitely* a break.

 - It compared the whole "extent of damage", counting all inserts and
   deletes, but it based this score on the "base_size", and generated the
   damage score with

	delta_size = src_removed + literal_added;
	damage_score = delta_size * MAX_SCORE / base_size;

   but that makes no sense either, since quite often, this will result in
   a number that is *bigger* than MAX_SCORE! Why? Because base_size is
   (again) the smaller of the two files we compare, and when you start out
   from a small file and add a lot (or start out from a large file and
   remove a lot), the base_size is going to be much smaller than the
   damage!

   Again, the fix was to replace "base_size" with "max_size", at which
   point the damage actually becomes a sane percentage of the whole.
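
The effect of the denominator can be checked with a small calculation. The
sketch below is not the diffcore-break.c code: it uses integer arithmetic,
the value 60000 for the maximum score is an assumption of this example,
and the function name damage_score is made up for illustration.

```c
#include <assert.h>

/* Assumed scale for this sketch; scores are fractions of this value. */
#define SKETCH_MAX_SCORE 60000ULL

/*
 * Damage score as described in the message: everything removed from
 * the source plus everything literally added, scaled by the size the
 * caller chooses to measure against.
 */
unsigned long long damage_score(unsigned long long src_removed,
				unsigned long long literal_added,
				unsigned long long measured_against)
{
	unsigned long long delta_size = src_removed + literal_added;

	assert(measured_against > 0);
	return delta_size * SKETCH_MAX_SCORE / measured_against;
}
```

Take a 10000-byte file rewritten into a 100-byte file that keeps only 100
bytes of the original: 9900 bytes are removed and nothing is added.
Measured against the smaller size (100) the score is 5940000, far above
the maximum; measured against the larger size (10000) it is 59400, which
stays within the scale.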

With these changes in place, not only does "git log -B --follow" work for
the case that triggered this in the first place, ie now

	git log -B --follow arch/x86/kernel/vmlinux_64.lds.S

actually gives reasonable results. But I also wanted to verify it in
general, by doing a full-history

	git log --stat -B -C

on my kernel tree with the old code and the new code.

There's some tweaking to be done, but generally, the new code generates
much better results wrt breaking up files (and then finding better rename
candidates). Here's a few examples of the "--stat" output:

 - This:
	include/asm-x86/Kbuild        |    2 -
	include/asm-x86/debugreg.h    |   79 +++++++++++++++++++++++++++++++++++------
	include/asm-x86/debugreg_32.h |   64 ---------------------------------
	include/asm-x86/debugreg_64.h |   65 ---------------------------------
	4 files changed, 68 insertions(+), 142 deletions(-)

      Becomes:

	include/asm-x86/Kbuild                        |    2 -
	include/asm-x86/{debugreg_64.h => debugreg.h} |    9 +++-
	include/asm-x86/debugreg_32.h                 |   64 -------------------------
	3 files changed, 7 insertions(+), 68 deletions(-)

 - This:
	include/asm-x86/bug.h    |   41 +++++++++++++++++++++++++++++++++++++++--
	include/asm-x86/bug_32.h |   37 -------------------------------------
	include/asm-x86/bug_64.h |   34 ----------------------------------
	3 files changed, 39 insertions(+), 73 deletions(-)

      Becomes

	include/asm-x86/{bug_64.h => bug.h} |   20 +++++++++++++-----
	include/asm-x86/bug_32.h            |   37 -----------------------------------
	2 files changed, 14 insertions(+), 43 deletions(-)

Now, in some other cases, it does actually turn a rename into a real
"delete+create" pair, and then the diff is usually bigger, so truth in
advertising: it doesn't always generate a nicer diff. But for what -B was
meant for, I think this is a big improvement, and I suspect those cases
where it generates a bigger diff are tweakable.

So I think this diff fixes a real bug, but we might still want to tweak
the default values and perhaps the exact rules for when a break happens.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:59:42 -04:00
Linus Torvalds
07134421fc Fix directory scanner to correctly ignore files without d_type
On Fri, 19 Oct 2007, Todd T. Fries wrote:
> If DT_UNKNOWN exists, then we have to do a stat() of some form to
> find out the right type.

That happened in the case of a pathname that was ignored, and we did
not ask for "dir->show_ignored". That test used to be *together*
with the "DTYPE(de) != DT_DIR", but splitting the two tests up
means that we can do that (common) test before we even bother to
calculate the real dtype.

Of course, that optimization only matters for systems that don't
have, or don't fill in DTYPE properly.

I also clarified the real relationship between "exclude" and
"dir->show_ignored". It used to do

	if (exclude != dir->show_ignored) {
		..

which wasn't exactly obvious, because it triggers for two different
cases:

 - the path is marked excluded, but we are not interested in ignored
   files: ignore it

 - the path is *not* excluded, but we *are* interested in ignored
   files: ignore it unless it's a directory (in which case we might
   have ignored files inside the directory and need to recurse
   into it).

so this splits them into those two cases, since the first case
doesn't even care about the type.
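
The split can be sketched as below. This is an illustration of the two
cases only, not the directory scanner itself: should_skip(), is_dir_fn and
the counting stub are hypothetical names, and the stub merely pretends
that paths ending in '/' are directories.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback standing in for the (possibly costly) type lookup. */
typedef int (*is_dir_fn)(const char *path);

/* Counts how often the type lookup actually ran. */
int type_lookups;

/* Stub for the sketch: treats paths ending in '/' as directories. */
int counting_is_dir(const char *path)
{
	size_t n = strlen(path);

	type_lookups++;
	return n > 0 && path[n - 1] == '/';
}

/* Returns 1 if the path should be skipped, 0 if it should be looked at. */
int should_skip(const char *path, int exclude, int show_ignored, is_dir_fn is_dir)
{
	assert(path != NULL && is_dir != NULL);

	/* Case 1: excluded, and ignored files are not wanted. No type needed. */
	if (exclude && !show_ignored)
		return 1;

	/*
	 * Case 2: not excluded, but only ignored files are wanted. A
	 * directory may still contain ignored files, so only now does the
	 * type matter.
	 */
	if (!exclude && show_ignored && !is_dir(path))
		return 1;

	return 0;
}
```

Ordering the tests this way means the common "excluded and not wanted"
case returns before any type lookup is attempted.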

I also made the DT_UNKNOWN case a separate helper function,
and added some commentary to the cases.

		Linus

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:44:40 -04:00
Shawn O. Pearce
538dfe7397 Improved const correctness for strings
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:43:27 -04:00
Johannes Sixt
546bb58232 Use the asynchronous function infrastructure to run the content filter.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:42 -04:00
Johannes Sixt
7683b6e81f Avoid a dup2(2) in apply_filter() - start_command() can do it for us.
When apply_filter() runs the external (clean or smudge) filter program, it
needs to pass the writable end of a pipe as its stdout. For this purpose,
it used to dup2(2) the file descriptor explicitly to stdout. Now we use
the facilities of start_command() to do it for us.

Furthermore, the path argument of a subordinate function, filter_buffer(),
was not used, so here we replace it to pass the fd instead.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:42 -04:00
Johannes Sixt
a0ae35ae2d t0021-conversion.sh: Test that the clean filter really cleans content.
This test uses a rot13 filter, which is its own inverse. It tested only
that the content was the same as the original after both the 'clean' and
the 'smudge' filter were applied. This way it would not detect whether
any filter was run at all. Hence, here we add another test that checks
that the repository contained content that was processed by the 'clean'
filter.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:42 -04:00
Johannes Sixt
21edd3f197 upload-pack: Run rev-list in an asynchronous function.
This gets rid of an explicit fork().

Since upload-pack has to coordinate two processes (rev-list and
pack-objects), we cannot use the normal finish_async(), but have to monitor
the process explicitly. Hence, there are no changes at this front.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:42 -04:00
Johannes Sixt
80ccaa78a8 upload-pack: Move the revision walker into a separate function.
This allows us later to use start_async() with this function, and at
the same time is a nice cleanup that makes a long function
(create_pack_file()) shorter.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:41 -04:00
Johannes Sixt
088fab5fc4 Use the asynchronous function infrastructure in builtin-fetch-pack.c.
We run the sideband demultiplexer in an asynchronous function.

Note that earlier there was a check in the child process that closed
xd[1] only if it was different from xd[0]; this test is no longer needed
because git_connect() always returns two different file descriptors
(see ec587fde0a).

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:41 -04:00
Johannes Sixt
2d22c20830 Add infrastructure to run a function asynchronously.
This adds start_async() and finish_async(), which run a function
asynchronously. Communication with the caller happens only via pipes.
For this reason, this implementation forks off a child process that runs
the function.
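
The approach can be sketched with plain POSIX calls. Everything below
carries a _sketch suffix or a made-up name because it is an illustration
under assumptions, not the interface added by this commit: a function is
run in a forked child and talks to the caller through a pipe.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical descriptor for a function run in a child process. */
struct async_sketch {
	int (*proc)(int fd, void *data); /* runs in the child, writes to fd */
	void *data;
	int out;                         /* caller reads the output here */
	pid_t pid;
};

/* Example function for the sketch: writes a string to the pipe. */
int write_message(int fd, void *data)
{
	const char *msg = data;

	return write(fd, msg, strlen(msg)) < 0;
}

/* Fork a child that runs a->proc; returns 0 on success, -1 on error. */
int start_async_sketch(struct async_sketch *a)
{
	int pipe_out[2];

	assert(a != NULL && a->proc != NULL);
	if (pipe(pipe_out) < 0)
		return -1;

	a->pid = fork();
	if (a->pid < 0) {
		close(pipe_out[0]);
		close(pipe_out[1]);
		return -1;
	}
	if (a->pid == 0) {
		/* Child: keep only the write end and report via exit status. */
		close(pipe_out[0]);
		_exit(!!a->proc(pipe_out[1], a->data));
	}
	/* Parent: keep only the read end. */
	close(pipe_out[1]);
	a->out = pipe_out[0];
	return 0;
}

/* Wait for the child; returns its exit status, or -1 on error. */
int finish_async_sketch(struct async_sketch *a)
{
	int status;

	if (waitpid(a->pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would start the function, read from the out descriptor until end
of file, close it, and then collect the exit status.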

[sp: Style nit fixed by removing unnecessary block on if condition
     inside of start_async()]

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:41 -04:00
Johannes Sixt
cc41fa8da9 upload-pack: Use start_command() to run pack-objects in create_pack_file().
This gets rid of an explicit fork/exec.

Since upload-pack has to coordinate two processes (rev-list and
pack-objects), we cannot use the normal finish_command(), but have to
monitor the processes explicitly. Hence, the waitpid() call remains.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:40 -04:00
Johannes Sixt
f3b33f1d22 Have start_command() create a pipe to read the stderr of the child.
This adds another stanza that allocates a pipe that is connected to the
child's stderr and that the caller can read from. In order to request this
pipe, the caller sets cmd->err to -1.

The implementation is not exactly modeled after the stdout case: For stdout
the caller can supply an existing file descriptor, but this facility is
nowhere needed in the stderr case. Additionally, the caller is required to
close cmd->err.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:40 -04:00
Johannes Sixt
477822c35d Use start_command() in builtin-fetch-pack.c instead of explicit fork/exec.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:40 -04:00
Johannes Sixt
d5535ec75c Use run_command() to spawn external diff programs instead of fork/exec.
Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:40 -04:00
Johannes Sixt
dc1bfdcd1a Use start_command() to run content filters instead of explicit fork/exec.
The previous code already used finish_command() to wait for the process
to terminate, but did not use start_command() to run it.

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:39 -04:00
Johannes Sixt
f364cb8823 Use start_command() in git_connect() instead of explicit fork/exec.
The child process handling is delegated to start_command() and
finish_command().

Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
2007-10-21 01:30:39 -04:00