Commit Graph

31 Commits

Author SHA1 Message Date
Linus Torvalds
1974632c66 Remove TYPE_* constant macros and use object_type enums consistently.
This updates the type-enumeration constants introduced to reduce
the memory footprint of "struct object" to match the type bits
already used in the packfile format, by removing the former
(i.e. TYPE_* constant macros) and using the latter (i.e. enum
object_type) throughout the code for consistency.

Eventually we can stop passing around the "type strings"
entirely, and this will help - no confusion about two different
integer enumerations.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-12 23:18:03 -07:00
Linus Torvalds
885a86abe2 Shrink "struct object" a bit
This shrinks "struct object" by a small amount, by getting rid of the
"struct type *" pointer and replacing it with a 3-bit bitfield instead.

In addition, we merge the bitfields and the "flags" field, which
incidentally should also remove a useless 4-byte padding from the object
when in 64-bit mode.

Now, our "struct object" is still too damn large, but it's now less
obviously bloated, and of the remaining fields, only the "util" (which is
not used by most things) is clearly something that should be eventually
discarded.

This shrinks the "git-rev-list --all" memory use by about 2.5% on the
kernel archive (and, perhaps more importantly, on the larger mozilla
archive). That may not sound like much, but I suspect it's more on a
64-bit platform.

There are other remaining inefficiencies (the parent lists, for example,
probably have horrible malloc overhead), but this was pretty obvious.

Most of the patch is just changing the comparison of the "type" pointer
from one of the constant string pointers to the appropriate new TYPE_xxx
small integer constant.
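
As an illustration only, the layout change described above could look roughly like the following. This is a sketch under assumptions, not the actual git definitions: the struct names, field order and the `SKETCH_*` constants are hypothetical, and the exact sizes depend on the compiler and ABI.

```c
/*
 * Hypothetical "before" layout: the object type is a pointer to a
 * constant string, stored alongside separate bitfields and flags.
 */
struct sketch_object_before {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned int flags;
	unsigned char sha1[20];
	const char *type;	/* compared against constant string pointers */
	void *util;
};

/* Hypothetical small integer type constants that fit in 3 bits. */
enum sketch_type {
	SKETCH_NONE = 0,
	SKETCH_COMMIT = 1,
	SKETCH_TREE = 2,
	SKETCH_BLOB = 3,
	SKETCH_TAG = 4
};

/*
 * Hypothetical "after" layout: the type pointer is replaced by a
 * 3-bit bitfield, and the flags share the same 32-bit word.
 */
struct sketch_object_after {
	unsigned parsed : 1;
	unsigned used : 1;
	unsigned type : 3;
	unsigned flags : 27;
	unsigned char sha1[20];
	void *util;
};
```

With this layout a comparison such as `obj->type == SKETCH_TREE` replaces a comparison of the old string pointer, and on common 32-bit and 64-bit ABIs `sizeof(struct sketch_object_after)` comes out smaller than `sizeof(struct sketch_object_before)`.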

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-17 18:49:18 -07:00
Junio C Hamano
e5f38ec3c5 ref-log: style fixes.
A few style fixes to get the code in line with the rest.

 - asterisk to make a type a pointer to something goes in front
   of the variable, not at the end of the base type.
   E.g. a pointer to an integer is "int *ip", not "int* ip".

 - open parenthesis for function parameter list, unlike
   syntactic constructs, comes immediately after the function
   name.  E.g. "if (foo) bar();" not "if(foo) bar ();".

 - "else" does not come on the same line as the closing brace of
   corresponding "if".

The style is mostly a matter of personal taste, and people may
disagree, but consistency is important.
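
A small example that follows the three rules above; the function name and its behaviour are made up purely for illustration and are not taken from the patch:

```c
/* Pointer asterisk goes in front of the variable: "int *ip". */
static int sketch_clamp_to_zero(int *ip)
{
	/* Space after "if", no space between a function name and "(". */
	if (*ip < 0) {
		*ip = 0;
		return 1;
	}
	else {
		/* "else" starts its own line after the closing brace. */
		return 0;
	}
}
```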

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-06 14:30:58 -07:00
Junio C Hamano
16a4c6ee0d Merge branch 'lt/tree-2'
* lt/tree-2:
  fetch.c: do not call process_tree() from process_tree().
  tree_entry(): new tree-walking helper function
  adjust to the rebased series by Linus.
  Remove "tree->entries" tree-entry list from tree parser
  Switch "read_tree_recursive()" over to tree-walk functionality
  Make "tree_entry" have a SHA1 instead of a union of object pointers
  Add raw tree buffer info to "struct tree"
  Remove last vestiges of generic tree_entry_list
  Convert fetch.c: process_tree() to raw tree walker
  Convert "mark_tree_uninteresting()" to raw tree walker
  Remove unused "zeropad" entry from tree_list_entry
  fsck-objects: avoid unnecessary tree_entry_list usage
  Remove "tree->entries" tree-entry list from tree parser
  builtin-read-tree.c: avoid tree_entry_list in prime_cache_tree_rec()
  Switch "read_tree_recursive()" over to tree-walk functionality
  Make "tree_entry" have a SHA1 instead of a union of object pointers
  Make "struct tree" contain the pointer to the tree buffer
2006-06-03 23:59:27 -07:00
Junio C Hamano
6f9012b625 fetch.c: do not call process_tree() from process_tree().
This function reads a freshly fetched tree object, and schedules
the objects pointed to by it for further fetching, so doing
lookup_tree() and process_tree() recursively from there does not
make much sense.  We need to use process() on it to make sure we
fetch it first, and leave the recursive processing to later
stages.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-02 15:23:47 -07:00
Junio C Hamano
99bd0f5558 fetch.c: do not pass uninitialized lock to unlock_ref().
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-31 15:23:44 -07:00
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry)) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Linus Torvalds
1bc995a392 Convert fetch.c: process_tree() to raw tree walker
This leaves only the horrid code in builtin-read-tree.c using the old
interface. Some day I will gather the strength to tackle that one too.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:08:33 -07:00
Linus Torvalds
2d9c58c69d Remove "tree->entries" tree-entry list from tree parser
Instead, just use the tree buffer directly, and use the tree-walk
infrastructure to walk the buffers instead of the tree-entry list.

The tree-entry list is inefficient, and generates tons of small
allocations for no good reason. The tree-walk infrastructure is
generally no harder to use than following a linked list, and allows
us to do most tree parsing in-place.

Some programs still use the old tree-entry lists, and are a bit
painful to convert without major surgery. For them we have a helper
function that creates a temporary tree-entry list on demand.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-29 19:06:59 -07:00
Junio C Hamano
a5c8a98ca7 Merge branch 'master' into sp/reflog
* master: (90 commits)
  fetch.c: remove an unused variable and dead code.
  Clean up sha1 file writing
  Builtin git-cat-file
  builtin format-patch: squelch content-type for 7-bit ASCII
  CMIT_FMT_EMAIL: Q-encode Subject: and display-name part of From: fields.
  add more informative error messages to git-mktag
  remove the artificial restriction tagsize < 8kb
  git-rebase: use canonical A..B syntax to format-patch
  git-format-patch: now built-in.
  fmt-patch: Support --attach
  fmt-patch: understand old <his> notation
  Teach fmt-patch about --keep-subject
  Teach fmt-patch about --numbered
  fmt-patch: implement -o <dir>
  fmt-patch: output file names to stdout
  Teach fmt-patch to write individual files.
  built-in tar-tree and remote tar-tree
  Builtin git-diff-files, git-diff-index, git-diff-stages, and git-diff-tree.
  Builtin git-show-branch.
  Builtin git-apply.
  ...
2006-05-24 16:49:24 -07:00
Junio C Hamano
84c667ff97 fetch.c: remove an unused variable and dead code.
Funnily enough, this variable was never assigned ever since it
was introduced, and has been protecting some code that has never
been executed.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-24 16:42:38 -07:00
Shawn Pearce
d0740d92be Log ref updates made by fetch.
If a ref is changed by http-fetch, local-fetch or ssh-fetch
record the change and the remote URL/name in the log for the ref.
This requires loading the config file to check logAllRefUpdates.

Also fixed a bug in the ref lock generation; the log file name was
not being produced right due to a bad prefix length.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-19 15:02:52 -07:00
Shawn Pearce
4bd18c43d9 Improve abstraction of ref lock/write.
Created 'struct ref_lock' to contain the data necessary to perform
a ref update.  This change improves writing a ref as the file names
are generated only once (rather than twice) and supports following
symrefs (up to the maximum depth).  Further the ref_lock structure
provides room to extend the update API with ref logging.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-17 17:36:36 -07:00
Nick Hengeveld
11f0dafe2b [PATCH] Don't fetch objects that exist in the local repository
Be sure not to fetch objects that already exist in the local repository.
The main process loop no longer performs this check, http-fetch now checks
prior to starting a new request queue entry and when fetch_object() is called,
and local-fetch now checks when fetch_object() is called.

As discussed in this thread: http://marc.theaimsgroup.com/?t=112854890500001

Signed-off-by: Nick Hengeveld <nickh@reactrix.com>
2005-10-10 23:22:01 -07:00
Daniel Barkalow
820eca68c2 [PATCH] Implement --recover for git-*-fetch
With the --recover option, we verify that we have absolutely
everything reachable from the target, not assuming that things
reachable from refs will be complete.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-27 00:16:40 -07:00
Sergey Vlasov
d35bbe0b2e [PATCH] fetch.c: Plug memory leak in process_tree()
When freeing a tree entry, must free its name too.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-23 14:30:45 -07:00
Sergey Vlasov
a95cb6fb6b [PATCH] fetch.c: Do not build object ref lists
The fetch code does not need object ref lists; by disabling them we
can save some time and memory.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-23 14:30:42 -07:00
Sergey Vlasov
2c08b36383 [PATCH] fetch.c: Remove call to parse_object() from process()
The call to parse_object() in process() is not actually needed - if
the object type is unknown, parse_object() will be called by loop();
if the type is known, the object will be parsed by the appropriate
process_*() function.

After this change blobs which exist locally are no longer parsed,
which gives about 2x CPU usage improvement; the downside is that there
will be no warnings for existing corrupted blobs, but detecting such
corruption is the job of git-fsck-objects, not the fetch programs.
Newly fetched objects are still checked for corruption in http-fetch.c
and ssh-fetch.c (local-fetch.c does not seem to do it, but the removed
parse_object() call would not be reached for new objects anyway).

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:12 -07:00
Sergey Vlasov
24451c3103 [PATCH] fetch.c: Clean up object flag definitions
Remove holes left after deleting flags, and use shifts to emphasize
that flags are single bits.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
2449696bcd [PATCH] fetch.c: Remove redundant test of TO_SCAN in process()
If the SEEN flag was not set, the TO_SCAN flag cannot be set,
therefore testing it is pointless.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
7b64d06b2e [PATCH] fetch.c: Remove some duplicated code in process()
It does not matter if we call prefetch() or set the TO_SCAN flag before
or after adding the object to process_queue.  However, doing it before
object_list_insert() allows us to kill 3 lines of duplicated code.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
51d8faf860 [PATCH] fetch.c: Remove redundant TO_FETCH flag
The TO_FETCH flag also became redundant after adding the SEEN flag -
it was set and checked in process() to prevent adding the same object
to process_queue multiple times, but now SEEN guards against this.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
754ac00e71 [PATCH] fetch.c: Remove redundant SCANNED flag
After adding the SEEN flag, the SCANNED flag became obviously
redundant - each object can get into process_queue through process()
only once, and therefore multiple calls to process_object() for the
same object are not possible.

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
a82d07e5e6 [PATCH] fetch.c: Make process() look at each object only once
The process() function is very often called multiple times for the
same object (because lots of trees refer to the same blobs), but did
not have a fast check for this, therefore a lot of useless calls to
has_sha1_file() and parse_object() were made before discovering that
nothing needs to be done.

This patch adds the SEEN flag which is used in process() to make it
look at each object only once.  When testing git-local-fetch on the
repository of GIT, this gives a 14x improvement in CPU usage (mainly
because the redundant calls to parse_object() are now avoided -
parse_object() always unpacks and parses the object data, even if it
was already parsed before).
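
The idea can be sketched as follows. This is a simplified illustration, not the code from the patch: `struct demo_object`, `DEMO_SEEN` and the counter standing in for the expensive has_sha1_file() and parse_object() work are all hypothetical.

```c
#define DEMO_SEEN (1U << 1)	/* hypothetical bit position */

struct demo_object {
	unsigned flags;
};

/* Counts how often the (simulated) expensive checks actually run. */
static int demo_expensive_checks;

/*
 * Returns 1 if the object was looked at for the first time,
 * 0 if it had been seen before and nothing was done.
 */
static int demo_process(struct demo_object *obj)
{
	if (obj->flags & DEMO_SEEN)
		return 0;	/* fast path: already handled */
	obj->flags |= DEMO_SEEN;

	/* Stand-in for the costly lookups and parsing. */
	demo_expensive_checks++;
	return 1;
}
```

However many trees refer to the same blob, the expensive part runs once per object; every later call costs a single bit test.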

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:11 -07:00
Sergey Vlasov
80077f0716 [PATCH] fetch.c: Remove useless lookup_object_type() call in process()
In all places where process() is called except the one in pull() (which
is executed only once) the pointer to the object is already available,
so pass it as the argument to process() instead of sha1 and avoid an
unneeded call to lookup_object_type().

Signed-off-by: Sergey Vlasov <vsu@altlinux.ru>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-22 21:52:10 -07:00
Junio C Hamano
029f6de377 fetch() assumes we do not have the object.
Bugfix for the previous one.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-18 14:11:53 -07:00
Junio C Hamano
85d106c267 Improve the safety check used in fetch.c
The recent safety check to trust only the commits we have made
things impossibly slow and turned out to waste a lot of memory.

This commit fixes it with the following improvements:

 - mark already scanned objects and avoid rescanning the same
   object again;

 - free the tree entries when we have scanned the tree entries;
   this is the same as b0d8923ec0
   which reduced memory usage by rev-list;

 - plug memory leak from the object_list dequeuing code;

 - use the process_queue not just for fetching but for scanning,
   to make things tail recursive to avoid deep recursion; the
   deep recursion was especially prominent when we cloned a big
   pack.

 - avoid has_sha1_file() call when we already know we do not have
   that object.
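
The first and fourth points can be illustrated with a small queue-driven walk. This is only a sketch of the general technique under assumed names (`struct sketch_node`, `sketch_enqueue()`, `sketch_drain_queue()`); it is not the fetch.c implementation:

```c
#include <stdlib.h>

/* Hypothetical object with up to two references to other objects. */
struct sketch_node {
	struct sketch_node *child[2];
	unsigned scanned;	/* marks objects already queued */
};

struct sketch_item {
	struct sketch_node *node;
	struct sketch_item *next;
};

static struct sketch_item *sketch_head;
static struct sketch_item **sketch_tail = &sketch_head;

/* Queue a node for scanning unless it was queued before. */
static void sketch_enqueue(struct sketch_node *node)
{
	struct sketch_item *item;

	if (!node || node->scanned)
		return;
	node->scanned = 1;
	item = malloc(sizeof(*item));
	if (!item)
		abort();
	item->node = node;
	item->next = NULL;
	*sketch_tail = item;
	sketch_tail = &item->next;
}

/*
 * Scan queued nodes in a loop instead of recursing into children.
 * Each dequeued item is freed. Returns the number of nodes scanned.
 */
static int sketch_drain_queue(void)
{
	int scanned = 0;

	while (sketch_head) {
		struct sketch_item *item = sketch_head;
		struct sketch_node *node = item->node;

		sketch_head = item->next;
		if (!sketch_head)
			sketch_tail = &sketch_head;
		free(item);

		sketch_enqueue(node->child[0]);
		sketch_enqueue(node->child[1]);
		scanned++;
	}
	return scanned;
}
```

The stack depth stays constant no matter how deep the object graph is, and an object reachable through several paths is queued only once.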

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-18 01:01:07 -07:00
Junio C Hamano
d0ac30f20c [PATCH] fetch.c: cleanups
Clean-ups suggested by Sergey Vlasov and acked by Daniel Barkalow.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-16 15:16:45 -07:00
Junio C Hamano
98533b90cb Avoid wasting memory while keeping track of what we have during fetch.
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-15 15:06:39 -07:00
Daniel Barkalow
22c6e1d0f7 [PATCH] Fix fetch completeness assumptions
Don't assume that any commit we have is complete; assume that any ref
we have is complete.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-15 13:19:29 -07:00
Junio C Hamano
215a7ad1ef Big tool rename.
As promised, this is the "big tool rename" patch.  The primary differences
since 0.99.6 are:

  (1) git-*-script are no more.  The commands installed do not
      have any such suffix so users do not have to remember if
      something is implemented as a shell script or not.

  (2) Many command names with 'cache' in them are renamed with
      'index' if that is what they mean.

There are backward compatibility symbolic links so that you and
Porcelains can keep using the old names, but the backward
compatibility support is expected to be removed in the near
future.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2005-09-07 17:45:20 -07:00