Commit Graph

252 Commits

Author SHA1 Message Date
Nicolas Pitre
07cf0f2407 make --max-pack-size argument to 'git pack-object' count in bytes
The value passed to --max-pack-size used to count in MiB which was
inconsistent with the corresponding configuration variable as well as
other command arguments which are defined to count in bytes with an
optional unit suffix.  This brings --max-pack-size in line with the
rest of Git.

Also, in order not to cause havoc with people used to the previous
megabyte scale, and because this is a sane thing to do anyway, a
minimum size of 1 MiB is enforced to avoid an explosion of pack files.

Adjust and extend test suite accordingly.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-03 20:39:56 -08:00
Nicolas Pitre
a2430dde8c pack-objects: fix pack generation when using pack_size_limit
Current handling of pack_size_limit is quite suboptimal.  Let's consider
a list of objects to pack which contain alternatively big and small
objects (which pretty matches reality when big blobs are interlaced
with tree objects).  Currently, the code simply close the pack and opens
a new one when the next object in line breaks the size limit.

The current code may degenerate into:

  - small tree object => store into pack #1
  - big blob object busting the pack size limit => store into pack #2
  - small blob but pack #2 is over the limit already => pack #3
  - big blob busting the size limit => pack #4
  - small tree but pack #4 is over the limit => pack #5
  - big blob => pack #6
  - small tree => pack #7
  - ... and so on.

The reality is that the content of packs 1, 3, 5 and 7 could well be
stored more efficiently (and delta compressed) together in pack #1 if
the big blobs were not forcing an immediate transition to a new pack.

Incidentally this can be fixed pretty easily by simply skipping over
those objects that are too big to fit in the current pack while trying
the whole list of unwritten objects, and then that list considered from
the beginning again when a new pack is opened.  This creates much fewer
smallish pack files and help making more predictable test cases for the
test suite.

This change made one of the self sanity checks useless so it is removed
as well. That check was rather redundant already anyway.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-03 20:39:24 -08:00
Dan McGee
7eb151d6e2 Make NO_PTHREADS the sole thread configuration variable
When the first piece of threaded code was introduced in commit 8ecce684, it
came with its own THREADED_DELTA_SEARCH Makefile option. Since this time,
more threaded code has come into the codebase and a NO_PTHREADS option has
also been added. Get rid of the original option as the newer, more generic
option covers everything we need.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-31 11:50:50 -08:00
Linus Torvalds
3bb7256281 make "index-pack" a built-in
This required some fairly trivial packfile function 'const' cleanup,
since the builtin commands get a const char *argv[] array.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-22 10:10:27 -08:00
Junio C Hamano
06dbc1ea57 Merge branch 'jc/conflict-marker-size'
* jc/conflict-marker-size:
  rerere: honor conflict-marker-size attribute
  rerere: prepare for customizable conflict marker length
  conflict-marker-size: new attribute
  rerere: use ll_merge() instead of using xdl_merge()
  merge-tree: use ll_merge() not xdl_merge()
  xdl_merge(): allow passing down marker_size in xmparam_t
  xdl_merge(): introduce xmparam_t for merge specific parameters
  git_attr(): fix function signature

Conflicts:
	builtin-merge-file.c
	ll-merge.c
	xdiff/xdiff.h
	xdiff/xmerge.c
2010-01-20 20:28:51 -08:00
Junio C Hamano
7fb0eaa289 git_attr(): fix function signature
The function took (name, namelen) as its arguments, but all the public
callers wanted to pass a full string.

Demote the counted-string interface to an internal API status, and allow
public callers to just pass the string to the function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 20:39:59 -08:00
Andrzej K. Haczewski
44626dc7d5 MSVC: Windows-native implementation for subset of Pthreads API
This patch implements native to Windows subset of pthreads API used by Git.
It allows to remove Pthreads for Win32 dependency for MSVC, msysgit and
Cygwin.

[J6t: If the MinGW build was built as part of the msysgit build
environment, then threading was already enabled because the
pthreads-win32 package is available in msysgit. With this patch, we can now
enable threaded code unconditionally.]

Signed-off-by: Andrzej K. Haczewski <ahaczewski@gmail.com>
Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-16 18:16:06 -08:00
Junio C Hamano
75a7ea258c Merge branch 'maint'
* maint:
  pack-objects: split implications of --all-progress from progress activation
  instaweb: restart server if already running
  prune-packed: only show progress when stderr is a tty

Conflicts:
	builtin-pack-objects.c
2009-11-23 21:54:39 -08:00
Nicolas Pitre
4f36627518 pack-objects: split implications of --all-progress from progress activation
Currently the --all-progress flag is used to use force progress display
during the writing object phase even if output goes to stdout which is
primarily the case during a push operation.  This has the unfortunate
side effect of forcing progress display even if stderr is not a
terminal.

Let's introduce the --all-progress-implied argument which has the same
intent except for actually forcing the activation of any progress
display.  With this, progress display will be automatically inhibited
whenever stderr is not a terminal, or full progress display will be
included otherwise.  This should let people use 'git push' within a cron
job without filling their logs with useless percentage displays.

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Tested-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-23 21:33:09 -08:00
Nicolas Pitre
ef0555712c pack-objects: move thread autodetection closer to relevant code
Let's keep thread stuff close together if possible.  And in this case,
this even reduces the #ifdef noise, and allows for skipping the
autodetection altogether if delta search is not needed (like with a pure
clone).

Signed-off-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-11-05 23:11:42 -08:00
Thiago Farina
812bdbc31b pack-objects: remove SP at the end of usage string
These spaces immediately before the end of lines are unnecessary.

While at it, instead of using a single string literal with backslashes
at end of each line, split the lines into individual string literals
and tell the compiler to concatenate them.

Signed-off-by: Thiago Farina <tfransosi@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-18 19:48:48 -07:00
Junio C Hamano
8e4384fd44 Merge branch 'np/maint-1.6.3-deepen'
* np/maint-1.6.3-deepen:
  pack-objects: free preferred base memory after usage
  make shallow repository deepening more network efficient
2009-09-07 15:23:50 -07:00
Nicolas Pitre
0ef95f72f8 pack-objects: free preferred base memory after usage
When adding objects for preferred delta base, the content from tree
objects leading to given paths is kept in a cache. This has the
potential to grow significantly, especially with large directories as
the whole tree object content is loaded in memory, even if in practice
the number of those objects is limited to the 256 cache entries plus the
$window root tree objects.  Still, that can't hurt freeing that up after
object enumeration is done, and before more memory is needed for delta
search.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-05 22:27:08 -07:00
Junio C Hamano
dcda3614d4 builtin-pack-objects.c: avoid vla
This is one of only two places that we use C99 variable length array on
the stack, which some older compilers apparently are not happy with.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-09-01 07:55:56 -07:00
Brian Gianforcaro
eeefa7c90e Style fixes, add a space after if/for/while.
The majority of code in core git appears to use a single
space after if/for/while. This is an attempt to bring more
code to this standard. These are entirely cosmetic changes.

Signed-off-by: Brian Gianforcaro <b.gianfo@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-31 23:26:28 -07:00
Junio C Hamano
f00ecbe42b Merge branch 'cc/replace'
* cc/replace:
  t6050: check pushing something based on a replaced commit
  Documentation: add documentation for "git replace"
  Add git-replace to .gitignore
  builtin-replace: use "usage_msg_opt" to give better error messages
  parse-options: add new function "usage_msg_opt"
  builtin-replace: teach "git replace" to actually replace
  Add new "git replace" command
  environment: add global variable to disable replacement
  mktag: call "check_sha1_signature" with the replacement sha1
  replace_object: add a test case
  object: call "check_sha1_signature" with the replacement sha1
  sha1_file: add a "read_sha1_file_repl" function
  replace_object: add mechanism to replace objects found in "refs/replace/"
  refs: add a "for_each_replace_ref" function
2009-08-21 18:47:53 -07:00
Nicolas Pitre
5749b0b2f9 don't let the delta cache grow unbounded in 'git repack'
I have 4GB of RAM on my system which should, in theory, be quite enough
to repack a 600 MB repository.  However the unbounded delta cache size
always pushes it into swap, at which point everything virtually comes to
a halt.  So unbounded caches are never a good idea.

A default of 256MB should be a good compromize between memory usage and
speed where medium sized repositories are still likely to fit in the
cache with a reasonable memory usage, and larger repositories are going
to take quite some time to repack already anyway.

While at it, clarify the associated config variable documentation
entries a bit.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-08-05 20:14:54 -07:00
Junio C Hamano
130b04ab37 Merge branch 'js/maint-graft-unhide-true-parents'
* js/maint-graft-unhide-true-parents:
  git repack: keep commits hidden by a graft
  Add a test showing that 'git repack' throws away grafted-away parents

Conflicts:
	git-repack.sh
2009-07-25 00:45:03 -07:00
Johannes Schindelin
7f3140cd23 git repack: keep commits hidden by a graft
When you have grafts that pretend that a given commit has different
parents than the ones recorded in the commit object, it is dangerous
to let 'git repack' remove those hidden parents, as you can easily
remove the graft and end up with a broken repository.

So let's play it safe and keep those parent objects and everything
that is reachable by them, in addition to the grafted parents.

As this behavior can only be triggered by git pack-objects, and as that
command handles duplicate parents gracefully, we do not bother to cull
duplicated parents that may result by using both true and grafted
parents.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-24 09:10:16 -07:00
Junio C Hamano
dd787c19c4 Merge branch 'tr/die_errno'
* tr/die_errno:
  Use die_errno() instead of die() when checking syscalls
  Convert existing die(..., strerror(errno)) to die_errno()
  die_errno(): double % in strerror() output just in case
  Introduce die_errno() that appends strerror(errno) to die()
2009-07-06 09:39:46 -07:00
Thomas Rast
d824cbba02 Convert existing die(..., strerror(errno)) to die_errno()
Change calls to die(..., strerror(errno)) to use the new die_errno().

In the process, also make slight style adjustments: at least state
_something_ about the function that failed (instead of just printing
the pathname), and put paths in single quotes.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-27 11:14:53 -07:00
Linus Torvalds
48fb7deb5b Fix big left-shifts of unsigned char
Shifting 'unsigned char' or 'unsigned short' left can result in sign
extension errors, since the C integer promotion rules means that the
unsigned char/short will get implicitly promoted to a signed 'int' due to
the shift (or due to other operations).

This normally doesn't matter, but if you shift things up sufficiently, it
will now set the sign bit in 'int', and a subsequent cast to a bigger type
(eg 'long' or 'unsigned long') will now sign-extend the value despite the
original expression being unsigned.

One example of this would be something like

	unsigned long size;
	unsigned char c;

	size += c << 24;

where despite all the variables being unsigned, 'c << 24' ends up being a
signed entity, and will get sign-extended when then doing the addition in
an 'unsigned long' type.

Since git uses 'unsigned char' pointers extensively, we actually have this
bug in a couple of places.

I may have missed some, but this is the result of looking at

	git grep '[^0-9 	][ 	]*<<[ 	][a-z]' -- '*.c' '*.h'
	git grep '<<[   ]*24'

which catches at least the common byte cases (shifting variables by a
variable amount, and shifting by 24 bits).

I also grepped for just 'unsigned char' variables in general, and
converted the ones that most obviously ended up getting implicitly cast
immediately anyway (eg hash_name(), encode_85()).

In addition to just avoiding 'unsigned char', this patch also tries to use
a common idiom for the delta header size thing. We had three different
variations on it: "& 0x7fUL" in one place (getting the sign extension
right), and "& ~0x80" and "& 0x7f" in two other places (not getting it
right). Apart from making them all just avoid using "unsigned char" at
all, I also unified them to then use a simple "& 0x7f".

I considered making a sparse extension which warns about doing implicit
casts from unsigned types to signed types, but it gets rather complex very
quickly, so this is just a hack.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-06-18 09:22:46 -07:00
Christian Couder
dae556bdb1 environment: add global variable to disable replacement
This new "read_replace_refs" global variable is set to 1 by
default, so that replace refs are used by default. But
reachability traversal and packing commands ("cmd_fsck",
"cmd_prune", "cmd_pack_objects", "upload_pack",
"cmd_unpack_objects") set it to 0, as they must work with the
original DAG.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-05-31 17:02:59 -07:00
Mike Ralphson
3ea3c215c0 Fix typos / spelling in comments
Signed-off-by: Mike Ralphson <mike@abacus.co.uk>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-22 19:02:12 -07:00
Junio C Hamano
9824a388e5 Merge branch 'lt/pack-object-memuse'
* lt/pack-object-memuse:
  show_object(): push path_name() call further down
  process_{tree,blob}: show objects without buffering

Conflicts:
	builtin-pack-objects.c
	builtin-rev-list.c
	list-objects.c
	list-objects.h
	upload-pack.c
2009-04-18 14:46:17 -07:00
Linus Torvalds
cf2ab916af show_object(): push path_name() call further down
In particular, pushing the "path_name()" call _into_ the show() function
would seem to allow

 - more clarity into who "owns" the name (ie now when we free the name in
   the show_object callback, it's because we generated it ourselves by
   calling path_name())

 - not calling path_name() at all, either because we don't care about the
   name in the first place, or because we are actually happy walking the
   linked list of "struct name_path *" and the last component.

Now, I didn't do that latter optimization, because it would require some
more coding, but especially looking at "builtin-pack-objects.c", we really
don't even want the whole pathname, we really would be better off with the
list of path components.

Why? We use that name for two things:
 - add_preferred_base_object(), which actually _wants_ to traverse the
   path, and now does it by looking for '/' characters!
 - for 'name_hash()', which only cares about the last 16 characters of a
   name, so again, generating the full name seems to be just unnecessary
   work.

Anyway, so I didn't look any closer at those things, but it did convince
me that the "show_object()" calling convention was crazy, and we're
actually better off doing _less_ in list-objects.c, and giving people
access to the internal data structures so that they can decide whether
they want to generate a path-name or not.

This patch does that, and then for people who did use the name (even if
they might do something more clever in the future), it just does the
straightforward "name = path_name(path, component); .. free(name);" thing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12 17:28:31 -07:00
Linus Torvalds
8d2dfc49b1 process_{tree,blob}: show objects without buffering
Here's a less trivial thing, and slightly more dubious one.

I was looking at that "struct object_array objects", and wondering why we
do that. I have honestly totally forgotten. Why not just call the "show()"
function as we encounter the objects? Rather than add the objects to the
object_array, and then at the very end going through the array and doing a
'show' on all, just do things more incrementally.

Now, there are possible downsides to this:

 - the "buffer using object_array" _can_ in theory result in at least
   better I-cache usage (two tight loops rather than one more spread out
   one). I don't think this is a real issue, but in theory..

 - this _does_ change the order of the objects printed. Instead of doing a
   "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop
   over the commits (which puts all the root trees _first_ in the object
   list, this patch just adds them to the list of pending objects, and
   then we'll traverse them in that order (and thus show each root tree
   object together with the objects we discover under it)

   I _think_ the new ordering actually makes more sense, but the object
   ordering is actually a subtle thing when it comes to packing
   efficiency, so any change in order is going to have implications for
   packing. Good or bad, I dunno.

 - There may be some reason why we did it that odd way with the object
   array, that I have simply forgotten.

Anyway, now that we don't buffer up the objects before showing them
that may actually result in lower memory usage during that whole
traverse_commit_list() phase.

This is seriously not very deeply tested. It makes sense to me, it seems
to pass all the tests, it looks ok, but...

Does anybody remember why we did that "object_array" thing? It used to be
an "object_list" a long long time ago, but got changed into the array due
to better memory usage patterns (those linked lists of obejcts are
horrible from a memory allocation standpoint). But I wonder why we didn't
do this back then. Maybe there's a reason for it.

Or maybe there _used_ to be a reason, and no longer is.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12 17:28:31 -07:00
Junio C Hamano
6e353a5e5d Merge branch 'cc/bisect-filter'
* cc/bisect-filter: (21 commits)
  rev-list: add "int bisect_show_flags" in "struct rev_list_info"
  rev-list: remove last static vars used in "show_commit"
  list-objects: add "void *data" parameter to show functions
  bisect--helper: string output variables together with "&&"
  rev-list: pass "int flags" as last argument of "show_bisect_vars"
  t6030: test bisecting with paths
  bisect: use "bisect--helper" and remove "filter_skipped" function
  bisect: implement "read_bisect_paths" to read paths in "$GIT_DIR/BISECT_NAMES"
  bisect--helper: implement "git bisect--helper"
  bisect: use the new generic "sha1_pos" function to lookup sha1
  rev-list: call new "filter_skip" function
  patch-ids: use the new generic "sha1_pos" function to lookup sha1
  sha1-lookup: add new "sha1_pos" function to efficiently lookup sha1
  rev-list: pass "revs" to "show_bisect_vars"
  rev-list: make "show_bisect_vars" non static
  rev-list: move code to show bisect vars into its own function
  rev-list: move bisect related code into its own file
  rev-list: make "bisect_list" variable local to "cmd_rev_list"
  refs: add "for_each_ref_in" function to refactor "for_each_*_ref" functions
  quote: add "sq_dequote_to_argv" to put unwrapped args in an argv array
  ...
2009-04-12 16:46:40 -07:00
Junio C Hamano
a54c4edc51 Merge branch 'maint'
* maint:
  GIT 1.6.2.3
  State the effect of filter-branch on graft explicitly
  process_{tree,blob}: Remove useless xstrdup calls

Conflicts:
	GIT-VERSION-GEN
2009-04-12 16:01:25 -07:00
Junio C Hamano
1966af8176 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  State the effect of filter-branch on graft explicitly
  process_{tree,blob}: Remove useless xstrdup calls
2009-04-12 15:34:53 -07:00
Junio C Hamano
bc69776aa1 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  State the effect of filter-branch on graft explicitly
  process_{tree,blob}: Remove useless xstrdup calls
2009-04-12 15:20:29 -07:00
Linus Torvalds
213152688c process_{tree,blob}: Remove useless xstrdup calls
On Wed, 8 Apr 2009, Björn Steinbrink wrote:
>
> The name of the processed object was duplicated for passing it to
> add_object(), but that already calls path_name, which allocates a new
> string anyway. So the memory allocated by the xstrdup calls just went
> nowhere, leaking memory.

Ack, ack.

There's another easy 5% or so for the built-in object walker: once we've
created the hash from the name, the name isn't interesting any more, and
so something trivial like this can help a bit.

Does it matter? Probably not on its own. But a few more memory saving
tricks and it might all make a difference.

		Linus

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-12 14:30:31 -07:00
Dan McGee
b6c29915d2 Update delta compression message to be less misleading
In the case of a small repository, pack-objects is smart enough to not
start more threads than necessary. However, the output to the user always
reports the value of the pack.threads configuration and not the real
number of threads to be used.

Signed-off-by: Dan McGee <dpmcgee@gmail.com>
Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 22:21:59 -07:00
Junio C Hamano
c3067cbfb3 Merge branch 'jc/maint-1.6.0-keep-pack' into maint
* jc/maint-1.6.0-keep-pack:
  pack-objects: don't loosen objects available in alternate or kept packs
  t7700: demonstrate repack flaw which may loosen objects unnecessarily
  Remove --kept-pack-only option and associated infrastructure
  pack-objects: only repack or loosen objects residing in "local" packs
  git-repack.sh: don't use --kept-pack-only option to pack-objects
  t7700-repack: add two new tests demonstrating repacking flaws
  is_kept_pack(): final clean-up
  Simplify is_kept_pack()
  Consolidate ignore_packed logic more
  has_sha1_kept_pack(): take "struct rev_info"
  has_sha1_pack(): refactor "pretend these packs do not exist" interface
  git-repack: resist stray environment variable

Conflicts:
	t/t7700-repack.sh
2009-04-08 23:21:10 -07:00
Christian Couder
11c211fa06 list-objects: add "void *data" parameter to show functions
The goal of this patch is to get rid of the "static struct rev_info
revs" static variable in "builtin-rev-list.c".

To do that, we need to pass the revs to the "show_commit" function
in "builtin-rev-list.c" and this in turn means that the
"traverse_commit_list" function in "list-objects.c" must be passed
functions pointers to functions with 2 parameters instead of one.

So we have to change all the callers and all the functions passed
to "traverse_commit_list".

Anyway this makes the code more clean and more generic, so it
should be a good thing in the long run.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-07 22:12:38 -07:00
Junio C Hamano
3c91bf6805 Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  pack-objects: don't loosen objects available in alternate or kept packs
  t7700: demonstrate repack flaw which may loosen objects unnecessarily
  Remove --kept-pack-only option and associated infrastructure
  pack-objects: only repack or loosen objects residing in "local" packs
  git-repack.sh: don't use --kept-pack-only option to pack-objects
  t7700-repack: add two new tests demonstrating repacking flaws

Conflicts:
	t/t7700-repack.sh
2009-04-01 22:34:19 -07:00
Junio C Hamano
89fbda2425 Merge branch 'maint'
* maint:
  Increase the size of the die/warning buffer to avoid truncation
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 19:45:57 -07:00
Junio C Hamano
b0de555410 Merge branch 'maint-1.6.1' into maint
* maint-1.6.1:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:21 -07:00
Junio C Hamano
2a5643da73 Merge branch 'maint-1.6.0' into maint-1.6.1
* maint-1.6.0:
  close_sha1_file(): make it easier to diagnose errors
  avoid possible overflow in delta size filtering computation
2009-03-24 15:31:15 -07:00
Nicolas Pitre
720fe22d50 avoid possible overflow in delta size filtering computation
On a 32-bit system, the maximum possible size for an object is less than
4GB, while 64-bit systems may cope with larger objects.  Due to this
limitation, variables holding object sizes are using an unsigned long
type (32 bits on 32-bit systems, or 64 bits on 64-bit systems).

When large objects are encountered, and/or people play with large delta
depth values, it is possible for the maximum allowed delta size
computation to overflow, especially on a 32-bit system.  When this
occurs, surviving result bits may represent a value much smaller than
what it is supposed to be, or even zero.  This prevents some objects
from being deltified although they do get deltified when a smaller depth
limit is used.  Fix this by always performing a 64-bit multiplication.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-24 14:37:30 -07:00
Junio C Hamano
2990034f1e Merge branch 'jc/maint-1.6.0-pack-directory' into maint-1.6.1
* jc/maint-1.6.0-pack-directory:
  Fix odb_mkstemp() on AIX
  Make sure objects/pack exists before creating a new pack

Conflicts:
	wrapper.c
2009-03-21 22:53:36 -07:00
Brandon Casey
094085e336 pack-objects: don't loosen objects available in alternate or kept packs
If pack-objects is called with the --unpack-unreachable option then it
will unpack (i.e. loosen) all unreferenced objects from local not-kept
packs, including those that also exist in packs residing in an alternate
object database or a locally kept pack.  The only user of this option is
git-repack.

In this case, repack will follow the call to pack-objects with a call to
prune-packed, which will delete these newly loosened objects, making the
act of loosening a waste of time.  The unnecessary loosening can be
avoided by checking whether an object exists in a non-local pack or a
locally kept pack before loosening it.

This fixes the 'local packed unreachable obs that exist in alternate ODB
are not loosened' test in t7700.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-21 21:58:44 -07:00
Brandon Casey
4d6acb7041 Remove --kept-pack-only option and associated infrastructure
This option to pack-objects/rev-list was created to improve the -A and -a
options of repack.  It was found to be lacking in that it did not provide
the ability to differentiate between local and non-local kept packs, and
found to be unnecessary since objects residing in local kept packs can be
filtered out by the --honor-pack-keep option.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-20 13:32:33 -07:00
Brandon Casey
79bc4c7155 pack-objects: only repack or loosen objects residing in "local" packs
These two features were invented for use by repack when repack will delete
the local packs that have been made redundant.  The packs accessible
through alternates are not deleted by repack, so the objects contained in
them are still accessible after the local packs are deleted.  They do not
need to be repacked into the new pack or loosened.  For the case of
loosening they would immediately be deleted by the subsequent prune-packed
that is called by repack anyway.

This fixes the test
'packed unreachable obs in alternate ODB are not loosened' in t7700.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-03-20 13:32:33 -07:00
Junio C Hamano
aec813062b Merge branch 'jc/maint-1.6.0-keep-pack'
* jc/maint-1.6.0-keep-pack:
  is_kept_pack(): final clean-up
  Simplify is_kept_pack()
  Consolidate ignore_packed logic more
  has_sha1_kept_pack(): take "struct rev_info"
  has_sha1_pack(): refactor "pretend these packs do not exist" interface
  git-repack: resist stray environment variable
2009-03-11 13:49:56 -07:00
Junio C Hamano
69e020ae00 is_kept_pack(): final clean-up
Now is_kept_pack() is just a member lookup into a structure, we can write
it as such.

Also rewrite the sole caller of has_sha1_kept_pack() to switch on the
criteria the callee uses (namely, revs->kept_pack_only) between calling
has_sha1_kept_pack() and has_sha1_pack(), so that these two callees do not
have to take a pointer to struct rev_info as an argument.

This removes the header file dependency issue temporarily introduced by
the earlier commit, so we revert changes associated to that as well.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
03a9683d22 Simplify is_kept_pack()
This removes --unpacked=<packfile> parameter from the revision parser, and
rewrites its use in git-repack to pass a single --kept-pack-only option
instead.

The new --kept-pack-only option means just that.  When this option is
given, is_kept_pack() that used to say "not on the --unpacked=<packfile>
list" now says "the packfile has corresponding .keep file".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
386cb77210 Consolidate ignore_packed logic more
This refactors three loops that check if a given packfile is on the
ignore_packed list into a function is_kept_pack().  The function returns
false for a pack on the list, and true for a pack not on the list, because
this list is solely used by "git repack" to pass list of packfiles that do
not have corresponding .keep files, i.e. a packfile not on the list is
"kept".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-28 01:06:06 -08:00
Junio C Hamano
bb0cebd7d0 Merge branch 'jc/maint-1.6.0-pack-directory'
* jc/maint-1.6.0-pack-directory:
  Make sure objects/pack exists before creating a new pack
2009-02-25 14:50:05 -08:00
Junio C Hamano
6e180cdcec Make sure objects/pack exists before creating a new pack
In a repository created with git older than f49fb35 (git-init-db: create
"pack" subdirectory under objects, 2005-06-27), objects/pack/ directory is
not created upon initialization.  It was Ok because subdirectories are
created as needed inside directories init-db creates, and back then,
packfiles were recent invention.

After the said commit, new codepaths started relying on the presense of
objects/pack/ directory in the repository.  This was exacerbated with
8b4eb6b (Do not perform cross-directory renames when creating packs,
2008-09-22) that moved the location temporary pack files are created from
objects/ directory to objects/pack/ directory, because moving temporary to
the final location was done carefully with lazy leading directory creation.

Many packfile related operations in such an old repository can fail
mysteriously because of this.

This commit introduces two helper functions to make things work better.

 - odb_mkstemp() is a specialized version of mkstemp() to refactor the
   code and teach it to create leading directories as needed;

 - odb_pack_keep() refactors the code to create a ".keep" file while
   create leading directories as needed.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-02-25 14:39:42 -08:00