git-commit-vandalism

Author	SHA1	Message	Date
Shawn O. Pearce	825769a8fe	Teach fast-import how to clear the internal branch content. Some frontends may not be able to (easily) keep track of which files are included in the branch, and which aren't. Performing this tracking can be tedious and error prone for the frontend to do, especially if its foreign data source cannot supply the changed path list on a per-commit basis. fast-import now allows a frontend to request that a branch's tree be wiped clean (reset to the empty tree) at the start of a commit, allowing the frontend to feed in all paths which belong on the branch. This is ideal for a tar-file importer frontend, for example, as the frontend just needs to reformat the tar data stream into a gfi data stream, which may be something a few Perl regexps can take care of. :) Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-07 02:03:03 -05:00
Junio C Hamano	9981b6d915	S_IFLNK != 0140000 Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 16:08:30 -05:00
Shawn O. Pearce	7073e69e38	Don't do non-fastforward updates in fast-import. If fast-import is being used to update an existing branch of a repository, the user may not want to lose commits if another process updates the same ref at the same time. For example, the user might be using fast-import to make just one or two commits against a live branch. We now perform a fast-forward check during the ref updating process. If updating a branch would cause commits in that branch to be lost, we skip over it and display the new SHA1 to standard error. This new default behavior can be overridden with `--force`, like git-push and git-fetch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 16:08:06 -05:00
Shawn O. Pearce	63e0c8b364	Support RFC 2822 date parsing in fast-import. Since some frontends may be working with source material where the dates are only readily available as RFC 2822 strings, it is more friendly if fast-import exposes Git's parse_date() function to handle the conversion. This way the frontend doesn't need to perform the parsing itself. The new --date-format option to fast-import can be used by a frontend to select which format it will supply date strings in. The default is the standard `raw` Git format, which fast-import has always supported. Format rfc2822 can be used to activate the parse_date() function instead. Because fast-import could also be useful for creating new, current commits, the format `now` is also supported to generate the current system timestamp. The implementation of `now` is a trivial call to datestamp(), but is actually a whole whopping 3 lines so that fast-import can verify the frontend really meant `now`. As part of this change I have added validation of the `raw` date format. Prior to this change fast-import would accept anything in a `committer` command, even if it was seriously malformed. Now fast-import requires the '> ' near the end of the string and verifies the timestamp is formatted properly. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 14:58:30 -05:00
Shawn O. Pearce	e7d06a4b70	Remove unnecessary null pointer checks in fast-import. There is no need to check for a NULL pointer before invoking free(), the runtime library automatically performs this check anyway and does nothing if a NULL pointer is supplied. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 12:05:51 -05:00
Shawn O. Pearce	e5b1444b96	Correct minor style issue in fast-import. Junio noticed that I was using a different style in fast-import for returned pointers than the rest of Git. Before merging this code into the main git.git tree I'd like to make it consistent, as this style variation was not intentional. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:43:59 -05:00
Shawn O. Pearce	10e8d68820	Correct compiler warnings in fast-import. Junio noticed these warnings/errors in fast-import when compiling with `-Werror -ansi -pedantic`. A few changes are to reduce compiler warnings, while one (in cmd_merge) is a bug fix. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:26:49 -05:00
Shawn O. Pearce	0b868e0240	Remove --branch-log from fast-import. The --branch-log option and its associated code hasn't been used in several months, as its not really very useful for debugging fast-import or a frontend. I don't plan on supporting it in this state long-term, so I'm killing it now before it gets distributed to a wider audience. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-06 00:15:37 -05:00
Shawn O. Pearce	6c3aac1c69	Don't support shell-quoted refnames in fast-import. The current implementation of shell-style quoted refnames and SHA-1 expressions within fast-import contains a bad memory leak. We leak the unquoted strings used by the `from` and `merge` commands, maybe others. Its also just muddling up the docs. Since Git refnames cannot contain LF, and that is our delimiter for the end of the refname, and we accept any other character as-is, there is no reason for these strings to support quoting, except to be nice to frontends. But frontends shouldn't be expecting to use funny refs in Git, and its just as simple to never quote them as it is to always pass them through the same quoting filter as pathnames. So frontends should never quote refs, or ref expressions. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 20:30:37 -05:00
Shawn O. Pearce	10831c5513	Reduce memory usage of fast-import. Some structs are allocated rather frequently, but were using integer types which were far larger than required to actually store their full value range. As packfiles are limited to 4 GiB we don't need more than 32 bits to store the offset of an object within that packfile, an `unsigned long` on a 64 bit system is likely a 64 bit unsigned value. Saving 4 bytes per object on a 64 bit system can add up fast on any sizable import. As atom strings are strictly single components in a path name these are probably limited to just 255 bytes by the underlying OS. Going to that short of a string is probably too restrictive, but certainly `unsigned int` is far too large for their lengths. `unsigned short` is a reasonable limit. Modes within a tree really only need two bytes to store their whole value; using `unsigned int` here is vast overkill. Saving 4 bytes per file entry in an active branch can add up quickly on a project with a large number of files. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 16:34:56 -05:00
Shawn O. Pearce	8c1f22da9f	Include checkpoint command in the BNF. This command isn't encouraged (as its slow) but it does exist and is accepted, so it still should be covered in the BNF. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-02-05 16:05:11 -05:00
Shawn O. Pearce	76db9dec81	Merge branch 'master' into sp/gfi git-fast-import requires use of inttypes.h, but the master branch has added it to git-compat-util differently than git-fast-import originally had used it. This merge back of master to the fast-import topic is to get (and use) inttypes.h the way master is using it. This is a partially evil merge to remove the call to setup_ident(), as the master branch now contains a change which makes this implicit and therefore removed the function declaration. (commit `01754769`). Conflicts: git-compat-util.h	2007-01-30 11:07:24 -05:00
Shawn O. Pearce	b715cfbba4	Accept 'inline' file data in fast-import commit structure. Its very annoying to need to specify the file content ahead of a commit and use marks to connect the individual blobs to the commit's file modification entry, especially if the frontend can't/won't generate the blob SHA1s itself. Instead it would much easier to use if we can accept the blob data at the same time as we receive each file_change line. Now fast-import accepts 'inline' instead of a mark idnum or blob SHA1 within the 'M' type file_change command. If an inline is detected the very next line must be a 'data n' command, supplying the file data. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 15:17:58 -05:00
Shawn O. Pearce	3b4dce0275	Support delimited data regions in fast-import. During testing its nice to not have to feed the length of a data chunk to the 'data' command of fast-import. Instead we would prefer to be able to establish a data chunk much like shell's << operator and use a line delimiter to denote the end of the input. So now if a data command is started as 'data <<EOF' we will look for a terminator line containing only the string EOF on that line. Once found, we stop the data command. Everything between the two lines is used as the data value. The 'data <<' syntax is slower than 'data n', as we don't know how many bytes to expect and instead must grow our buffer on the fly. It also has the problem that the frontend must use a string which will not appear on a line by itself in the input, and the data region will always end in an LF. For these reasons real import frontends are encouraged to continue to use _only_ 'data n'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 13:25:37 -05:00
Shawn O. Pearce	e5808826c4	Remove unnecessary options from fast-import. The --objects command line option is rather unnecessary. Internally we allocate objects in 5000 unit blocks, ensuring that any sort of malloc overhead is ammortized over the individual objects to almost nothing. Since most frontends don't know how many objects they will need for a given import run (and its hard for them to predict without just doing the run) we probably won't see anyone using --objects. Further since there's really no major benefit to using the option, most frontends won't even bother supplying it even if they could estimate the number of objects. So I'm removing it. The --max-objects-per-pack option was probably a mistake to even have added in the first place. The packfile format is limited to 4 GiB today; given that objects need at least 3 bytes of data (and probably need even more) there's no way we are going to exceed the limit of 1<<32-1 objects before we reach the file size limit. So I'm removing it (to slightly reduce the complexity of the code) before anyone gets any wise ideas and tries to use it. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 12:02:37 -05:00
Shawn O. Pearce	ebea9dd4f1	Use fixed-size integers when writing out the index in fast-import. Currently the pack .idx file format uses 32-bit unsigned integers for the fan-out table and the object offsets. We had previously defined these as 'unsigned int', but not every system will define that type to be a 32 bit value. To ensure maximum portability we should always use 'uint32_t'. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 11:30:17 -05:00
Shawn O. Pearce	566f44252b	Always use struct pack_header for pack header in fast-import. Previously we were using 'unsigned int' to update the hdr_entries field of the pack header after the file had been completed and was being hashed. This may not be 32 bits on all platforms. Instead we want to always uint32_t. I'm actually cheating here by just using the pack_header like the rest of Git and letting the struct definition declare the correct type. Right now that field is still 'unsigned int' (wrong) but a pending change submitted by Simon 'corecode' Schubert changes it to uint32_t. After that change is merged in fast-import will do the right thing all of the time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-18 11:26:06 -05:00
Shawn O. Pearce	69e74e7412	Correct packfile edge output in fast-import. Branches are only contained by a packfile if the branch actually had its most recent commit in that packfile. So new branches are set to MAX_PACK_ID to ensure they don't cause their commit to list as part of the first packfile when it closes out if the commit was actually in existance before fast-import started. Also corrected the type of last_commit to be umaxint_t to prevent overflow and wraparound on very large imports. Though that is highly unlikely to occur as we're talking 4 billion commits, which no real project has right now. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 02:42:43 -05:00
Shawn O. Pearce	fd99224eec	Declare no-arg functions as (void) in fast-import. Apparently the git convention is to declare any function which takes no arguments as taking void. I did not do this during the early fast-import development, but should have. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 01:47:25 -05:00
Shawn O. Pearce	6f64f6d9d2	Correct a few types to be unsigned in fast-import. The length of an atom string cannot be negative. So make it explicit and declare it as an unsigned value. The shift width in a mark table node also cannot be negative. I'm also moving it to after the pointer arrays to prevent any possible alignment problems on a 64 bit system. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 01:13:22 -05:00
Shawn O. Pearce	2104838bf9	Corrected BNF input documentation for fast-import. Now that fast-import uses uintmax_t (the largest available unsigned integer type) for marks we don't want to say its an unsigned 32 bit integer in ASCII base 10 notation. It could be much larger, especially on 64 bit systems, and especially if a frontend uses a very large number of marks (1 per file revision on a very, very large import). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-17 00:33:18 -05:00
Shawn O. Pearce	2369ed7907	Print out the edge commits for each packfile in fast-import. To help callers repack very large repositories into a series of packfiles fast-import now outputs the last commits/tags it wrote to a packfile when it prints out the packfile name. This information can be feed to pack-objects --revs to repack. For the first pack of an initial import this is pretty easy (just feed those SHA1s on stdin) but for subsequent packs you want to feed the subsequent pack's final SHA1s but also all prior pack's SHA1s prefixed with the negation operator. This way the prior pack's data does not get included into the subsequent pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 16:18:44 -05:00
Shawn O. Pearce	a7ddc48765	Correct object_count type and stat output in fast-import. Since object_count is limited to 'unsigned long' (really an unsigned 32 bit integer value) by the pack file format we may as well use exactly that type here in fast-import for that counter. An earlier change by me incorrectly made it uintmax_t. But since object_count is a counter for the current packfile only, we don't want to output its value at the end. Instead we should sum up the individual type counters and report that total, as that will cover all of the packfiles. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 04:55:41 -05:00
Shawn O. Pearce	eec11c2484	Correct max_packsize default in fast-import. Apparently amd64 has defined 'unsigned long' to be a 64 bit value, which means -1 was way over the 4 GiB packfile limit. Whoops. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 04:25:12 -05:00
Shawn O. Pearce	0fcbcae753	Remove unnecessary pack_fd global in fast-import. Much like the pack_sha1 the pack_fd is an unnecessary global variable, we already have the fd stored in our struct packed_git *pack_data so that the core library functions in sha1_file.c are able to lookup and decompress object data that we have previously written. Keeping an extra copy of this value in our own variable is just a hold-over from earlier versions of fast-import and is now completely unnecessary. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:20:57 -05:00
Shawn O. Pearce	1280158738	Ensure we close the packfile after creating it in fast-import. Because we are renaming the packfile into its file destination we need to be sure its not open when the rename is called, otherwise some operating systems (e.g. Windows) may prevent the rename from occurring. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:17:47 -05:00
Shawn O. Pearce	8455e48476	Use .keep files in fast-import during processing. Because fast-import automatically updates all references (heads and tags) at the end of its run the repository is corrupt unless the objects are available in the .git/objects/pack directory prior to the refs being modified. The easiest way to ensure that is true is to move the packfile and its associated index directly into the .git/objects/pack directory as soon as we have finished output to it. But the only safe way to do this is to create the a temporary .keep file for that pack, so we use the same tricks that index-pack uses when its being invoked by receive-pack. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 01:15:31 -05:00
Shawn O. Pearce	09543c96bb	Reuse sha1 in packed_git in fast-import. Rather than maintaing our own packfile level sha1 variable we can make use of the one already available in struct packed_git. Its meant for the SHA1 of the index but it can also hold the SHA1 of the packfile itself between final checksumming of the packfile and creation of the index. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:44:48 -05:00
Shawn O. Pearce	6cf0926193	Replace redundant yread() with read_in_full() in fast-import. Prior to git having read_in_full() fast-import used its own private function yread to perform the header reading task. No sense in keeping that around now that read_in_full is a public, stable function. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:35:41 -05:00
Shawn O. Pearce	0ea9f045f4	Use uintmax_t for marks in fast-import. If a frontend wants to use a mark per file revision and per commit and is doing a truly huge import (such as a 32 GiB SVN repository) we may need more than 2**32 unique mark values, especially if the frontend is unable (or unwilling) to recycle mark values. For mark idnums we should use the largest unsigned integer type available, hoping that will be at least 64 bits when we are compiled as a 64 bit executable. This way we may consume huge amounts of memory storing our mark table, but we'll at least be able to process the entire import without failing. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-16 00:33:36 -05:00
Shawn O. Pearce	5d6f3ef641	Corrected buffer overflow during automatic checkpoint in fast-import. If we previously were using a delta but we needed to checkpoint the current packfile and switch to a new packfile we need to throw away the delta and compress the raw object by itself, as delta chains cannot span non-thin packfiles. Unfortunately the output buffer in this case needs to grow, as the size of the compressed object may be quite a bit larger than the size of the compressed delta. I've also avoided recompressing the object if we are checkpointing and we didn't use a delta. In this case the output buffer is the correct size and has already been populated with the right data, we just need to close out the current packfile and open a new one. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 23:40:27 -05:00
Shawn O. Pearce	9d1b1b5ed7	Print the packfile names to stdout from fast-import. Caller scripts may want to know what packfiles the fast-import process just wrote out for them. This is now output to stdout, one packfile name per line, after we checkpoint each packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 08:05:01 -05:00
Shawn O. Pearce	d9ee53ce45	Implemented automatic checkpoints within fast-import. When the number of objects or number of bytes gets close to the limit allowed by the packfile format (or configured on the command line by our caller) we should automatically checkpoint the current packfile and start a new one before writing the object out. This does however require that we abandon the delta (if we had one) as its not valid in a new packfile. I also added the simple rule that if we got a delta back but the delta itself is the same size as or larger than the uncompressed object to ignore the delta and just store the object data. This should avoid some really bad behavior caused by our current delta strategy. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 08:00:49 -05:00
Shawn O. Pearce	2fce1f3c86	Optimize index creation on large object sets in fast-import. When we are generating multiple packfiles at once we only need to scan the blocks of object_entry structs which contain objects for the current packfile. Because the most recent blocks are at the front of the linked list, and because all new objects going into the current file are allocated from the front of that list, we can stop scanning for objects as soon as we identify one which doesn't belong to the current packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 07:12:23 -05:00
Shawn O. Pearce	3e005baf85	Don't create a final empty packfile in fast-import. If the last packfile is going to be empty (has 0 objects) then it shouldn't be kept after the import has terminated, as there is no point to the packfile. So rather than hashing it and making the index file, just delete the packfile. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:39:39 -05:00
Shawn O. Pearce	7bfe6e2613	Implemented manual packfile switching in fast-import. To help importers which are dealing with massive amounts of data fast-import needs to be able to close the packfile it is currently writing to and open a new packfile for any additional data that will be received. A new 'checkpoint' command has been introduced which can be used by the frontend import process to force this to occur at any time. This may be useful to ensure a very long running import doesn't lose any work due to unexpected failures. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:35:41 -05:00
Shawn O. Pearce	80144727ac	Remove unnecessary duplicate_count in fast-import. There is little reason to be keeping a global duplicate_count value when we also keep it per object type. The global counter can easily be computed at the end, once all processing has completed. This saves us a couple of machine instructions in an unimportant part of code. But it looks slightly better to me to not keep two counters around. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 06:05:22 -05:00
Shawn O. Pearce	f70b653429	Restructure fast-import to support creating multiple packfiles. Now that we are starting to see some really large projects (such as KDE or a fork of FreeBSD) get imported into Git we're running into the upper limit on packfile object count as well as overall byte length. The KDE and FreeBSD projects are both likely to require more than 4 GiB to store their current history, which means we really need multiple packfiles to handle their content. This is a fairly simple restructuring of the internal code to help us support creating multiple packfiles from within fast-import. We are now adding a 5 digit incrementing suffix to the end of the basename supplied to us by the caller, permitting up to 99,999 packs to be generated in a single fast-import run. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 04:39:05 -05:00
Shawn O. Pearce	03842d8e24	Misc. type cleanups within fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-15 00:16:23 -05:00
Shawn O. Pearce	d489bc1491	Improve reuse of sha1_file library within fast-import. Now that the sha1_file.c library routines use the sliding mmap routines to perform efficient access to portions of a packfile I can remove that code from fast-import.c and just invoke it. One benefit is we now have reloading support for any packfile which uses OBJ_OFS_DELTA. Another is we have significantly less code to maintain. This code reuse change requires that fast-import generate only an OBJ_OFS_DELTA format packfile, as there is absolutely no index available to perform OBJ_REF_DELTA lookup in while unpacking an object. This is probably reasonable to require as the delta offsets result in smaller packfiles and are faster to unpack, as no index searching is required. Its also only a temporary requirement as users could always repack without offsets before making the import available to older versions of Git. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 22:33:51 -05:00
Shawn O. Pearce	1fcdd62adf	Merge branch 'master' into sp/fast-import I'm bringing master in early so that the OBJ_OFS_DELTA implementation is available as part of the topic. This way git-fast-import can learn about this new slightly smaller and faster packfile format, and can generate them directly rather than needing to have them be repacked with git-pack-objects. Due to the API changes in master during the period of development of git-fast-import, a few minor tweaks to fast-import.c are needed to produce a working merge. I've done them here as part of the merge to ensure bisection always works. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:44:18 -05:00
Shawn O. Pearce	9938ffc53a	Allow creating branches without committing in fast-import. Some importers may want to create a branch long before they actually commit to it, or in some cases they may never commit to the branch but they still need the ref to be created in the repository after the import is complete. This extends the 'reset ' command to automatically create a new branch if the supplied reference isn't already known as a branch. While I'm at it I also modified the syntax of the reset command to terminate with an empty line, like commit and tag operate. This just makes the command set more consistent. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:12 -05:00
Shawn O. Pearce	62b6f48388	Support creation of merge commits in fast-import. Some importers are able to determine when branch merges occurred within their source data. In these cases they will want to supply the correct commits to fast-import so that a proper merge commit will exist in Git. This is now supported by supplying a 'merge ' command after the commit message and optional from command. A merge is not actually performed by fast-import, its assumed that the frontend performed any sort of merging activity already and that fast-import should simply be storing its result. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:12 -05:00
Shawn O. Pearce	cacbdd0afb	Fix repository corruption when using marks for modified blobs. Apparently we did not copy the blob SHA1 into the stack variable 'sha1' when a mark is used to refer to a prior blob. This code was not previously tested as the Mozilla CVS -> git-fast-import program always fed us full SHA1s for modified blobs and did not use the mark feature there. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	8a8c55ea70	Additional fast-import tree delta corruption cleanups. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	b54d6422b1	Correct tree corruption problems in fast-import. The new tree delta implementation caused blob SHA1s to be used instead of a tree SHA1 when a tree was written out. This really only appeared to happen when converting an existing file to a tree, but may have been possible in some other situations. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:11 -05:00
Shawn O. Pearce	23bc886c96	Replace ywrite in fast-import with the standard write_or_die. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	243f801d1d	Reuse the same buffer for all commits/tags in fast-import. Since most commits and tag objects are around the same size and we only generate one at a time we can reuse the same buffer rather than xmalloc'ing and free'ing the buffer every time we generate a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	e2eb469d1f	Recycle data buffers for tree generation in fast-import. We only ever generate at most two tree streams at a time. Since most trees are around the same size we can simply recycle the buffers from one tree generation to the next rather than constantly xmalloc'ing and free'ing them. This should perform slightly better when handling a large number of trees as malloc has less work to do. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	4cabf8583f	Implemented tree delta compression in fast-import. We now store for every tree entry two modes and two sha1 values; the base (aka "version 0") and the current/new (aka "version 1"). When we generate a tree object we also regenerate the prior version object and use that as our base object for a delta. This strategy saves a significant amount of memory as we can continue to use the atom pool for file/directory names and only increases each tree entry by an additional 24 bytes of memory. Branches should automatically delta against their ancestor tree, unless the ancestor tree is already at the delta chain limit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:10 -05:00
Shawn O. Pearce	445b85999a	Converted hash memcpy/memcmp to new hashcpy/hashcmp/hashclr. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	08d7e892a7	Don't crash fast-import if no branch log was requested. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	5fced8dc6f	Added 'reset' command to clear a branch's tree. Sometimes an import frontend may need to work with a temporary branch which will actually contain many different branches over the life of the import. This is especially useful when the frontend needs to create a tag from a set of file versions which are otherwise never a commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:09 -05:00
Shawn O. Pearce	53dbce78a2	Map only part of the generated pack file at any point in time. When generating a very large pack file (for example close to 1 GB in size) it may be impossible for the kernel to find a contiguous free range within a 32 bit address space for the mapping to be located at. This is especially problematic on large imports where there is a lot of malloc activity occuring within the same process and the malloc'd regions may straddle the previously mapped regions, thereby creating large holes in the address space. So instead we map only 128 MB of the pack at any given time. This will likely increase the number of times the file gets mapped (with additional system time required to update the page tables more frequently) but will allow the program to handle packs up to 4 GB in size. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	35ef237cf6	Fixed compile error in fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	2eb26d8454	Fixed GPF in fast-import caused by unterminated linked list. fast-import was encounting a GPF when it ran out of free tree_entry objects but didn't know this was the cause because the last tree_entry wasn't terminated with a NULL pointer. The missing NULL pointer occurred when we allocated additional entries via xmalloc but didn't set the last tree_entry's "next" pointer to NULL. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	264244a042	Added --branch-log to option to fast-import. This option can be used to have a record of every commit, the mark (if supplied) and branch name of the commit recorded into a log file when the commit is generated. This log can be useful to verify the results of an import as the commits can be compared to some source repository matching commits through the mark value. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:08 -05:00
Shawn O. Pearce	a6a1a831d9	Added option to export the marks table when fast-import terminates. The marks table can be used by the frontend to load any commit after the import and compare it to whatever data the frontend knows about that commit. If the mark idnums can be easily correlated to some reference source then its relatively trivial to compare the GIT tree to the reference to verify the accuracy of the import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
Shawn O. Pearce	8435a9cb26	Account for tree entry memory costs in fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
Shawn O. Pearce	02f3389d96	Moved from command to after data to help cvs2svn. cvs2svn has three phases: begin_commit, middle_commit, end_commit. The ancester is computed in the middle_commit phase. So its easier to generate a stream if the from command appears after the commit message itself but before the file change commands. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:07 -05:00
Shawn O. Pearce	00e2b8842c	Remove branch creation command from fast-import. Jon Smirl was finding it difficult to alter cvs2svn to generate branch commands prior to the first commit of the same branch. This change moves the 'from' command to be an optional parameter of the 'commit' command, thereby allowing a new branch to be defined at the moment it gets used to create the first commit on that branch. This change makes it impossible to create a branch with no commits on it as at least one commit is needed to register the branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	8d8928b051	Round out memory pool allocations in fast-import to pointer sizes. Some architectures (e.g. SPARC) would require that we access pointers only on pointer-sized alignments. So ensure the pool allocator rounds out non-pointer sized allocations to the next pointer so we don't generate bad memory addresses. This could have occurred if we had previously allocated an atom whose string was not a whole multiple of the pointer size, for example. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	41e5257fcf	Implemented tree reloading in fast-import. Tree reloading allows fast-import to swap out the least-recently used branch by simply deallocating the data structures from memory that were associated with that branch. Later if the branch becomes active again it can lazily recreate those structures on demand by reloading the necessary trees from the pack file it originally wrote them to. The reloading process is implemented by mmap'ing the pack into memory and using a much tighter variant of the pack reading code contained in sha1_file.c. This was a blatent copy from sha1_file.c but the unpacking functions were significantly simplified and are actually now in a form that should make it easier to map only the necessary regions of a pack rather than the entire file. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	72303d44e9	Implemented 'tag' command in fast-import. Tags received from the frontend are generated in memory in a simple linked list in the order that the tag commands were sent by the frontend. If multiple different tag objects for the same tag name get generated the last one sent by the frontend will be the one that gets written out at termination. Multiple tag objects for the same name will cause all older tags of the same name to be lost. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:06 -05:00
Shawn O. Pearce	d6c7eb2c16	Added branch load counter to fast-import. If the branch load count exceeds the number of branches created then the frontend is causing fast-import to page branches into and out of memory due to the way its ordering its commits. Performance can likely be increased if the frontend were to alter its commit sequence such that it stays on one branch before switching to another branch, then never returns to the prior branch. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	d83971688b	Added mark store/find to fast-import. Marks are now saved when the mark directive gets used by the frontend and may be used in place of a SHA1 expression to locate a previous SHA1 which fast-import may have generated. This is particularly useful with commits where the frontend does not (easily) have the ability to compute the SHA1 for an arbitrary commit but needs it to generate a branch or tag from that commit. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	d5c57b284e	Converted fast-import to accept standard command line parameters. The following command line options are now accepted before the pack name: --objects=n # replaces the object count after the pack name --depth=n # delta chain depth to use (default is 10) --active-branches=n # maximum number of branches to keep in memory Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	afde8dd96d	Fixed segfault in fast-import after growing a tree. Growing a tree caused all subtrees to be deallocated and put back into the free list yet those subtree's contents were still actively in use. Consequently they were doled out again and got stomped on elsewhere. Releasing a tree is now performed in two parts, either releasing only the content array or releasing the content array and recursively releasing the subtree(s). Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:05 -05:00
Shawn O. Pearce	ace4a9d1ae	Allow symlink blobs in trees during fast-import. If a frontend is smart enough to import a symlink then we should let them do so. We'll assume that they were smart enough to first generate a blob to hold the link target, as that's how symlinks get represented in GIT. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	c90be46abd	Changed fast-import's pack header creation to use pack.h Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	c44cdc7eef	Converted fast-import to a text based protocol. Frontend clients can now send a text stream to fast-import rather than a binary stream. This should facilitate developing frontend software as the data stream is easier to view, manipulate and debug my hand and Mark-I eyeball. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:04 -05:00
Shawn O. Pearce	7111feede9	Implement blob ID validation in fast-import. When accepting revision SHA1 IDs from the frontend verify the SHA1 actually refers to a blob and is known to exist. Its an error to use a SHA1 in a tree if the blob doesn't exist as this would cause git-fsck-objects to report a missing blob should the pack get closed without the blob being appended into it or a subsequent pack. So right now we'll just ask that the frontend "pre-declare" any blobs it wants to use in a tree before it can use them. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	463acbe1c6	Added tree and commit writing to fast-import. The tree of the current commit can be altered by file_change commands before the commit gets written to the pack. The file changes are rather primitive as they simply allow removal of a tree entry or setting/adding a tree entry. Currently trees and commits aren't being deltafied when written to the pack and branch reloading from the current pack doesn't work, so at most 5 branches can be worked with at any one time. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	6bb5b3291d	Implemented branch handling and basic tree support in fast-import. This provides the basic data structures needed to store trees in memory while we are processing them for a branch. What we are attempting to do is track one complete tree for each branch that the frontend has registered with us through the 'newb' (new_branch) command. When the frontend edits that tree through 'updf' or 'delf' commands we'll mark the affected tree(s) as being dirty and recompute their objects during 'comt' (commit). Currently the protocol is decidedly _not_ user friendly. I crashed fast-import by giving it bad input data from Perl. I may try to improve upon it, or at least upon its error handling. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	6143f0644e	Added basic command handler to fast-import. Moved the new_blob logic off into a new subroutine and invoked it when getting the 'blob' command. Added statistics dump to STDERR when the program terminates listing what it did at a high level. This is somewhat interesting. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:03 -05:00
Shawn O. Pearce	ac47a738a7	Refactored fast-import's internals for future additions. Too many globals variables were being used not not enough code was resuable to process trees and commits so this is a simple refactoring of the existing blob processing code to get into a state that will be easier to handle trees and commits in. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:02 -05:00
Shawn O. Pearce	27d6d29035	Cleaned up memory allocation for object_entry structs. Although its easy to ask the user to tell us how many objects they will need, its probably better to dynamically grow the object table in large units. But if the user can give us a hint as to roughly how many objects then we can still use it during startup. Also stopped printing the SHA1 strings to stdout as no user is currently making use of that facility. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:02 -05:00
Shawn O. Pearce	8bcce30126	Added automatic index generation to fast-import. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:01 -05:00
Shawn O. Pearce	db5e523fdd	Created fast-import, a tool to quickly generating a pack from blobs. Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-01-14 02:15:01 -05:00

... 3 4 5 6 7

329 Commits