git-commit-vandalism

Author	SHA1	Message	Date
Junio C Hamano	339aff0846	Merge branch 'jc/maint-lf-to-crlf-keep-crlf' * jc/maint-lf-to-crlf-keep-crlf: lf_to_crlf_filter(): resurrect CRLF->CRLF hack	2011-12-22 11:27:29 -08:00
Junio C Hamano	3bb8d69cdd	Merge branch 'cn/maint-lf-to-crlf-filter' into maint * cn/maint-lf-to-crlf-filter: lf_to_crlf_filter(): tell the caller we added "\n" when draining convert: track state in LF-to-CRLF filter	2011-12-21 11:42:44 -08:00
Junio C Hamano	8496f56873	lf_to_crlf_filter(): resurrect CRLF->CRLF hack The non-streaming version of the filter counts CRLF and LF in the whole buffer, and returns without doing anything when they match (i.e. what is recorded in the object store already uses CRLF). This was done to help people who added files from the DOS world before realizing they want to go cross platform and adding .gitattributes to tell Git that they only want CRLF in their working tree. The streaming version of the filter does not want to read the whole thing before starting to work, as that defeats the whole point of streaming. So we instead check what byte follows CR whenever we see one, and add CR before LF only when the LF does not immediately follow CR already to keep CRLF as is. Reported-and-tested-by: Ralf Thielow Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-18 20:40:41 -08:00
Junio C Hamano	87afe9a5ed	lf_to_crlf_filter(): tell the caller we added "\n" when draining This can only happen when the input size is multiple of the buffer size of the cascade filter (16k) and ends with an LF, but in such a case, the code forgot to tell the caller that it added the "\n" it could not add during the last round. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-12-16 14:39:37 -08:00
Carlos Martín Nieto	284e3d280e	convert: track state in LF-to-CRLF filter There may not be enough space to store CRLF in the output. If we don't fill the buffer, then the filter will keep getting called with the same short buffer and will loop forever. Instead, always store the CR and record whether there's a missing LF if so we store it in the output buffer the next time the function gets called. Reported-by: Henrik Grubbström <grubba@roxen.com> Signed-off-by: Carlos Martín Nieto <cmn@elego.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-28 11:30:34 -08:00
Ramsay Jones	ef563de6dd	convert.c: Fix return type of git_path_check_eol() The git_path_check_eol() function converts a string value to the corresponding 'enum eol' value. However, the function is currently declared to return an 'enum crlf_action', which causes sparse to complain thus: SP convert.c convert.c:736:50: warning: mixing different enum types convert.c:736:50: int enum crlf_action versus convert.c:736:50: int enum eol In order to suppress the warning, we simply correct the return type in the function declaration. Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-21 11:00:57 -08:00
Ramkumar Ramachandra	7356b51e4b	convert: don't mix enum with int Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-11-15 16:09:02 -08:00
Junio C Hamano	8a72864426	Merge branch 'tr/maint-ident-to-git-memmove' * tr/maint-ident-to-git-memmove: Use memmove in ident_to_git	2011-09-02 13:18:25 -07:00
Thomas Rast	7732118438	Use memmove in ident_to_git convert_to_git sets src=dst->buf if any of the preceding conversions actually did any work. Thus in ident_to_git we have to use memmove instead of memcpy as far as src->dst copying is concerned. Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-29 15:23:22 -07:00
Michael Haggerty	d932f4eb9f	Rename git_checkattr() to git_check_attr() Suggested by: Junio Hamano <gitster@pobox.com> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-08-04 15:53:21 -07:00
Junio C Hamano	a265a7f95e	streaming: filter cascading This implements an internal "cascade" filter mechanism that plugs two filters in series. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 16:47:15 -07:00
Junio C Hamano	b84c783917	streaming filter: ident filter Add support for "ident" filter on the output codepath. This does not work with lf-to-crlf filter together (yet). Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 16:47:15 -07:00
Junio C Hamano	e322ee38ad	Add LF-to-CRLF streaming conversion If we do not have to guess or validate by scanning the input, we can just stream this through. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 16:47:15 -07:00
Junio C Hamano	4ae6670444	stream filter: add "no more input" to the filters Some filters may need to buffer the input and look-ahead inside it to decide what to output, and they may consume more than zero bytes of input and still not produce any output. After feeding all the input, pass NULL as input and keep calling stream_filter() to let such filters know there is no more input coming, and it is time for them to produce the remaining output based on the buffered input. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 16:47:15 -07:00
Junio C Hamano	b6691092d7	Add streaming filter API This introduces an API to plug custom filters to an input stream. The caller gets get_stream_filter("path") to obtain an appropriate filter for the path, and then uses it when opening an input stream via open_istream(). After that, the caller can read from the stream with read_istream(), and close it with close_istream(), just like an unfiltered stream. This only adds a "null" filter that is a pass-thru filter, but later changes can add LF-to-CRLF and other filters, and the callers of the streaming API do not have to change. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-26 16:47:15 -07:00
Junio C Hamano	b0d9c69f5e	convert: CRLF_INPUT is a no-op in the output codepath Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 23:16:53 -07:00
Junio C Hamano	dd8e912190	streaming_write_entry(): use streaming API in write_entry() When the output to a path does not have to be converted, we can read from the object database from the streaming API and write to the file in the working tree, without having to hold everything in the memory. The ident, auto- and safe- crlf conversions inherently require you to read the whole thing before deciding what to do, so while it is technically possible to support them by using a buffer of an unbound size or rewinding and reading the stream twice, it is less practical than the traditional "read the whole thing in core and convert" approach. Adding streaming filters for the other conversions on top of this should be doable by tweaking the can_bypass_conversion() function (it should be renamed to can_filter_stream() when it happens). Then the streaming API can be extended to wrap the git_istream streaming_write_entry() opens on the underlying object in another git_istream that reads from it, filters what is read, and let the streaming_write_entry() read the filtered result. But that is outside the scope of this series. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-20 18:46:58 -07:00
Junio C Hamano	3bfba20dae	convert: make it harder to screw up adding a conversion attribute The current internal API requires the callers of setup_convert_check() to supply the git_attr_check structures (hence they need to know how many to allocate), but they grab the same set of attributes for given path. Define a new convert_attrs() API that fills a higher level information that the callers (convert_to_git and convert_to_working_tree) really want, and move the common code to interact with the attributes system to it. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 14:59:09 -07:00
Junio C Hamano	83295964b3	convert: make it safer to add conversion attributes The places that need to pass an array of "struct git_attr_check" needed to be careful to pass a large enough array and to know at which index each element lay. Make it safer and easier to code these. Besides, the hard-coded sequence of initializing various attributes was too ugly after we gained more than a few attributes. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 14:59:09 -07:00
Junio C Hamano	c61dcff9d6	convert: give saner names to crlf/eol variables, types and functions Back when the conversion was only about the end-of-line convention, it might have made sense to call what we do upon seeing CR/LF simply an "action", but these days the conversion routines do a lot more than just tweaking the line ending. Rename "action" to "crlf_action". The function that decides what end of line conversion to use on the output codepath was called "determine_output_conversion", as if there is no other kind of output conversion. Rename it to "output_eol"; it is a function that returns what EOL convention is to be used. A function that decides what "crlf_action" needs to be used on the input codepath, given what conversion attribute is set to the path and global end-of-line convention, was called "determine_action". Rename it to "input_crlf_action". Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 14:59:09 -07:00
Junio C Hamano	ec70f52f6f	convert: rename the "eol" global variable to "core_eol" Yes, it is clear that "eol" wants to mean some sort of end-of-line thing, but as the name of a global variable, it is way too short to describe what kind of end-of-line thing it wants to represent. Besides, there are many codepaths that want to use their own local "char *eol" variable to point at the end of the current line they are processing. This global variable holds what we read from core.eol configuration variable. Name it as such. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-05-09 14:58:52 -07:00
Jonathan Nieder	c9b6782a08	enums: omit trailing comma for portability Since v1.7.2-rc0~23^2~2 (Add per-repository eol normalization, 2010-05-19), building with gcc -std=gnu89 -pedantic produces warnings like the following: convert.c:21:11: warning: comma at end of enumerator list [-pedantic] gcc is right to complain --- these commas are not permitted in C89. In the spirit of v1.7.2-rc0~32^2~16 (2010-05-14), remove them. Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2011-03-16 12:31:32 -07:00
Pete Wyckoff	a2b665de4b	convert filter: supply path to external driver Filtering to support keyword expansion may need the name of the file being filtered. In particular, to support p4 keywords like $File: //depot/product/dir/script.sh $ the smudge filter needs to know the name of the file it is smudging. Allow "%f" in the custom filter command line specified in the configuration. This will be substituted by the filename inside a single-quote pair to be passed to the shell. Signed-off-by: Pete Wyckoff <pw@padd.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-12-22 10:19:32 -08:00
Eyvind Bernhardsen	43dd233285	Don't expand CRLFs when normalizing text during merge Disable CRLF expansion when convert_to_working_tree() is called from normalize_buffer(). This improves performance when merging branches with conflicting line endings when core.eol=crlf or core.autocrlf=true by making the normalization act as if core.eol=lf. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-07-02 15:45:18 -07:00
Eyvind Bernhardsen	f217f0e86d	Avoid conflicts when merging branches with mixed normalization Currently, merging across changes in line ending normalization is painful since files containing CRLF will conflict with normalized files, even if the only difference between the two versions is the line endings. Additionally, any "real" merge conflicts that exist are obscured because every line in the file has a conflict. Assume you start out with a repo that has a lot of text files with CRLF checked in (A): o---C / \ A---B---D B: Add "* text=auto" to .gitattributes and normalize all files to LF-only C: Modify some of the text files D: Try to merge C You will get a ridiculous number of LF/CRLF conflicts when trying to merge C into D, since the repository contents for C are "wrong" wrt the new .gitattributes file. Fix ll-merge so that the "base", "theirs" and "ours" stages are passed through convert_to_worktree() and convert_to_git() before a three-way merge. This ensures that all three stages are normalized in the same way, removing from consideration differences that are only due to normalization. This feature is optional for now since it changes a low-level mechanism and is not necessary for the majority of users. The "merge.renormalize" config variable enables it. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-07-02 15:43:15 -07:00
Junio C Hamano	d5cff17eda	Merge branch 'eb/core-eol' * eb/core-eol: Add "core.eol" config variable Rename the "crlf" attribute "text" Add per-repository eol normalization Add tests for per-repository eol normalization Conflicts: Documentation/config.txt Makefile	2010-06-21 06:02:49 -07:00
Junio C Hamano	d249515f29	Merge branch 'fg/autocrlf' * fg/autocrlf: autocrlf: Make it work also for un-normalized repositories	2010-06-21 06:02:47 -07:00
Junio C Hamano	8d676d85f7	Merge branch 'gv/portable' * gv/portable: test-lib: use DIFF definition from GIT-BUILD-OPTIONS build: propagate $DIFF to scripts Makefile: Tru64 portability fix Makefile: HP-UX 10.20 portability fixes Makefile: HPUX11 portability fixes Makefile: SunOS 5.6 portability fix inline declaration does not work on AIX Allow disabling "inline" Some platforms lack socklen_t type Make NO_{INET_NTOP,INET_PTON} configured independently Makefile: some platforms do not have hstrerror anywhere git-compat-util.h: some platforms with mmap() lack MAP_FAILED definition test_cmp: do not use "diff -u" on platforms that lack one fixup: do not unconditionally disable "diff -u" tests: use "test_cmp", not "diff", when verifying the result Do not use "diff" found on PATH while building and installing enums: omit trailing comma for portability Makefile: -lpthread may still be necessary when libc has only pthread stubs Rewrite dynamic structure initializations to runtime assignment Makefile: pass CPPFLAGS through to allow customization Conflicts: Makefile wt-status.h	2010-06-21 06:02:44 -07:00
Eyvind Bernhardsen	942e774767	Add "core.eol" config variable Introduce a new configuration variable, "core.eol", that allows the user to set which line endings to use for end-of-line-normalized files in the working directory. It defaults to "native", which means CRLF on Windows and LF everywhere else. Note that "core.autocrlf" overrides core.eol. This means that [core] autocrlf = true puts CRLFs in the working directory even if core.eol is set to "lf". Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-06-06 21:20:04 -07:00
Gary V. Vaughan	66dbfd55e3	Rewrite dynamic structure initializations to runtime assignment Unfortunately, there are still plenty of production systems with vendor compilers that choke unless all compound declarations can be determined statically at compile time, for example hpux10.20 (I can provide a comprehensive list of our supported platforms that exhibit this problem if necessary). This patch simply breaks apart any compound declarations with dynamic initialisation expressions, and moves the initialisation until after the last declaration in the same block, in all the places necessary to have the offending compilers accept the code. Signed-off-by: Gary V. Vaughan <gary@thewrittenword.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-31 16:59:26 -07:00
Eyvind Bernhardsen	5ec3e67052	Rename the "crlf" attribute "text" As discussed on the list, "crlf" is not an optimal name. Linus suggested "text", which is much better. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-19 20:42:34 -07:00
Eyvind Bernhardsen	fd6cce9e89	Add per-repository eol normalization Change the semantics of the "crlf" attribute so that it enables end-of-line normalization when it is set, regardless of "core.autocrlf". Add a new setting for "crlf": "auto", which enables end-of-line conversion but does not override the automatic text file detection. Add a new attribute "eol" with possible values "crlf" and "lf". When set, this attribute enables normalization and forces git to use CRLF or LF line endings in the working directory, respectively. The line ending style to be used for normalized text files in the working directory is set using "core.autocrlf". When it is set to "true", CRLFs are used in the working directory; when set to "input" or "false", LFs are used. Signed-off-by: Eyvind Bernhardsen <eyvind.bernhardsen@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-19 20:36:15 -07:00
Finn Arne Gangstad	c4805393d7	autocrlf: Make it work also for un-normalized repositories Previously, autocrlf would only work well for normalized repositories. Any text files that contained CRLF in the repository would cause problems, and would be modified when handled with core.autocrlf set. Change autocrlf to not do any conversions to files that in the repository already contain a CR. git with autocrlf set will never create such a file, or change a LF only file to contain CRs, so the (new) assumption is that if a file contains a CR, it is intentional, and autocrlf should not change that. The following sequence should now always be a NOP even with autocrlf set (assuming a clean working directory): git checkout <something> touch * git add -A . (will add nothing) git commit (nothing to commit) Previously this would break for any text file containing a CR. Some of you may have been following Eyvind's excellent thread about trying to make end-of-line translation in git a bit smoother. I decided to attack the problem from a different angle: Is it possible to make autocrlf behave non-destructively for all the previous problem cases? Stealing the problem from Eyvind's initial mail (paraphrased and summarized a bit): 1. Setting autocrlf globally is a pain since autocrlf does not work well with CRLF in the repo 2. Setting it in individual repos is hard since you do it "too late" (the clone will get it wrong) 3. If someone checks in a file with CRLF later, you get into problems again 4. If a repository once has contained CRLF, you can't tell autocrlf at which commit everything is sane again 5. autocrlf does needless work if you know that all your users want the same EOL style. I believe that this patch makes autocrlf a safe (and good) default setting for Windows, and this solves problems 1-4 (it solves 2 by being set by default, which is early enough for clone). I implemented it by looking for CR characters in the index, and aborting any conversion attempt if this is found. Signed-off-by: Finn Arne Gangstad <finag@pvv.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-05-11 23:02:49 -07:00
Henrik Grubbström	07814d9009	convert: Keep foreign $Id$ on checkout. If there are foreign $Id$ keywords in the repository, they are most likely there for a reason. Let's keep them on checkout (which is also what the documentation indicates). Foreign $Id$ keywords are now recognized by there being multiple space separated fields in $Id:xxxxx$. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-10 21:45:01 -07:00
Henrik Grubbström	a9f3049f6c	convert: Safer handling of $Id$ contraction. The code to contract $Id:xxxxx$ strings could eat an arbitrary amount of source text if the terminating $ was lost. It now refuses to contract $Id:xxxxx$ strings spanning multiple lines. Signed-off-by: Henrik Grubbström <grubba@grubba.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-04-10 21:45:00 -07:00
Junio C Hamano	76d44c8cfd	Merge branch 'sp/maint-push-sideband' into sp/push-sideband * sp/maint-push-sideband: receive-pack: Send hook output over side band #2 receive-pack: Wrap status reports inside side-band-64k receive-pack: Refactor how capabilities are shown to the client send-pack: demultiplex a sideband stream with status data run-command: support custom fd-set in async run-command: Allow stderr to be a caller supplied pipe Update git fsck --full short description to mention packs Conflicts: run-command.c	2010-02-05 21:08:53 -08:00
Erik Faye-Lund	ae6a5609c0	run-command: support custom fd-set in async This patch adds the possibility to supply a set of non-0 file descriptors for async process communication instead of the default-created pipe. Additionally, we now support bi-directional communication with the async procedure, by giving the async function both read and write file descriptors. To retain compatibility and similar "API feel" with start_command, we require start_async callers to set .out = -1 to get a readable file descriptor. If either of .in or .out is 0, we supply no file descriptor to the async process. [sp: Note: Erik started this patch, and a huge bulk of it is his work. All bugs were introduced later by Shawn.] Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-02-05 20:57:22 -08:00
Junio C Hamano	06dbc1ea57	Merge branch 'jc/conflict-marker-size' * jc/conflict-marker-size: rerere: honor conflict-marker-size attribute rerere: prepare for customizable conflict marker length conflict-marker-size: new attribute rerere: use ll_merge() instead of using xdl_merge() merge-tree: use ll_merge() not xdl_merge() xdl_merge(): allow passing down marker_size in xmparam_t xdl_merge(): introduce xmparam_t for merge specific parameters git_attr(): fix function signature Conflicts: builtin-merge-file.c ll-merge.c xdiff/xdiff.h xdiff/xmerge.c	2010-01-20 20:28:51 -08:00
Junio C Hamano	7fb0eaa289	git_attr(): fix function signature The function took (name, namelen) as its arguments, but all the public callers wanted to pass a full string. Demote the counted-string interface to an internal API status, and allow public callers to just pass the string to the function. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-16 20:39:59 -08:00
Jeff King	ac0ba18df0	run-command: convert simple callsites to use_shell Now that we have the use_shell feature, these callsites can all be converted with small changes. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2010-01-05 23:41:50 -08:00
Johannes Sixt	5709e0363a	run_command: return exit code as positive value As a general guideline, functions in git's code return zero to indicate success and negative values to indicate failure. The run_command family of functions followed this guideline. But there are actually two different kinds of failure: - failures of system calls; - non-zero exit code of the program that was run. Usually, a non-zero exit code of the program is a failure and means a failure to the caller. Except that sometimes it does not. For example, the exit code of merge programs (e.g. external merge drivers) conveys information about how the merge failed, and not all exit calls are actually failures. Furthermore, the return value of run_command is sometimes used as exit code by the caller. This change arranges that the exit code of the program is returned as a positive value, which can now be regarded as the "result" of the function. System call failures continue to be reported as negative values. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2009-07-05 12:16:27 -07:00
Brandon Casey	f285a2d7ed	Replace calls to strbuf_init(&foo, 0) with STRBUF_INIT initializer Many call sites use strbuf_init(&foo, 0) to initialize local strbuf variable "foo" which has not been accessed since its declaration. These can be replaced with a static initialization using the STRBUF_INIT macro which is just as readable, saves a function call, and takes up fewer lines. Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2008-10-12 12:36:19 -07:00
Dmitry Kakurin	f9dd4bf4e5	Fixed text file auto-detection: treat EOF character 032 at the end of file as printable Signed-off-by: Dmitry Kakurin <Dmitry.Kakurin@gmail.com> Signed-off-by: Steffen Prohaska <prohaska@zib.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-07-11 21:14:27 -07:00
Brian Hetro	cd8be6c9b6	convert.c: Use 'git_config_string' to get 'smudge' and 'clean' Signed-off-by: Brian Hetro <whee@smaertness.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-07-05 17:42:30 -07:00
Johannes Schindelin	ef90d6d420	Provide git_config with a callback-data parameter git_config() only had a function parameter, but no callback data parameter. This assumes that all callback functions only modify global variables. With this patch, every callback gets a void * parameter, and it is hoped that this will help the libification effort. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-05-14 12:34:44 -07:00
Junio C Hamano	2ac4b4b222	Merge branch 'sp/safecrlf' * sp/safecrlf: safecrlf: Add mechanism to warn about irreversible crlf conversions	2008-02-16 17:59:20 -08:00
Junio C Hamano	a7269e5cb7	convert.c: guard config parser from value=NULL filter..smudge and filter..clean configuration variables expect a string value. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-02-11 13:11:36 -08:00
Steffen Prohaska	21e5ad50fc	safecrlf: Add mechanism to warn about irreversible crlf conversions CRLF conversion bears a slight chance of corrupting data. autocrlf=true will convert CRLF to LF during commit and LF to CRLF during checkout. A file that contains a mixture of LF and CRLF before the commit cannot be recreated by git. For text files this is the right thing to do: it corrects line endings such that we have only LF line endings in the repository. But for binary files that are accidentally classified as text the conversion can corrupt data. If you recognize such corruption early you can easily fix it by setting the conversion type explicitly in .gitattributes. Right after committing you still have the original file in your work tree and this file is not yet corrupted. You can explicitly tell git that this file is binary and git will handle the file appropriately. Unfortunately, the desired effect of cleaning up text files with mixed line endings and the undesired effect of corrupting binary files cannot be distinguished. In both cases CRLFs are removed in an irreversible way. For text files this is the right thing to do because CRLFs are line endings, while for binary files converting CRLFs corrupts data. This patch adds a mechanism that can either warn the user about an irreversible conversion or can even refuse to convert. The mechanism is controlled by the variable core.safecrlf, with the following values: - false: disable safecrlf mechanism - warn: warn about irreversible conversions - true: refuse irreversible conversions The default is to warn. Users are only affected by this default if core.autocrlf is set. But the current default of git is to leave core.autocrlf unset, so users will not see warnings unless they deliberately chose to activate the autocrlf mechanism. The safecrlf mechanism's details depend on the git command. The general principles when safecrlf is active (not false) are: - we warn/error out if files in the work tree can be modified in an irreversible way without giving the user a chance to back up the original file. - for read-only operations that do not modify files in the work tree we do not print annoying warnings. There are exceptions. Even though... - "git add" itself does not touch the files in the work tree, the next checkout would, so the safety triggers; - "git apply" to update a text file with a patch does touch the files in the work tree, but the operation is about text files and CRLF conversion is about fixing the line ending inconsistencies, so the safety does not trigger; - "git diff" itself does not touch the files in the work tree, but it is often run to inspect the changes you intend to next "git add". To catch potential problems early, safety triggers. The concept of a safety check was originally proposed in a similar way by Linus Torvalds. Thanks to Dmitry Potapov for insisting on getting the naked LF/autocrlf=true case right. Signed-off-by: Steffen Prohaska <prohaska@zib.de>	2008-02-06 13:07:28 -08:00
Dmitry Potapov	28624193b2	treat any file with NUL as binary There are two heuristics in Git to detect whether a file is binary or text. One in xdiff-interface.c (which is taken from GNU diff) relies on existence of the NUL byte at the beginning. However, convert.c used a different heuristic, which relied on the percent of non-printable symbols (less than 1% for text files). Due to differences in detection whether a file is binary or not, it was possible that a file that diff treats as binary could be treated as text by CRLF conversion. This is very confusing for a user who sees that 'git diff' shows the file as binary and expects it to be added as binary. This patch makes is_binary consider any file that contains at least one NUL character as binary, to ensure that the heuristic used for CRLF conversion is tighter than what is used by diff. Signed-off-by: Dmitry Potapov <dpotapov@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2008-01-16 09:10:34 -08:00
Johannes Sixt	546bb58232	Use the asynchronous function infrastructure to run the content filter. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-10-21 01:30:42 -04:00
Johannes Sixt	7683b6e81f	Avoid a dup2(2) in apply_filter() - start_command() can do it for us. When apply_filter() runs the external (clean or smudge) filter program, it needs to pass the writable end of a pipe as its stdout. For this purpose, it used to dup2(2) the file descriptor explicitly to stdout. Now we use the facilities of start_command() to do it for us. Furthermore, the path argument of a subordinate function, filter_buffer(), was not used, so here we replace it to pass the fd instead. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-10-21 01:30:42 -04:00
Johannes Sixt	dc1bfdcd1a	Use start_command() to run content filters instead of explicit fork/exec. The previous code already used finish_command() to wait for the process to terminate, but did not use start_command() to run it. Signed-off-by: Johannes Sixt <johannes.sixt@telecom.at> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-10-21 01:30:39 -04:00
Pierre Habouzit	90d16ec032	Fix in-place editing functions in convert.c * crlf_to_git and ident_to_git: Don't grow the buffer if there is enough space in the first place. As a side effect, when the editing is done "in place", we don't grow, so the buffer pointer doesn't change, and `src' isn't invalidated anymore. Thanks to Bernt Hansen for the bug report. * apply_filter: Fix memory leak due to fake in-place editing that didn't collect the old buffer when the filter succeeds. Also a cosmetic fix. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Lars Hjemli <hjemli@gmail.com> Signed-off-by: Shawn O. Pearce <spearce@spearce.org>	2007-10-15 21:38:09 -04:00
Pierre Habouzit	b315c5c081	strbuf change: be sure ->buf is never ever NULL. For that purpose, the ->buf is always initialized with a char * buf living in the strbuf module. It is made a char * so that we can sloppily accept things that perform: sb->buf[0] = '\0', and because you can't pass "" as an initializer for ->buf without making gcc unhappy for very good reasons. strbuf_init/_detach/_grow have been fixed to trust ->alloc and not ->buf anymore. As a consequence strbuf_detach is _mandatory_ to detach a buffer, copying ->buf isn't an option anymore, if ->buf is going to escape from the scope, and eventually be free'd. API changes: * strbuf_setlen now always works, so just make strbuf_reset a convenience macro. * strbuf_detach takes a size_t* optional argument (meaning it can be NULL) to copy the buffer's len, as it was needed for this refactor to make the code more readable, and working like the callers. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-29 02:13:33 -07:00
Pierre Habouzit	182af8343c	Use xmemdupz() in many places. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-18 17:42:17 -07:00
Pierre Habouzit	ba3ed09728	Now that cache.h needs strbuf.h, remove useless includes. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-16 17:30:03 -07:00
Pierre Habouzit	5ecd293d14	Rewrite convert_to_{git,working_tree} to use strbuf's. * Now, those functions take an "out" strbuf argument, where they store their result if any. In that case, it also returns 1, else it returns 0. * those functions support "in place" editing, in the sense that it's OK to call them this way: convert_to_git(path, sb->buf, sb->len, sb); When doable, conversions are done in place for real, else the strbuf content is just replaced with the new one, transparently for the caller. If you want to create a new filter working this way, being the accumulation of filter1, filter2, ... filtern, then your meta_filter would be: int meta_filter(..., const char *src, size_t len, struct strbuf *sb) { int ret = 0; ret |= filter1(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } ret |= filter2(...., src, len, sb); if (ret) { src = sb->buf; len = sb->len; } .... return ret | filtern(..., src, len, sb); } That's why subfilters the convert_to_* functions called were also rewritten to work this way. Signed-off-by: Pierre Habouzit <madcoder@debian.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-16 17:30:03 -07:00
René Scharfe	89b4256cfb	Remove unused function convert_sha1_file() convert_sha1_file() became unused by the previous patch -- remove it. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2007-09-03 16:46:23 -07:00
Andy Parkins	c23290d528	Fix mishandling of $Id$ expanded in the repository copy in convert.c If the repository contained an expanded ident keyword (i.e. $Id:XXXX$), then the wrong bytes were discarded, and the Id keyword was not expanded. The fault was in convert.c:ident_to_worktree(). Previously, when a "$Id:" was found in the repository version, ident_to_worktree() would search for the next "$" after this, and discarded everything it found until then. That was done with the loop: do { ch = *cp++; if (ch == '$') break; rem--; } while (rem); The above loop left cp pointing one character _after_ the final "$" (because of ch = *cp++). This was different from the non-expanded case, where cp is left pointing at the "$", and was different from the comment which stated "discard up to but not including the closing $". This patch fixes that by making the loop: do { ch = *cp; if (ch == '$') break; cp++; rem--; } while (rem); That is, cp is tested _then_ incremented. This loop exits if it finds a "$" or if it runs out of bytes in the source. After this loop, if there was no closing "$" the expansion is skipped, and the outer loop is allowed to continue leaving this non-keyword as it was. However, when the "$" is found, size is corrected, before running the expansion: size -= (cp - src); This is wrong; size is going to be corrected anyway after the expansion, so there is no need to do it here. This patch removes that redundant correction. To help find this bug, I heavily commented the routine; those comments are included here as a bonus. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-26 01:12:43 -07:00
Andy Parkins	760f0c62ef	Fix crlf attribute handling to match documentation gitattributes.txt says, of the crlf attribute: Set:: Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. That is to say that the crlf attribute does not force the file to have CRLF line endings, instead it removes the autocrlf guesswork and forces the file to be treated as text. Then, whatever line ending is defined by the autocrlf setting is applied. However, that is not what convert.c was doing. The conversion to CRLF was being skipped in crlf_to_worktree() when the following condition was true: action == CRLF_GUESS && auto_crlf <= 0 That is to say conversion took place when not in guess mode (crlf attribute not specified) or core.autocrlf set to true. This was wrong. It meant that the crlf attribute being on for a given file _forced_ CRLF conversion, when actually it should force the file to be treated as text, and converted accordingly. The real test should simply be auto_crlf <= 0 That is to say, if core.autocrlf is false (or input), conversion from LF to CRLF is never done. When core.autocrlf is true, conversion from LF to CRLF is done only when in CRLF_GUESS (and the guess is "text"), or CRLF_TEXT mode. Similarly for crlf_to_worktree(), if core.autocrlf is false, no conversion should _ever_ take place. In reality it was only not taking place if core.autocrlf was false _and_ the crlf attribute was unspecified. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-18 17:02:47 -07:00
René Scharfe	5e6cfc80e2	git-archive: convert archive entries like checkouts do As noted by Johan Herland, git-archive is a kind of checkout and needs to apply any checkout filters that might be configured. This patch adds the convenience function convert_sha1_file which returns a buffer containing the object's contents, after converting, if necessary (i.e. it's a combination of read_sha1_file and convert_to_working_tree). Direct calls to read_sha1_file in git-archive are then replaced by calls to convert_sha1_file. Since convert_sha1_file expects its path argument to be NUL-terminated -- a convention it inherits from convert_to_working_tree -- the patch also changes the path handling in archive-tar.c to always NUL-terminate the string. It used to solely rely on the len field of struct strbuf before. archive-zip.c already NUL-terminates the path and thus needs no such change. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-18 16:36:45 -07:00
Andy Parkins	af9b54bb2c	Use $Id$ as the ident attribute keyword rather than $ident$ to be consistent with other VCSs $Id$ is present already in SVN and CVS; it would mean that people converting their existing repositories won't have to make any changes to the source files should they want to make use of the ident attribute. Given that it's a feature that's meant to calm those very people, it seems obtuse to make them edit every file just to make use of it. I think that bzr uses $Id$; Mercurial has example hooks for $Id$; monotone has $Id$ on its wishlist. I can't think of a good reason not to stick with the de-facto standard and call ours $Id$ instead of $ident$. Signed-off-by: Andy Parkins <andyparkins@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-05-14 19:03:32 -07:00
Junio C Hamano	aa4ed402c9	Add 'filter' attribute and external filter driver definition. The interface is similar to the custom low-level merge drivers. First you configure your filter driver by defining 'filter.<name>.*' variables in the configuration. filter.<name>.clean filter command to run upon checkin filter.<name>.smudge filter command to run upon checkout Then you assign filter attribute to each path, whose name matches the custom filter driver's name. Example: (in .gitattributes) *.c filter=indent (in config) [filter "indent"] clean = indent smudge = cat Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-24 22:38:51 -07:00
Junio C Hamano	3fed15f568	Add 'ident' conversion. The 'ident' attribute set to path squashes "$ident:<any bytes except dollar sign>$" to "$ident$" upon checkin, and expands it to "$ident: <blob SHA-1> $" upon checkout. As we have two conversions that affect checkin/checkout paths, clarify how they interact with each other. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-24 22:38:51 -07:00
Alex Riesen	67e22ed58f	Fix a typo in crlf conversion code Also, noticed by valgrind: the code caused a read out-of-bounds. Some comments updated as well (they still reflected old calling conventions). Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-22 10:44:38 -07:00
Junio C Hamano	6073ee8571	convert.c: restructure the attribute checking part. This separates the checkattr() call and interpretation of the returned value specific to the 'crlf' attribute into separate routines, so that we can run a single call to checkattr() to check for more than one attribute, and then interpret what the returned settings mean separately. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-21 11:55:23 -07:00
Alex Riesen	ac78e54804	Simplify calling of CR/LF conversion routines Signed-off-by: Alex Riesen <raa.lkml@gmail.com> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-20 23:24:34 -07:00
Junio C Hamano	163b959194	Update 'crlf' attribute semantics. This updates the semantics of 'crlf' so that .gitattributes file can say "this is text, even though it may look funny". Setting the `crlf` attribute on a path is meant to mark the path as a "text" file. 'core.autocrlf' conversion takes place without guessing the content type by inspection. Unsetting the `crlf` attribute on a path is meant to mark the path as a "binary" file. The path never goes through line endings conversion upon checkin/checkout. Unspecified `crlf` attribute tells git to apply the `core.autocrlf` conversion when the file content looks like text. Setting the `crlf` attribute to string value "input" is similar to setting the attribute to `true`, but also forces git to act as if `core.autocrlf` is set to `input` for the path. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-19 22:37:44 -07:00
Junio C Hamano	a5e92abde6	Fix funny types used in attribute value representation It was bothering me a lot that I abused small integer values cast to (void *) to represent non string values in gitattributes. This corrects it by making the type of attribute values (const char *), and using the address of a few statically allocated character buffer to denote true/false. Unset attributes are represented as having NULLs as their values. Added in-header documentation to explain how git_checkattr() routine should be called. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-18 16:17:13 -07:00
Junio C Hamano	515106fa13	Allow more than true/false to attributes. This allows you to define three values (and possibly more) to each attribute: true, false, and unset. Typically the handlers that notice and act on attribute values treat "unset" attribute to mean "do your default thing" (e.g. crlf that is unset would trigger "guess from contents"), so being able to override a setting to an unset state is actually useful. - If you want to set the attribute value to true, have an entry in .gitattributes file that mentions the attribute name; e.g. *.o binary - If you want to set the attribute value explicitly to false, use '-'; e.g. *.a -diff - If you want to make the attribute value _unset_, perhaps to override an earlier entry, use '!'; e.g. *.a -diff c.i.a !diff This also allows string values to attributes, with the natural syntax: attrname=attrvalue but you cannot use it, as nobody takes notice and acts on it yet. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-17 01:04:59 -07:00
Junio C Hamano	201ac8efc7	Fix 'crlf' attribute semantics. Earlier we said 'crlf lets the path go through core.autocrlf process while !crlf disables it altogether'. This fixes the semantics to: - Lack of 'crlf' attribute makes core.autocrlf to apply (i.e. we guess based on the contents and if platform expresses its desire to have CRLF line endings via core.autocrlf, we do so). - Setting 'crlf' attribute to true forces CRLF line endings in working tree files, even if blob does not look like text (e.g. contains NUL or other bytes we consider binary). - Setting 'crlf' attribute to false disables conversion. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-15 13:35:45 -07:00
Junio C Hamano	35ebfd6a0c	Define 'crlf' attribute. This defines the semantics of 'crlf' attribute as an example. When a path has this attribute unset (i.e. '!crlf'), autocrlf line-end conversion is not applied. Eventually we would want to let users build a pipeline of processing to munge blob data to filesystem format (and in the other direction) based on combination of attributes, and at that point the mechanism in convert_to_{git,working_tree}() that looks at 'crlf' attribute needs to be enhanced. Perhaps the existing 'crlf' would become the first step in the input chain, and the last step in the output chain. Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-04-14 08:57:06 -07:00
Linus Torvalds	d7f4633405	Make AutoCRLF ternary variable. This allows you to do: [core] AutoCRLF = input and it should do only the CRLF->LF translation (ie it simplifies CRLF only when reading working tree files, but when checking out files, it leaves the LF alone, and doesn't turn it into a CRLF). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-14 11:19:28 -08:00
Linus Torvalds	6c510bee20	Lazy man's auto-CRLF It currently does NOT know about file attributes, so it does its conversion purely based on content. Maybe that is more in the "git philosophy" anyway, since content is king, but I think we should try to do the file attributes to turn it off on demand. Anyway, BY DEFAULT it is off regardless, because it requires a [core] AutoCRLF = true in your config file to be enabled. We could make that the default for Windows, of course, the same way we do some other things (filemode etc). But you can actually enable it on UNIX, and it will cause: - "git update-index" will write blobs without CRLF - "git diff" will diff working tree files without CRLF - "git checkout" will write files to the working tree _with_ CRLF and things work fine. Funnily, it actually shows an odd file in git itself: git clone -n git test-crlf cd test-crlf git config core.autocrlf true git checkout git diff shows a diff for "Documentation/docbook-xsl.css". Why? Because we have actually checked in that file with CRLF! So when "core.autocrlf" is true, we'll always generate a different hash for it in the index, because the index hash will be for the content _without_ CRLF. Is this complete? I dunno. It seems to work for me. It doesn't use the filename at all right now, and that's probably a deficiency (we could certainly make the "is_binary()" heuristics also take standard filename heuristics into account). I don't pass in the filename at all for the "index_fd()" case (git-update-index), so that would need to be passed around, but this actually works fine. NOTE NOTE NOTE! The "is_binary()" heuristics are totally made-up by yours truly. I will not guarantee that they work at all reasonable. Caveat emptor. But it _is_ simple, and it _is_ safe, since it's all off by default. 
The patch is pretty simple - the biggest part is the new "convert.c" file, but even that is really just basic stuff that anybody can write in "Teaching C 101" as a final project for their first class in programming. Not to say that it's bug-free, of course - but at least we're not talking about rocket surgery here. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <junkio@cox.net>	2007-02-14 11:19:22 -08:00
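
The loop fix described in c23290d528 above (test the byte, then advance) can be sketched in C as follows. This is an illustration only, not the code from convert.c: the helper name find_closing_dollar is hypothetical, and the sketch assumes the caller has already skipped past the leading "$Id:" before calling it.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of convert.c: scan at most `rem` bytes
 * starting at `cp` for the closing '$' of an expanded keyword.
 *
 * The byte is tested before the pointer is advanced, so on success the
 * returned pointer sits on the '$' itself, not one byte past it.
 * Returns NULL when the input runs out without a closing '$'.
 */
const char *find_closing_dollar(const char *cp, size_t rem)
{
	while (rem) {
		if (*cp == '$')
			return cp;	/* stop on the '$', do not step over it */
		cp++;
		rem--;
	}
	return NULL;		/* no closing '$': caller leaves the text alone */
}
```

A caller would pass the bytes that follow "$Id:" together with the remaining length. A NULL result corresponds to the case in the commit message where the expansion is skipped and the non-keyword is left as it was.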

174 Commits