git-commit-vandalism/run-command.c

1203 lines
26 KiB
C
Raw Normal View History

#include "cache.h"
#include "run-command.h"
#include "exec_cmd.h"
#include "sigchain.h"
#include "argv-array.h"
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 01:04:10 +01:00
#include "thread-utils.h"
#include "strbuf.h"
void child_process_init(struct child_process *child)
{
memset(child, 0, sizeof(*child));
argv_array_init(&child->args);
argv_array_init(&child->env_array);
}
void child_process_clear(struct child_process *child)
{
argv_array_clear(&child->args);
argv_array_clear(&child->env_array);
}
struct child_to_clean {
pid_t pid;
struct child_to_clean *next;
};
static struct child_to_clean *children_to_clean;
static int installed_child_cleanup_handler;
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
static void cleanup_children(int sig, int in_signal)
{
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
kill(p->pid, sig);
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
if (!in_signal)
free(p);
}
}
static void cleanup_children_on_signal(int sig)
{
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
cleanup_children(sig, 1);
sigchain_pop(sig);
raise(sig);
}
static void cleanup_children_on_exit(void)
{
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
cleanup_children(SIGTERM, 0);
}
static void mark_child_for_cleanup(pid_t pid)
{
struct child_to_clean *p = xmalloc(sizeof(*p));
p->pid = pid;
p->next = children_to_clean;
children_to_clean = p;
if (!installed_child_cleanup_handler) {
atexit(cleanup_children_on_exit);
sigchain_push_common(cleanup_children_on_signal);
installed_child_cleanup_handler = 1;
}
}
static void clear_child_for_cleanup(pid_t pid)
{
struct child_to_clean **pp;
for (pp = &children_to_clean; *pp; pp = &(*pp)->next) {
struct child_to_clean *clean_me = *pp;
if (clean_me->pid == pid) {
*pp = clean_me->next;
free(clean_me);
return;
}
}
}
static inline void close_pair(int fd[2])
{
close(fd[0]);
close(fd[1]);
}
#ifndef GIT_WINDOWS_NATIVE
static inline void dup_devnull(int to)
{
int fd = open("/dev/null", O_RDWR);
if (fd < 0)
die_errno(_("open /dev/null failed"));
if (dup2(fd, to) < 0)
die_errno(_("dup2(%d,%d) failed"), fd, to);
close(fd);
}
Windows: avoid the "dup dance" when spawning a child process When stdin, stdout, or stderr must be redirected for a child process that on Windows is spawned using one of the spawn() functions of Microsoft's C runtime, then there is no choice other than to 1. make a backup copy of fd 0,1,2 with dup 2. dup2 the redirection source fd into 0,1,2 3. spawn 4. dup2 the backup back into 0,1,2 5. close the backup copy and the redirection source We used this idiom as well -- but we are not using the spawn() functions anymore! Instead, we have our own implementation. We had hardcoded that stdin, stdout, and stderr of the child process were inherited from the parent's fds 0, 1, and 2. But we can actually specify any fd. With this patch, the fds to inherit are passed from start_command()'s WIN32 section to our spawn implementation. This way, we can avoid the backup copies of the fds. The backup copies were a bug waiting to surface: The OS handles underlying the dup()ed fds were inherited by the child process (but were not associated with a file descriptor in the child). Consequently, the file or pipe represented by the OS handle remained open even after the backup copy was closed in the parent process until the child exited. Since our implementation of pipe() creates non-inheritable OS handles, we still dup() file descriptors in start_command() because dup() happens to create inheritable duplicates. (A nice side effect is that the fd cleanup in start_command is the same for Windows and Unix and remains unchanged.) Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-15 21:12:18 +01:00
#endif
static char *locate_in_PATH(const char *file)
{
const char *p = getenv("PATH");
struct strbuf buf = STRBUF_INIT;
if (!p || !*p)
return NULL;
while (1) {
const char *end = strchrnul(p, ':');
strbuf_reset(&buf);
/* POSIX specifies an empty entry as the current directory. */
if (end != p) {
strbuf_add(&buf, p, end - p);
strbuf_addch(&buf, '/');
}
strbuf_addstr(&buf, file);
if (!access(buf.buf, F_OK))
return strbuf_detach(&buf, NULL);
if (!*end)
break;
p = end + 1;
}
strbuf_release(&buf);
return NULL;
}
static int exists_in_PATH(const char *file)
{
char *r = locate_in_PATH(file);
free(r);
return r != NULL;
}
int sane_execvp(const char *file, char * const argv[])
{
if (!execvp(file, argv))
return 0; /* cannot happen ;-) */
/*
* When a command can't be found because one of the directories
* listed in $PATH is unsearchable, execvp reports EACCES, but
* careful usability testing (read: analysis of occasional bug
* reports) reveals that "No such file or directory" is more
* intuitive.
*
* We avoid commands with "/", because execvp will not do $PATH
* lookups in that case.
*
* The reassignment of EACCES to errno looks like a no-op below,
* but we need to protect against exists_in_PATH overwriting errno.
*/
if (errno == EACCES && !strchr(file, '/'))
errno = exists_in_PATH(file) ? EACCES : ENOENT;
else if (errno == ENOTDIR && !strchr(file, '/'))
errno = ENOENT;
return -1;
}
static const char **prepare_shell_cmd(const char **argv)
{
int argc, nargc = 0;
const char **nargv;
for (argc = 0; argv[argc]; argc++)
; /* just counting */
/* +1 for NULL, +3 for "sh -c" plus extra $0 */
nargv = xmalloc(sizeof(*nargv) * (argc + 1 + 3));
if (argc < 1)
die("BUG: shell command is empty");
if (strcspn(argv[0], "|&;<>()$`\\\"' \t\n*?[#~=%") != strlen(argv[0])) {
#ifndef GIT_WINDOWS_NATIVE
Use SHELL_PATH from build system in run_command.c:prepare_shell_cmd During the testing of the 1.7.10 rc series on Solaris for OpenCSW, it was discovered that t7006-pager was failing due to finding a bad "sh" in PATH after a call to execvp("sh", ...). This call was setup by run_command.c:prepare_shell_cmd. The PATH in use at the time saw /opt/csw/bin given precedence to traditional Solaris paths such as /usr/bin and /usr/xpg4/bin. A package named schilyutils (Joerg Schilling's utilities) was installed on the build system and it delivered a modified version of the traditional Solaris /usr/bin/sh as /opt/csw/bin/sh. This version of sh suffers from many of the same problems as /usr/bin/sh. The command-specific pager test failed due to the broken "sh" handling ^ as a pipe character. It tried to fork two processes when it encountered "sed s/^/foo:/" as the pager command. This problem was entirely dependent on the PATH of the user at runtime. Possible fixes for this issue are: 1. Use the standard system() or popen() which both launch a POSIX shell on Solaris as long as _POSIX_SOURCE is defined. 2. The git wrapper could prepend SANE_TOOL_PATH to PATH thus forcing all unqualified commands run to use the known good tools on the system. 3. The run_command.c:prepare_shell_command() could use the same SHELL_PATH that is in the #! line of all all scripts and not rely on PATH to find the sh to run. Option 1 would preclude opening a bidirectional pipe to a filter script and would also break git for Windows as cmd.exe is spawned from system() (cf. v1.7.5-rc0~144^2, "alias: use run_command api to execute aliases, 2011-01-07). Option 2 is not friendly to users as it would negate their ability to use tools of their choice in many cases. Alternately, injecting SANE_TOOL_PATH such that it takes precedence over /bin and /usr/bin (and anything with lower precedence than those paths) as git-sh-setup.sh does would not solve the problem either as the user environment could still allow a bad sh to be found. (Many OpenCSW users will have /opt/csw/bin leading their PATH and some subset would have schilyutils installed.) Option 3 allows us to use a known good shell while still honouring the users' PATH for the utilities being run. Thus, it solves the problem while not negatively impacting either users or git's ability to run external commands in convenient ways. Essentially, the shell is a special case of tool that should not rely on SANE_TOOL_PATH and must be called explicitly. With this patch applied, any code path leading to run_command.c:prepare_shell_cmd can count on using the same sane shell that all shell scripts in the git suite use. Both the build system and run_command.c will default this shell to /bin/sh unless overridden. Signed-off-by: Ben Walton <bwalton@artsci.utoronto.ca> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-03-31 03:33:21 +02:00
nargv[nargc++] = SHELL_PATH;
#else
nargv[nargc++] = "sh";
#endif
nargv[nargc++] = "-c";
if (argc < 2)
nargv[nargc++] = argv[0];
else {
struct strbuf arg0 = STRBUF_INIT;
strbuf_addf(&arg0, "%s \"$@\"", argv[0]);
nargv[nargc++] = strbuf_detach(&arg0, NULL);
}
}
for (argc = 0; argv[argc]; argc++)
nargv[nargc++] = argv[argc];
nargv[nargc] = NULL;
return nargv;
}
#ifndef GIT_WINDOWS_NATIVE
static int execv_shell_cmd(const char **argv)
{
const char **nargv = prepare_shell_cmd(argv);
trace_argv_printf(nargv, "trace: exec:");
sane_execvp(nargv[0], (char **)nargv);
free(nargv);
return -1;
}
#endif
#ifndef GIT_WINDOWS_NATIVE
static int child_notifier = -1;
static void notify_parent(void)
{
/*
* execvp failed. If possible, we'd like to let start_command
* know, so failures like ENOENT can be handled right away; but
* otherwise, finish_command will still report the error.
*/
xwrite(child_notifier, "", 1);
}
#endif
static inline void set_cloexec(int fd)
{
int flags = fcntl(fd, F_GETFD);
if (flags >= 0)
fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
{
int status, code = -1;
pid_t waiting;
int failed_errno = 0;
while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
; /* nothing */
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
if (in_signal)
return 0;
if (waiting < 0) {
failed_errno = errno;
error("waitpid for %s failed: %s", argv0, strerror(errno));
} else if (waiting != pid) {
error("waitpid is confused (%s)", argv0);
} else if (WIFSIGNALED(status)) {
code = WTERMSIG(status);
run-command: don't warn on SIGPIPE deaths When git executes a sub-command, we print a warning if the command dies due to a signal, but make an exception for "uninteresting" cases like SIGINT and SIGQUIT (since the user presumably just hit ^C). We should make a similar exception for SIGPIPE, because it's an expected and uninteresting return in most cases; it generally means the user quit the pager before git had finished generating all output. This used to be very hard to trigger in practice, because: 1. We only complain if we see a real SIGPIPE death, not the shell-induced 141 exit code. This means that anything we run via the shell does not trigger the warning, which includes most non-trivial aliases. 2. The common case for SIGPIPE is the user quitting the pager before git has finished generating all output. But if the user triggers a pager with "-p", we redirect the git wrapper's stderr to that pager, too. Since the pager is dead, it means that the message goes nowhere. 3. You can see it if you run your own pager, like "git foo | head". But that only happens if "foo" is a non-builtin (so it doesn't work with "log", for example). However, it may become more common after 86d26f2, which teaches alias to re-exec builtins rather than running them in the same process. This case doesn't trigger (1), as we don't need a shell to run a git command. It doesn't trigger (2), because the pager is not started by the original git, but by the inner re-exec of git. And it doesn't trigger (3), because builtins are treated more like non-builtins in this case. Given how flaky this message already is (e.g., you cannot even know whether you will see it, as git optimizes out some shell invocations behind the scenes based on the contents of the command!), and that it is unlikely to ever provide useful information, let's suppress it for all cases of SIGPIPE. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-29 09:12:22 +01:00
if (code != SIGINT && code != SIGQUIT && code != SIGPIPE)
error("%s died of signal %d", argv0, code);
/*
* This return value is chosen so that code & 0xff
* mimics the exit code that a POSIX shell would report for
* a program that died from this signal.
*/
run-command: encode signal death as a positive integer When a sub-command dies due to a signal, we encode the signal number into the numeric exit status as "signal - 128". This is easy to identify (versus a regular positive error code), and when cast to an unsigned integer (e.g., by feeding it to exit), matches what a POSIX shell would return when reporting a signal death in $? or through its own exit code. So we have a negative value inside the code, but once it passes across an exit() barrier, it looks positive (and any code we receive from a sub-shell will have the positive form). E.g., death by SIGPIPE (signal 13) will look like -115 to us in inside git, but will end up as 141 when we call exit() with it. And a program killed by SIGPIPE but run via the shell will come to us with an exit code of 141. Unfortunately, this means that when the "use_shell" option is set, we need to be on the lookout for _both_ forms. We might or might not have actually invoked the shell (because we optimize out some useless shell calls). If we didn't invoke the shell, we will will see the sub-process's signal death directly, and run-command converts it into a negative value. But if we did invoke the shell, we will see the shell's 128+signal exit status. To be thorough, we would need to check both, or cast the value to an unsigned char (after checking that it is not -1, which is a magic error value). Fortunately, most callsites do not care at all whether the exit was from a code or from a signal; they merely check for a non-zero status, and sometimes propagate the error via exit(). But for the callers that do care, we can make life slightly easier by just using the consistent positive form. This actually fixes two minor bugs: 1. In launch_editor, we check whether the editor died from SIGINT or SIGQUIT. But we checked only the negative form, meaning that we would fail to notice a signal death exit code which was propagated through the shell. 2. In handle_alias, we assume that a negative return value from run_command means that errno tells us something interesting (like a fork failure, or ENOENT). Otherwise, we simply propagate the exit code. Negative signal death codes confuse us, and we print a useless "unable to run alias 'foo': Success" message. By encoding signal deaths using the positive form, the existing code just propagates it as it would a normal non-zero exit code. The downside is that callers of run_command can no longer differentiate between a signal received directly by the sub-process, and one propagated. However, no caller currently cares, and since we already optimize out some calls to the shell under the hood, that distinction is not something that should be relied upon by callers. Fix the same logic in t/test-terminal.perl for consistency [jc: raised by Jonathan in the discussion]. Signed-off-by: Jeff King <peff@peff.net> Acked-by: Johannes Sixt <j6t@kdbg.org> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-01-05 15:49:49 +01:00
code += 128;
} else if (WIFEXITED(status)) {
code = WEXITSTATUS(status);
/*
* Convert special exit code when execvp failed.
*/
if (code == 127) {
code = -1;
failed_errno = ENOENT;
}
} else {
error("waitpid is confused (%s)", argv0);
}
clear_child_for_cleanup(pid);
errno = failed_errno;
return code;
}
int start_command(struct child_process *cmd)
{
int need_in, need_out, need_err;
int fdin[2], fdout[2], fderr[2];
int failed_errno;
char *str;
if (!cmd->argv)
cmd->argv = cmd->args.argv;
if (!cmd->env)
cmd->env = cmd->env_array.argv;
/*
* In case of errors we must keep the promise to close FDs
* that have been passed in via ->in and ->out.
*/
need_in = !cmd->no_stdin && cmd->in < 0;
if (need_in) {
if (pipe(fdin) < 0) {
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
failed_errno = errno;
if (cmd->out > 0)
close(cmd->out);
str = "standard input";
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
goto fail_pipe;
}
cmd->in = fdin[1];
}
need_out = !cmd->no_stdout
&& !cmd->stdout_to_stderr
&& cmd->out < 0;
if (need_out) {
if (pipe(fdout) < 0) {
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
failed_errno = errno;
if (need_in)
close_pair(fdin);
else if (cmd->in)
close(cmd->in);
str = "standard output";
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
goto fail_pipe;
}
cmd->out = fdout[0];
}
need_err = !cmd->no_stderr && cmd->err < 0;
if (need_err) {
if (pipe(fderr) < 0) {
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
failed_errno = errno;
if (need_in)
close_pair(fdin);
else if (cmd->in)
close(cmd->in);
if (need_out)
close_pair(fdout);
else if (cmd->out)
close(cmd->out);
str = "standard error";
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
fail_pipe:
error("cannot create %s pipe for %s: %s",
str, cmd->argv[0], strerror(failed_errno));
child_process_clear(cmd);
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
errno = failed_errno;
return -1;
}
cmd->err = fderr[0];
}
trace_argv_printf(cmd->argv, "trace: run_command:");
fflush(NULL);
#ifndef GIT_WINDOWS_NATIVE
{
int notify_pipe[2];
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
/*
* Redirect the channel to write syscall error messages to
* before redirecting the process's stderr so that all die()
* in subsequent call paths use the parent's stderr.
*/
if (cmd->no_stderr || need_err) {
int child_err = dup(2);
set_cloexec(child_err);
set_error_handle(fdopen(child_err, "w"));
}
close(notify_pipe[0]);
set_cloexec(notify_pipe[1]);
child_notifier = notify_pipe[1];
atexit(notify_parent);
if (cmd->no_stdin)
dup_devnull(0);
else if (need_in) {
dup2(fdin[0], 0);
close_pair(fdin);
} else if (cmd->in) {
dup2(cmd->in, 0);
close(cmd->in);
}
if (cmd->no_stderr)
dup_devnull(2);
else if (need_err) {
dup2(fderr[1], 2);
close_pair(fderr);
} else if (cmd->err > 1) {
dup2(cmd->err, 2);
close(cmd->err);
}
if (cmd->no_stdout)
dup_devnull(1);
else if (cmd->stdout_to_stderr)
dup2(2, 1);
else if (need_out) {
dup2(fdout[1], 1);
close_pair(fdout);
} else if (cmd->out > 1) {
dup2(cmd->out, 1);
close(cmd->out);
}
if (cmd->dir && chdir(cmd->dir))
die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
cmd->dir);
if (cmd->env) {
for (; *cmd->env; cmd->env++) {
if (strchr(*cmd->env, '='))
putenv((char *)*cmd->env);
else
unsetenv(*cmd->env);
}
}
if (cmd->git_cmd)
execv_git_cmd(cmd->argv);
else if (cmd->use_shell)
execv_shell_cmd(cmd->argv);
else
sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
notice error exit from pager If the pager fails to run, git produces no output, e.g.: $ GIT_PAGER=not-a-command git log The error reporting fails for two reasons: (1) start_command: There is a mechanism that detects errors during execvp introduced in 2b541bf8 (start_command: detect execvp failures early). The child writes one byte to a pipe only if execvp fails. The parent waits for either EOF, when the successful execvp automatically closes the pipe (see FD_CLOEXEC in fcntl(1)), or it reads a single byte, in which case it knows that the execvp failed. This mechanism is incompatible with the workaround introduced in 35ce8622 (pager: Work around window resizing bug in 'less'), which waits for input from the parent before the exec. Since both the parent and the child are waiting for input from each other, that would result in a deadlock. In order to avoid that, the mechanism is disabled by closing the child_notifier file descriptor. (2) finish_command: The parent correctly detects the 127 exit status from the child, but the error output goes nowhere, since by that time it is already being redirected to the child. No simple solution for (1) comes to mind. Number (2) can be solved by not sending error output to the pager. Not redirecting error output to the pager can result in the pager overwriting error output with standard output, however. Since there is no reliable way to handle error reporting in the parent, produce the output in the child instead. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01 19:59:21 +02:00
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
error("cannot run %s: %s", cmd->argv[0],
strerror(ENOENT));
exit(127);
notice error exit from pager If the pager fails to run, git produces no output, e.g.: $ GIT_PAGER=not-a-command git log The error reporting fails for two reasons: (1) start_command: There is a mechanism that detects errors during execvp introduced in 2b541bf8 (start_command: detect execvp failures early). The child writes one byte to a pipe only if execvp fails. The parent waits for either EOF, when the successful execvp automatically closes the pipe (see FD_CLOEXEC in fcntl(1)), or it reads a single byte, in which case it knows that the execvp failed. This mechanism is incompatible with the workaround introduced in 35ce8622 (pager: Work around window resizing bug in 'less'), which waits for input from the parent before the exec. Since both the parent and the child are waiting for input from each other, that would result in a deadlock. In order to avoid that, the mechanism is disabled by closing the child_notifier file descriptor. (2) finish_command: The parent correctly detects the 127 exit status from the child, but the error output goes nowhere, since by that time it is already being redirected to the child. No simple solution for (1) comes to mind. Number (2) can be solved by not sending error output to the pager. Not redirecting error output to the pager can result in the pager overwriting error output with standard output, however. Since there is no reliable way to handle error reporting in the parent, produce the output in the child instead. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01 19:59:21 +02:00
} else {
die_errno("cannot exec '%s'", cmd->argv[0]);
notice error exit from pager If the pager fails to run, git produces no output, e.g.: $ GIT_PAGER=not-a-command git log The error reporting fails for two reasons: (1) start_command: There is a mechanism that detects errors during execvp introduced in 2b541bf8 (start_command: detect execvp failures early). The child writes one byte to a pipe only if execvp fails. The parent waits for either EOF, when the successful execvp automatically closes the pipe (see FD_CLOEXEC in fcntl(1)), or it reads a single byte, in which case it knows that the execvp failed. This mechanism is incompatible with the workaround introduced in 35ce8622 (pager: Work around window resizing bug in 'less'), which waits for input from the parent before the exec. Since both the parent and the child are waiting for input from each other, that would result in a deadlock. In order to avoid that, the mechanism is disabled by closing the child_notifier file descriptor. (2) finish_command: The parent correctly detects the 127 exit status from the child, but the error output goes nowhere, since by that time it is already being redirected to the child. No simple solution for (1) comes to mind. Number (2) can be solved by not sending error output to the pager. Not redirecting error output to the pager can result in the pager overwriting error output with standard output, however. Since there is no reliable way to handle error reporting in the parent, produce the output in the child instead. Signed-off-by: Clemens Buchacher <drizzd@aon.at> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-01 19:59:21 +02:00
}
}
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
if (cmd->pid < 0)
error("cannot fork() for %s: %s", cmd->argv[0],
strerror(errno));
else if (cmd->clean_on_exit)
mark_child_for_cleanup(cmd->pid);
/*
* Wait for child's execvp. If the execvp succeeds (or if fork()
* failed), EOF is seen immediately by the parent. Otherwise, the
* child process sends a single byte.
* Note that use of this infrastructure is completely advisory,
* therefore, we keep error checks minimal.
*/
close(notify_pipe[1]);
if (read(notify_pipe[0], &notify_pipe[1], 1) == 1) {
/*
* At this point we know that fork() succeeded, but execvp()
* failed. Errors have been reported to our stderr.
*/
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
wait_or_whine(cmd->pid, cmd->argv[0], 0);
failed_errno = errno;
cmd->pid = -1;
}
close(notify_pipe[0]);
}
#else
{
Windows: avoid the "dup dance" when spawning a child process When stdin, stdout, or stderr must be redirected for a child process that on Windows is spawned using one of the spawn() functions of Microsoft's C runtime, then there is no choice other than to 1. make a backup copy of fd 0,1,2 with dup 2. dup2 the redirection source fd into 0,1,2 3. spawn 4. dup2 the backup back into 0,1,2 5. close the backup copy and the redirection source We used this idiom as well -- but we are not using the spawn() functions anymore! Instead, we have our own implementation. We had hardcoded that stdin, stdout, and stderr of the child process were inherited from the parent's fds 0, 1, and 2. But we can actually specify any fd. With this patch, the fds to inherit are passed from start_command()'s WIN32 section to our spawn implementation. This way, we can avoid the backup copies of the fds. The backup copies were a bug waiting to surface: The OS handles underlying the dup()ed fds were inherited by the child process (but were not associated with a file descriptor in the child). Consequently, the file or pipe represented by the OS handle remained open even after the backup copy was closed in the parent process until the child exited. Since our implementation of pipe() creates non-inheritable OS handles, we still dup() file descriptors in start_command() because dup() happens to create inheritable duplicates. (A nice side effect is that the fd cleanup in start_command is the same for Windows and Unix and remains unchanged.) Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-15 21:12:18 +01:00
int fhin = 0, fhout = 1, fherr = 2;
const char **sargv = cmd->argv;
Windows: avoid the "dup dance" when spawning a child process When stdin, stdout, or stderr must be redirected for a child process that on Windows is spawned using one of the spawn() functions of Microsoft's C runtime, then there is no choice other than to 1. make a backup copy of fd 0,1,2 with dup 2. dup2 the redirection source fd into 0,1,2 3. spawn 4. dup2 the backup back into 0,1,2 5. close the backup copy and the redirection source We used this idiom as well -- but we are not using the spawn() functions anymore! Instead, we have our own implementation. We had hardcoded that stdin, stdout, and stderr of the child process were inherited from the parent's fds 0, 1, and 2. But we can actually specify any fd. With this patch, the fds to inherit are passed from start_command()'s WIN32 section to our spawn implementation. This way, we can avoid the backup copies of the fds. The backup copies were a bug waiting to surface: The OS handles underlying the dup()ed fds were inherited by the child process (but were not associated with a file descriptor in the child). Consequently, the file or pipe represented by the OS handle remained open even after the backup copy was closed in the parent process until the child exited. Since our implementation of pipe() creates non-inheritable OS handles, we still dup() file descriptors in start_command() because dup() happens to create inheritable duplicates. (A nice side effect is that the fd cleanup in start_command is the same for Windows and Unix and remains unchanged.) Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-15 21:12:18 +01:00
if (cmd->no_stdin)
fhin = open("/dev/null", O_RDWR);
else if (need_in)
fhin = dup(fdin[0]);
else if (cmd->in)
fhin = dup(cmd->in);
if (cmd->no_stderr)
fherr = open("/dev/null", O_RDWR);
else if (need_err)
fherr = dup(fderr[1]);
else if (cmd->err > 2)
fherr = dup(cmd->err);
Windows: avoid the "dup dance" when spawning a child process When stdin, stdout, or stderr must be redirected for a child process that on Windows is spawned using one of the spawn() functions of Microsoft's C runtime, then there is no choice other than to 1. make a backup copy of fd 0,1,2 with dup 2. dup2 the redirection source fd into 0,1,2 3. spawn 4. dup2 the backup back into 0,1,2 5. close the backup copy and the redirection source We used this idiom as well -- but we are not using the spawn() functions anymore! Instead, we have our own implementation. We had hardcoded that stdin, stdout, and stderr of the child process were inherited from the parent's fds 0, 1, and 2. But we can actually specify any fd. With this patch, the fds to inherit are passed from start_command()'s WIN32 section to our spawn implementation. This way, we can avoid the backup copies of the fds. The backup copies were a bug waiting to surface: The OS handles underlying the dup()ed fds were inherited by the child process (but were not associated with a file descriptor in the child). Consequently, the file or pipe represented by the OS handle remained open even after the backup copy was closed in the parent process until the child exited. Since our implementation of pipe() creates non-inheritable OS handles, we still dup() file descriptors in start_command() because dup() happens to create inheritable duplicates. (A nice side effect is that the fd cleanup in start_command is the same for Windows and Unix and remains unchanged.) Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-15 21:12:18 +01:00
if (cmd->no_stdout)
fhout = open("/dev/null", O_RDWR);
else if (cmd->stdout_to_stderr)
fhout = dup(fherr);
else if (need_out)
fhout = dup(fdout[1]);
else if (cmd->out > 1)
fhout = dup(cmd->out);
if (cmd->git_cmd)
cmd->argv = prepare_git_cmd(cmd->argv);
else if (cmd->use_shell)
cmd->argv = prepare_shell_cmd(cmd->argv);
cmd->pid = mingw_spawnvpe(cmd->argv[0], cmd->argv, (char**) cmd->env,
cmd->dir, fhin, fhout, fherr);
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
failed_errno = errno;
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
error("cannot spawn %s: %s", cmd->argv[0], strerror(errno));
if (cmd->clean_on_exit && cmd->pid >= 0)
mark_child_for_cleanup(cmd->pid);
if (cmd->git_cmd)
free(cmd->argv);
cmd->argv = sargv;
Windows: avoid the "dup dance" when spawning a child process When stdin, stdout, or stderr must be redirected for a child process that on Windows is spawned using one of the spawn() functions of Microsoft's C runtime, then there is no choice other than to 1. make a backup copy of fd 0,1,2 with dup 2. dup2 the redirection source fd into 0,1,2 3. spawn 4. dup2 the backup back into 0,1,2 5. close the backup copy and the redirection source We used this idiom as well -- but we are not using the spawn() functions anymore! Instead, we have our own implementation. We had hardcoded that stdin, stdout, and stderr of the child process were inherited from the parent's fds 0, 1, and 2. But we can actually specify any fd. With this patch, the fds to inherit are passed from start_command()'s WIN32 section to our spawn implementation. This way, we can avoid the backup copies of the fds. The backup copies were a bug waiting to surface: The OS handles underlying the dup()ed fds were inherited by the child process (but were not associated with a file descriptor in the child). Consequently, the file or pipe represented by the OS handle remained open even after the backup copy was closed in the parent process until the child exited. Since our implementation of pipe() creates non-inheritable OS handles, we still dup() file descriptors in start_command() because dup() happens to create inheritable duplicates. (A nice side effect is that the fd cleanup in start_command is the same for Windows and Unix and remains unchanged.) Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-01-15 21:12:18 +01:00
if (fhin != 0)
close(fhin);
if (fhout != 1)
close(fhout);
if (fherr != 2)
close(fherr);
}
#endif
if (cmd->pid < 0) {
if (need_in)
close_pair(fdin);
else if (cmd->in)
close(cmd->in);
if (need_out)
close_pair(fdout);
else if (cmd->out)
close(cmd->out);
if (need_err)
close_pair(fderr);
else if (cmd->err)
close(cmd->err);
child_process_clear(cmd);
run_command: report system call errors instead of returning error codes The motivation for this change is that system call failures are serious errors that should be reported to the user, but only few callers took the burden to decode the error codes that the functions returned into error messages. If at all, then only an unspecific error message was given. A prominent example is this: $ git upload-pack . | : fatal: unable to run 'git-upload-pack' In this example, git-upload-pack, the external command invoked through the git wrapper, dies due to SIGPIPE, but the git wrapper does not bother to report the real cause. In fact, this very error message is copied to the syslog if git-daemon's client aborts the connection early. With this change, system call failures are reported immediately after the failure and only a generic failure code is returned to the caller. In the above example the error is now to the point: $ git upload-pack . | : error: git-upload-pack died of signal Note that there is no error report if the invoked program terminated with a non-zero exit code, because it is reasonable to expect that the invoked program has already reported an error. (But many run_command call sites nevertheless write a generic error message.) There was one special return code that was used to identify the case where run_command failed because the requested program could not be exec'd. This special case is now treated like a system call failure with errno set to ENOENT. No error is reported in this case, because the call site in git.c expects this as a normal result. Therefore, the callers that carefully decoded the return value still check for this condition. Signed-off-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-07-04 21:26:40 +02:00
errno = failed_errno;
return -1;
}
if (need_in)
close(fdin[0]);
else if (cmd->in)
close(cmd->in);
if (need_out)
close(fdout[1]);
else if (cmd->out)
close(cmd->out);
if (need_err)
close(fderr[1]);
else if (cmd->err)
close(cmd->err);
return 0;
}
int finish_command(struct child_process *cmd)
{
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
int ret = wait_or_whine(cmd->pid, cmd->argv[0], 0);
child_process_clear(cmd);
return ret;
}
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
int finish_command_in_signal(struct child_process *cmd)
{
return wait_or_whine(cmd->pid, cmd->argv[0], 1);
}
int run_command(struct child_process *cmd)
{
int code;
if (cmd->out < 0 || cmd->err < 0)
die("BUG: run_command with a pipe can cause deadlock");
code = start_command(cmd);
if (code)
return code;
return finish_command(cmd);
}
int run_command_v_opt(const char **argv, int opt)
{
return run_command_v_opt_cd_env(argv, opt, NULL, NULL);
}
int run_command_v_opt_cd_env(const char **argv, int opt, const char *dir, const char *const *env)
{
struct child_process cmd = CHILD_PROCESS_INIT;
cmd.argv = argv;
cmd.no_stdin = opt & RUN_COMMAND_NO_STDIN ? 1 : 0;
cmd.git_cmd = opt & RUN_GIT_CMD ? 1 : 0;
cmd.stdout_to_stderr = opt & RUN_COMMAND_STDOUT_TO_STDERR ? 1 : 0;
cmd.silent_exec_failure = opt & RUN_SILENT_EXEC_FAILURE ? 1 : 0;
cmd.use_shell = opt & RUN_USING_SHELL ? 1 : 0;
cmd.clean_on_exit = opt & RUN_CLEAN_ON_EXIT ? 1 : 0;
cmd.dir = dir;
cmd.env = env;
return run_command(&cmd);
}
#ifndef NO_PTHREADS
static pthread_t main_thread;
static int main_thread_set;
static pthread_key_t async_key;
run-command: use thread-aware die_is_recursing routine If we die from an async thread, we do not actually exit the program, but just kill the thread. This confuses the static counter in usage.c's default die_is_recursing function; it updates the counter once for the thread death, and then when the main program calls die() itself, it erroneously thinks we are recursing. The end result is that we print "recursion detected in die handler" instead of the real error in such a case (the easiest way to trigger this is having a remote connection hang up while running a sideband demultiplexer). This patch solves it by using a per-thread counter when the async_die function is installed; we detect recursion in each thread (including the main one), but they do not step on each other's toes. Other threaded code does not need to worry about this, as they do not install specialized die handlers; they just let a die() from a sub-thread take down the whole program. Since we are overriding the default recursion-check function, there is an interesting corner case that is not a problem, but bears some explanation. Imagine the main thread calls die(), and then in the die_routine starts an async call. We will switch to using thread-local storage, which starts at 0, for the main thread's counter, even though the original counter was actually at 1. That's OK, though, for two reasons: 1. It would miss only the first level of recursion, and would still find recursive failures inside the async helper. 2. We do not currently and are not likely to start doing anything as heavyweight as starting an async routine from within a die routine or helper function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 21:50:07 +02:00
static pthread_key_t async_die_counter;
static void *run_thread(void *data)
{
struct async *async = data;
intptr_t ret;
pthread_setspecific(async_key, async);
ret = async->proc(async->proc_in, async->proc_out, async->data);
return (void *)ret;
}
static NORETURN void die_async(const char *err, va_list params)
{
vreportf("fatal: ", err, params);
if (in_async()) {
struct async *async = pthread_getspecific(async_key);
if (async->proc_in >= 0)
close(async->proc_in);
if (async->proc_out >= 0)
close(async->proc_out);
pthread_exit((void *)128);
}
exit(128);
}
run-command: use thread-aware die_is_recursing routine If we die from an async thread, we do not actually exit the program, but just kill the thread. This confuses the static counter in usage.c's default die_is_recursing function; it updates the counter once for the thread death, and then when the main program calls die() itself, it erroneously thinks we are recursing. The end result is that we print "recursion detected in die handler" instead of the real error in such a case (the easiest way to trigger this is having a remote connection hang up while running a sideband demultiplexer). This patch solves it by using a per-thread counter when the async_die function is installed; we detect recursion in each thread (including the main one), but they do not step on each other's toes. Other threaded code does not need to worry about this, as they do not install specialized die handlers; they just let a die() from a sub-thread take down the whole program. Since we are overriding the default recursion-check function, there is an interesting corner case that is not a problem, but bears some explanation. Imagine the main thread calls die(), and then in the die_routine starts an async call. We will switch to using thread-local storage, which starts at 0, for the main thread's counter, even though the original counter was actually at 1. That's OK, though, for two reasons: 1. It would miss only the first level of recursion, and would still find recursive failures inside the async helper. 2. We do not currently and are not likely to start doing anything as heavyweight as starting an async routine from within a die routine or helper function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 21:50:07 +02:00
static int async_die_is_recursing(void)
{
void *ret = pthread_getspecific(async_die_counter);
pthread_setspecific(async_die_counter, (void *)1);
return ret != NULL;
}
int in_async(void)
{
if (!main_thread_set)
return 0; /* no asyncs started yet */
return !pthread_equal(main_thread, pthread_self());
}
#else
static struct {
void (**handlers)(void);
size_t nr;
size_t alloc;
} git_atexit_hdlrs;
static int git_atexit_installed;
static void git_atexit_dispatch(void)
{
size_t i;
for (i=git_atexit_hdlrs.nr ; i ; i--)
git_atexit_hdlrs.handlers[i-1]();
}
static void git_atexit_clear(void)
{
free(git_atexit_hdlrs.handlers);
memset(&git_atexit_hdlrs, 0, sizeof(git_atexit_hdlrs));
git_atexit_installed = 0;
}
#undef atexit
int git_atexit(void (*handler)(void))
{
ALLOC_GROW(git_atexit_hdlrs.handlers, git_atexit_hdlrs.nr + 1, git_atexit_hdlrs.alloc);
git_atexit_hdlrs.handlers[git_atexit_hdlrs.nr++] = handler;
if (!git_atexit_installed) {
if (atexit(&git_atexit_dispatch))
return -1;
git_atexit_installed = 1;
}
return 0;
}
#define atexit git_atexit
static int process_is_async;
int in_async(void)
{
return process_is_async;
}
#endif
int start_async(struct async *async)
{
int need_in, need_out;
int fdin[2], fdout[2];
int proc_in, proc_out;
need_in = async->in < 0;
if (need_in) {
if (pipe(fdin) < 0) {
if (async->out > 0)
close(async->out);
return error("cannot create pipe: %s", strerror(errno));
}
async->in = fdin[1];
}
need_out = async->out < 0;
if (need_out) {
if (pipe(fdout) < 0) {
if (need_in)
close_pair(fdin);
else if (async->in)
close(async->in);
return error("cannot create pipe: %s", strerror(errno));
}
async->out = fdout[0];
}
if (need_in)
proc_in = fdin[0];
else if (async->in)
proc_in = async->in;
else
proc_in = -1;
if (need_out)
proc_out = fdout[1];
else if (async->out)
proc_out = async->out;
else
proc_out = -1;
#ifdef NO_PTHREADS
/* Flush stdio before fork() to avoid cloning buffers */
fflush(NULL);
async->pid = fork();
if (async->pid < 0) {
error("fork (async) failed: %s", strerror(errno));
goto error;
}
if (!async->pid) {
if (need_in)
close(fdin[1]);
if (need_out)
close(fdout[0]);
git_atexit_clear();
process_is_async = 1;
exit(!!async->proc(proc_in, proc_out, async->data));
}
mark_child_for_cleanup(async->pid);
if (need_in)
close(fdin[0]);
else if (async->in)
close(async->in);
if (need_out)
close(fdout[1]);
else if (async->out)
close(async->out);
#else
if (!main_thread_set) {
/*
* We assume that the first time that start_async is called
* it is from the main thread.
*/
main_thread_set = 1;
main_thread = pthread_self();
pthread_key_create(&async_key, NULL);
run-command: use thread-aware die_is_recursing routine If we die from an async thread, we do not actually exit the program, but just kill the thread. This confuses the static counter in usage.c's default die_is_recursing function; it updates the counter once for the thread death, and then when the main program calls die() itself, it erroneously thinks we are recursing. The end result is that we print "recursion detected in die handler" instead of the real error in such a case (the easiest way to trigger this is having a remote connection hang up while running a sideband demultiplexer). This patch solves it by using a per-thread counter when the async_die function is installed; we detect recursion in each thread (including the main one), but they do not step on each other's toes. Other threaded code does not need to worry about this, as they do not install specialized die handlers; they just let a die() from a sub-thread take down the whole program. Since we are overriding the default recursion-check function, there is an interesting corner case that is not a problem, but bears some explanation. Imagine the main thread calls die(), and then in the die_routine starts an async call. We will switch to using thread-local storage, which starts at 0, for the main thread's counter, even though the original counter was actually at 1. That's OK, though, for two reasons: 1. It would miss only the first level of recursion, and would still find recursive failures inside the async helper. 2. We do not currently and are not likely to start doing anything as heavyweight as starting an async routine from within a die routine or helper function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 21:50:07 +02:00
pthread_key_create(&async_die_counter, NULL);
set_die_routine(die_async);
run-command: use thread-aware die_is_recursing routine If we die from an async thread, we do not actually exit the program, but just kill the thread. This confuses the static counter in usage.c's default die_is_recursing function; it updates the counter once for the thread death, and then when the main program calls die() itself, it erroneously thinks we are recursing. The end result is that we print "recursion detected in die handler" instead of the real error in such a case (the easiest way to trigger this is having a remote connection hang up while running a sideband demultiplexer). This patch solves it by using a per-thread counter when the async_die function is installed; we detect recursion in each thread (including the main one), but they do not step on each other's toes. Other threaded code does not need to worry about this, as they do not install specialized die handlers; they just let a die() from a sub-thread take down the whole program. Since we are overriding the default recursion-check function, there is an interesting corner case that is not a problem, but bears some explanation. Imagine the main thread calls die(), and then in the die_routine starts an async call. We will switch to using thread-local storage, which starts at 0, for the main thread's counter, even though the original counter was actually at 1. That's OK, though, for two reasons: 1. It would miss only the first level of recursion, and would still find recursive failures inside the async helper. 2. We do not currently and are not likely to start doing anything as heavyweight as starting an async routine from within a die routine or helper function. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-16 21:50:07 +02:00
set_die_is_recursing_routine(async_die_is_recursing);
}
if (proc_in >= 0)
set_cloexec(proc_in);
if (proc_out >= 0)
set_cloexec(proc_out);
async->proc_in = proc_in;
async->proc_out = proc_out;
{
int err = pthread_create(&async->tid, NULL, run_thread, async);
if (err) {
error("cannot create thread: %s", strerror(err));
goto error;
}
}
#endif
return 0;
error:
if (need_in)
close_pair(fdin);
else if (async->in)
close(async->in);
if (need_out)
close_pair(fdout);
else if (async->out)
close(async->out);
return -1;
}
int finish_async(struct async *async)
{
#ifdef NO_PTHREADS
pager: don't use unsafe functions in signal handlers Since the commit a3da8821208d (pager: do wait_for_pager on signal death), we call wait_for_pager() in the pager's signal handler. The recent bug report revealed that this causes a deadlock in glibc at aborting "git log" [*1*]. When this happens, git process is left unterminated, and it can't be killed by SIGTERM but only by SIGKILL. The problem is that wait_for_pager() function does more than waiting for pager process's termination, but it does cleanups and printing errors. Unfortunately, the functions that may be used in a signal handler are very limited [*2*]. Particularly, malloc(), free() and the variants can't be used in a signal handler because they take a mutex internally in glibc. This was the cause of the deadlock above. Other than the direct calls of malloc/free, many functions calling malloc/free can't be used. strerror() is such one, either. Also the usage of fflush() and printf() in a signal handler is bad, although it seems working so far. In a safer side, we should avoid them, too. This patch tries to reduce the calls of such functions in signal handlers. wait_for_signal() takes a flag and avoids the unsafe calls. Also, finish_command_in_signal() is introduced for the same reason. There the free() calls are removed, and only waits for the children without whining at errors. [*1*] https://bugzilla.opensuse.org/show_bug.cgi?id=942297 [*2*] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03 Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 11:35:57 +02:00
return wait_or_whine(async->pid, "child process", 0);
#else
void *ret = (void *)(intptr_t)(-1);
if (pthread_join(async->tid, &ret))
error("pthread_join failed");
return (int)(intptr_t)ret;
#endif
}
const char *find_hook(const char *name)
{
static struct strbuf path = STRBUF_INIT;
strbuf_reset(&path);
strbuf_git_path(&path, "hooks/%s", name);
if (access(path.buf, X_OK) < 0)
return NULL;
return path.buf;
}
int run_hook_ve(const char *const *env, const char *name, va_list args)
{
struct child_process hook = CHILD_PROCESS_INIT;
const char *p;
p = find_hook(name);
if (!p)
return 0;
argv_array_push(&hook.args, p);
while ((p = va_arg(args, const char *)))
argv_array_push(&hook.args, p);
hook.env = env;
hook.no_stdin = 1;
hook.stdout_to_stderr = 1;
return run_command(&hook);
}
int run_hook_le(const char *const *env, const char *name, ...)
{
va_list args;
int ret;
va_start(args, name);
ret = run_hook_ve(env, name, args);
va_end(args);
return ret;
}
int capture_command(struct child_process *cmd, struct strbuf *buf, size_t hint)
{
cmd->out = -1;
if (start_command(cmd) < 0)
return -1;
if (strbuf_read(buf, cmd->out, hint) < 0) {
close(cmd->out);
finish_command(cmd); /* throw away exit code */
return -1;
}
close(cmd->out);
return finish_command(cmd);
}
run-command: add an asynchronous parallel child processor This allows to run external commands in parallel with ordered output on stderr. If we run external commands in parallel we cannot pipe the output directly to the our stdout/err as it would mix up. So each process's output will flow through a pipe, which we buffer. One subprocess can be directly piped to out stdout/err for a low latency feedback to the user. Example: Let's assume we have 5 submodules A,B,C,D,E and each fetch takes a different amount of time as the different submodules vary in size, then the output of fetches in sequential order might look like this: time --> output: |---A---| |-B-| |-------C-------| |-D-| |-E-| When we schedule these submodules into maximal two parallel processes, a schedule and sample output over time may look like this: process 1: |---A---| |-D-| |-E-| process 2: |-B-| |-------C-------| output: |---A---|B|---C-------|DE So A will be perceived as it would run normally in the single child version. As B has finished by the time A is done, we can dump its whole progress buffer on stderr, such that it looks like it finished in no time. Once that is done, C is determined to be the visible child and its progress will be reported in real time. So this way of output is really good for human consumption, as it only changes the timing, not the actual output. For machine consumption the output needs to be prepared in the tasks, by either having a prefix per line or per block to indicate whose tasks output is displayed, because the output order may not follow the original sequential ordering: |----A----| |--B--| |-C-| will be scheduled to be all parallel: process 1: |----A----| process 2: |--B--| process 3: |-C-| output: |----A----|CB This happens because C finished before B did, so it will be queued for output before B. To detect when a child has finished executing, we check interleaved with other actions (such as checking the liveliness of children or starting new processes) whether the stderr pipe still exists. Once a child closed its stderr stream, we assume it is terminating very soon, and use `finish_command()` from the single external process execution interface to collect the exit status. By maintaining the strong assumption of stderr being open until the very end of a child process, we can avoid other hassle such as an implementation using `waitpid(-1)`, which is not implemented in Windows. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-16 01:04:10 +01:00
enum child_state {
GIT_CP_FREE,
GIT_CP_WORKING,
GIT_CP_WAIT_CLEANUP,
};
struct parallel_processes {
void *data;
int max_processes;
int nr_processes;
get_next_task_fn get_next_task;
start_failure_fn start_failure;
task_finished_fn task_finished;
struct {
enum child_state state;
struct child_process process;
struct strbuf err;
void *data;
} *children;
/*
* The struct pollfd is logically part of *children,
* but the system call expects it as its own array.
*/
struct pollfd *pfd;
unsigned shutdown : 1;
int output_owner;
struct strbuf buffered_output; /* of finished children */
};
static int default_start_failure(struct child_process *cp,
struct strbuf *err,
void *pp_cb,
void *pp_task_cb)
{
int i;
strbuf_addstr(err, "Starting a child failed:");
for (i = 0; cp->argv[i]; i++)
strbuf_addf(err, " %s", cp->argv[i]);
return 0;
}
static int default_task_finished(int result,
struct child_process *cp,
struct strbuf *err,
void *pp_cb,
void *pp_task_cb)
{
int i;
if (!result)
return 0;
strbuf_addf(err, "A child failed with return code %d:", result);
for (i = 0; cp->argv[i]; i++)
strbuf_addf(err, " %s", cp->argv[i]);
return 0;
}
static void kill_children(struct parallel_processes *pp, int signo)
{
int i, n = pp->max_processes;
for (i = 0; i < n; i++)
if (pp->children[i].state == GIT_CP_WORKING)
kill(pp->children[i].process.pid, signo);
}
static struct parallel_processes *pp_for_signal;
static void handle_children_on_signal(int signo)
{
kill_children(pp_for_signal, signo);
sigchain_pop(signo);
raise(signo);
}
static void pp_init(struct parallel_processes *pp,
int n,
get_next_task_fn get_next_task,
start_failure_fn start_failure,
task_finished_fn task_finished,
void *data)
{
int i;
if (n < 1)
n = online_cpus();
pp->max_processes = n;
trace_printf("run_processes_parallel: preparing to run up to %d tasks", n);
pp->data = data;
if (!get_next_task)
die("BUG: you need to specify a get_next_task function");
pp->get_next_task = get_next_task;
pp->start_failure = start_failure ? start_failure : default_start_failure;
pp->task_finished = task_finished ? task_finished : default_task_finished;
pp->nr_processes = 0;
pp->output_owner = 0;
pp->shutdown = 0;
pp->children = xcalloc(n, sizeof(*pp->children));
pp->pfd = xcalloc(n, sizeof(*pp->pfd));
strbuf_init(&pp->buffered_output, 0);
for (i = 0; i < n; i++) {
strbuf_init(&pp->children[i].err, 0);
child_process_init(&pp->children[i].process);
pp->pfd[i].events = POLLIN | POLLHUP;
pp->pfd[i].fd = -1;
}
pp_for_signal = pp;
sigchain_push_common(handle_children_on_signal);
}
static void pp_cleanup(struct parallel_processes *pp)
{
int i;
trace_printf("run_processes_parallel: done");
for (i = 0; i < pp->max_processes; i++) {
strbuf_release(&pp->children[i].err);
child_process_clear(&pp->children[i].process);
}
free(pp->children);
free(pp->pfd);
/*
* When get_next_task added messages to the buffer in its last
* iteration, the buffered output is non empty.
*/
fputs(pp->buffered_output.buf, stderr);
strbuf_release(&pp->buffered_output);
sigchain_pop_common();
}
/* returns
* 0 if a new task was started.
* 1 if no new jobs was started (get_next_task ran out of work, non critical
* problem with starting a new command)
* <0 no new job was started, user wishes to shutdown early. Use negative code
* to signal the children.
*/
static int pp_start_one(struct parallel_processes *pp)
{
int i, code;
for (i = 0; i < pp->max_processes; i++)
if (pp->children[i].state == GIT_CP_FREE)
break;
if (i == pp->max_processes)
die("BUG: bookkeeping is hard");
code = pp->get_next_task(&pp->children[i].process,
&pp->children[i].err,
pp->data,
&pp->children[i].data);
if (!code) {
strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
strbuf_reset(&pp->children[i].err);
return 1;
}
pp->children[i].process.err = -1;
pp->children[i].process.stdout_to_stderr = 1;
pp->children[i].process.no_stdin = 1;
if (start_command(&pp->children[i].process)) {
code = pp->start_failure(&pp->children[i].process,
&pp->children[i].err,
pp->data,
&pp->children[i].data);
strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
strbuf_reset(&pp->children[i].err);
if (code)
pp->shutdown = 1;
return code;
}
pp->nr_processes++;
pp->children[i].state = GIT_CP_WORKING;
pp->pfd[i].fd = pp->children[i].process.err;
return 0;
}
static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout)
{
int i;
while ((i = poll(pp->pfd, pp->max_processes, output_timeout)) < 0) {
if (errno == EINTR)
continue;
pp_cleanup(pp);
die_errno("poll");
}
/* Buffer output from all pipes. */
for (i = 0; i < pp->max_processes; i++) {
if (pp->children[i].state == GIT_CP_WORKING &&
pp->pfd[i].revents & (POLLIN | POLLHUP)) {
int n = strbuf_read_once(&pp->children[i].err,
pp->children[i].process.err, 0);
if (n == 0) {
close(pp->children[i].process.err);
pp->children[i].state = GIT_CP_WAIT_CLEANUP;
} else if (n < 0)
if (errno != EAGAIN)
die_errno("read");
}
}
}
static void pp_output(struct parallel_processes *pp)
{
int i = pp->output_owner;
if (pp->children[i].state == GIT_CP_WORKING &&
pp->children[i].err.len) {
fputs(pp->children[i].err.buf, stderr);
strbuf_reset(&pp->children[i].err);
}
}
static int pp_collect_finished(struct parallel_processes *pp)
{
int i, code;
int n = pp->max_processes;
int result = 0;
while (pp->nr_processes > 0) {
for (i = 0; i < pp->max_processes; i++)
if (pp->children[i].state == GIT_CP_WAIT_CLEANUP)
break;
if (i == pp->max_processes)
break;
code = finish_command(&pp->children[i].process);
code = pp->task_finished(code, &pp->children[i].process,
&pp->children[i].err, pp->data,
&pp->children[i].data);
if (code)
result = code;
if (code < 0)
break;
pp->nr_processes--;
pp->children[i].state = GIT_CP_FREE;
pp->pfd[i].fd = -1;
child_process_init(&pp->children[i].process);
if (i != pp->output_owner) {
strbuf_addbuf(&pp->buffered_output, &pp->children[i].err);
strbuf_reset(&pp->children[i].err);
} else {
fputs(pp->children[i].err.buf, stderr);
strbuf_reset(&pp->children[i].err);
/* Output all other finished child processes */
fputs(pp->buffered_output.buf, stderr);
strbuf_reset(&pp->buffered_output);
/*
* Pick next process to output live.
* NEEDSWORK:
* For now we pick it randomly by doing a round
* robin. Later we may want to pick the one with
* the most output or the longest or shortest
* running process time.
*/
for (i = 0; i < n; i++)
if (pp->children[(pp->output_owner + i) % n].state == GIT_CP_WORKING)
break;
pp->output_owner = (pp->output_owner + i) % n;
}
}
return result;
}
int run_processes_parallel(int n,
get_next_task_fn get_next_task,
start_failure_fn start_failure,
task_finished_fn task_finished,
void *pp_cb)
{
int i, code;
int output_timeout = 100;
int spawn_cap = 4;
struct parallel_processes pp;
pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb);
while (1) {
for (i = 0;
i < spawn_cap && !pp.shutdown &&
pp.nr_processes < pp.max_processes;
i++) {
code = pp_start_one(&pp);
if (!code)
continue;
if (code < 0) {
pp.shutdown = 1;
kill_children(&pp, -code);
}
break;
}
if (!pp.nr_processes)
break;
pp_buffer_stderr(&pp, output_timeout);
pp_output(&pp);
code = pp_collect_finished(&pp);
if (code) {
pp.shutdown = 1;
if (code < 0)
kill_children(&pp, -code);
}
}
pp_cleanup(&pp);
return 0;
}