git-commit-vandalism/csum-file.h
Derrick Stolee 2ca245f8be csum-file.h: increase hashfile buffer size
The hashfile API uses a hard-coded buffer size of 8KB and has ever since
it was introduced in c38138c (git-pack-objects: write the pack files
with a SHA1 csum, 2005-06-26). It performs a similar function to the
hashing buffers in read-cache.c, but that code was updated from 8KB to
128KB in f279894 (read-cache: make the index write buffer size 128K,
2021-02-18). The justification there was that do_write_index() improves
from 1.02s to 0.72s. Since our end goal is to have the index writing
code use the hashfile API, we need to unify this buffer size to avoid a
performance regression.

There is a buffer, 'check_buffer', that is used to verify the check_fd
file descriptor. When this buffer increases to 128K to fit the data
being flushed, it causes the stack to overflow the limits placed in the
test suite. To avoid issues with stack size, move both 'buffer' and
'check_buffer' to be heap pointers within 'struct hashfile'. The
'check_buffer' member is left as NULL unless check_fd is set in
hashfd_check(). Both buffers are freed as part of finalize_hashfile(),
which also frees the structure itself.

Since these buffers are now on the heap, we can adjust their size based
on the needs of the consumer. In particular, callers to
hashfd_throughput() are expecting to report progress indicators as the
buffer flushes. These callers would prefer the smaller 8KB buffer to
avoid large delays between updates, especially for users with slower
networks. When the progress indicator is not used, the larger buffer is
preferable.
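
The sizing rule described above can be sketched as follows. This is a
minimal illustration, not the code in csum-file.c: the type
'sketch_hashfile', the helpers 'sketch_new()' and 'sketch_free()', and
the two size macros are hypothetical names chosen for this example. Only
the 8KB and 128KB figures and the "check_buffer stays NULL unless
checking is requested" rule come from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sizes quoted in the commit message; the macro names are hypothetical. */
#define SKETCH_SMALL_BUFFER (8 * 1024)
#define SKETCH_LARGE_BUFFER (128 * 1024)

/* Hypothetical, trimmed-down stand-in for 'struct hashfile'. */
struct sketch_hashfile {
	int has_progress;
	size_t buffer_len;
	unsigned char *buffer;
	unsigned char *check_buffer; /* NULL unless checking is enabled */
};

/*
 * Allocate the structure and its buffers on the heap. Callers that
 * report progress get the small buffer so updates arrive often;
 * everyone else gets the large one.
 */
static struct sketch_hashfile *sketch_new(int has_progress, int check)
{
	struct sketch_hashfile *f = calloc(1, sizeof(*f));

	if (!f)
		return NULL;
	f->has_progress = has_progress;
	f->buffer_len = has_progress ? SKETCH_SMALL_BUFFER : SKETCH_LARGE_BUFFER;
	f->buffer = malloc(f->buffer_len);
	if (!f->buffer) {
		free(f);
		return NULL;
	}
	if (check) {
		f->check_buffer = malloc(f->buffer_len);
		if (!f->check_buffer) {
			free(f->buffer);
			free(f);
			return NULL;
		}
	}
	return f;
}

/* Release both buffers and then the structure itself. */
static void sketch_free(struct sketch_hashfile *f)
{
	if (!f)
		return;
	free(f->check_buffer); /* free(NULL) is a no-op */
	free(f->buffer);
	free(f);
}
```

Keeping the length in the structure next to the pointers lets the flush
logic compare against 'buffer_len' instead of a compile-time constant,
which is what makes per-caller sizing possible.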

By adding a new trace2 region in the chunk-format API, we can see that
the writing portion of 'git multi-pack-index write' lowers from ~1.49s
to ~1.47s on a Linux machine. These effects may be more pronounced or
diminished on other filesystems. The end-to-end timing is too noisy to
show a definitive change either way.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-19 16:41:21 +09:00


#ifndef CSUM_FILE_H
#define CSUM_FILE_H
#include "hash.h"
struct progress;
/* A SHA1-protected file */
struct hashfile {
	int fd;
	int check_fd;
	unsigned int offset;
	git_hash_ctx ctx;
	off_t total;
	struct progress *tp;
	const char *name;
	int do_crc;
	uint32_t crc32;
	size_t buffer_len;
	unsigned char *buffer;
	unsigned char *check_buffer;
};
/* Checkpoint */
struct hashfile_checkpoint {
	off_t offset;
	git_hash_ctx ctx;
};
void hashfile_checkpoint(struct hashfile *, struct hashfile_checkpoint *);
int hashfile_truncate(struct hashfile *, struct hashfile_checkpoint *);
/* finalize_hashfile flags */
#define CSUM_CLOSE 1
#define CSUM_FSYNC 2
#define CSUM_HASH_IN_STREAM 4
struct hashfile *hashfd(int fd, const char *name);
struct hashfile *hashfd_check(const char *name);
struct hashfile *hashfd_throughput(int fd, const char *name, struct progress *tp);
int finalize_hashfile(struct hashfile *, unsigned char *, unsigned int);
void hashwrite(struct hashfile *, const void *, unsigned int);
void hashflush(struct hashfile *f);
void crc32_begin(struct hashfile *);
uint32_t crc32_end(struct hashfile *);
/*
 * Returns the total number of bytes fed to the hashfile so far (including ones
 * that have not been written out to the descriptor yet).
 */
static inline off_t hashfile_total(struct hashfile *f)
{
	return f->total + f->offset;
}

static inline void hashwrite_u8(struct hashfile *f, uint8_t data)
{
	hashwrite(f, &data, sizeof(data));
}

static inline void hashwrite_be32(struct hashfile *f, uint32_t data)
{
	data = htonl(data);
	hashwrite(f, &data, sizeof(data));
}

static inline size_t hashwrite_be64(struct hashfile *f, uint64_t data)
{
	data = htonll(data);
	hashwrite(f, &data, sizeof(data));
	return sizeof(data);
}
#endif
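
The hashwrite_be32() helper above converts its argument to network byte
order before handing the raw bytes to hashwrite(). The standalone sketch
below shows the resulting byte layout without needing a hashfile. The
function 'sketch_put_be32()' is a hypothetical name for this example and
writes into a caller-supplied buffer instead; htonl() is assumed to come
from the POSIX header <arpa/inet.h>.

```c
#include <arpa/inet.h> /* htonl(); POSIX, assumed available */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for hashwrite_be32(): rather than feeding a
 * hashfile, copy the converted value into 'out', which must have room
 * for four bytes. Returns the number of bytes written.
 */
static size_t sketch_put_be32(unsigned char *out, uint32_t data)
{
	data = htonl(data); /* host order -> big-endian */
	memcpy(out, &data, sizeof(data));
	return sizeof(data);
}
```

For the input 0x01020304 the four output bytes are 0x01, 0x02, 0x03 and
0x04 in that order, regardless of the host's endianness.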