git-commit-vandalism/contrib/update-unicode
Beat Bolli 3f0a386309 update_unicode.sh: pin the uniset repo to a known good commit
The uniset upstream has added more commits that, for example, change the
hexadecimal output in '--32' mode to decimal. Let's pin the repo to a
commit that still outputs the width tables in the format we want.

Signed-off-by: Beat Bolli <dev+git@drbeat.li>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-13 16:12:48 -08:00
.gitignore update_unicode.sh: move it into contrib/update-unicode 2016-12-13 16:12:47 -08:00
README update_unicode.sh: move it into contrib/update-unicode 2016-12-13 16:12:47 -08:00
update_unicode.sh update_unicode.sh: pin the uniset repo to a known good commit 2016-12-13 16:12:48 -08:00

TL;DR: Run update_unicode.sh after the publication of a new Unicode
standard and commit the resulting unicode_widths.h file.

The long version
================

The Git source code ships the file unicode_widths.h, which contains
tables of zero-width and double-width Unicode code points.
These tables are generated using update_unicode.sh in this directory.
update_unicode.sh itself uses a third-party tool, uniset, to query two
Unicode data files for the interesting code points.

On first run, update_unicode.sh clones uniset from GitHub and builds it.
This requires a reasonably current version of autoconf (2.69 works as of
December 2016).
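
The autoconf requirement can be checked before the first run. The
following is a minimal sketch, not part of update_unicode.sh: the helper
name version_at_least is hypothetical, and treating 2.69 as a minimum is
an assumption (this README only says that 2.69 works). Only the major
and minor version components are compared.

```sh
#!/bin/sh

# Hypothetical helper (not part of update_unicode.sh): succeed when the
# "major.minor" version in $1 is at least the one in $2.
version_at_least () {
	awk -v have="$1" -v want="$2" 'BEGIN {
		split(have, h, ".")
		split(want, w, ".")
		if (h[1] + 0 != w[1] + 0) exit !(h[1] + 0 > w[1] + 0)
		exit !(h[2] + 0 >= w[2] + 0)
	}'
}

# Assumption: 2.69 is used as the minimum because the README says it works.
if command -v autoconf >/dev/null 2>&1
then
	# The first line of the output looks like "autoconf (GNU Autoconf) 2.69".
	found=$(autoconf --version | sed -n '1s/.* //p')
	if version_at_least "$found" 2.69
	then
		echo "autoconf $found should be recent enough"
	else
		echo "autoconf $found may be too old to build uniset"
	fi
else
	echo "autoconf not found; uniset cannot be built"
fi
```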

On each run, update_unicode.sh checks whether more recent Unicode data
files are available from the Unicode Consortium, and rebuilds the header
unicode_widths.h with the new data. The new header can then be
committed.
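
The update cycle described above can be sketched as a small wrapper.
This is only an illustration: update_and_commit is a hypothetical name,
the generator path and the commit message are supplied by the caller,
and the sketch assumes it runs inside a Git worktree in which the
regenerated header is already tracked.

```sh
#!/bin/sh

# Hypothetical wrapper (not part of Git): run the generator given as $1,
# then commit with message $2 only if a tracked file actually changed.
update_and_commit () {
	generator=$1
	message=$2

	"$generator" || return 1

	# "git diff --quiet" exits with 0 when tracked files are unchanged.
	if git diff --quiet
	then
		echo "tables are already up to date; nothing to commit"
		return 0
	fi

	# Show which files changed before recording them.
	git diff --stat
	git commit -q -a -s -m "$message"
}
```

From the top of the worktree this could be called as, for example,
update_and_commit contrib/update-unicode/update_unicode.sh "unicode: update the width tables"
(the commit message is made up). Because "git commit -a" records every
modified tracked file, start from a clean worktree.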