Baptiste Daroussin
2015-11-10 22:26:37 UTC
Hi all,
When merging the new collation, the locales have been reworked.
ache@ raised a good point about the C and POSIX locales and, by extension, the
US-ASCII locales: should we take the opportunity to change what they report?
First a description of the situation: nl_langinfo is not normalised, so each OS
can return whatever encoding name it wants. While it is pretty obvious what
should be returned for regular encodings (iso-8859* or UTF-8), for the C and
POSIX locales FreeBSD used to return US-ASCII (and does so again as of today).
Lots of third-party applications (python, perl, tcl etc.) try to figure out the
encoding by matching against a table of "known" outputs of nl_langinfo().
The thing is, not all of them are aware that FreeBSD uses US-ASCII; tcl, for
example, is not, which means tcl is not able to determine what encoding is
needed for the C and POSIX locales.
Linux returns ANSI_X3.4-1968 (also known as US-ASCII), and most applications
know what Linux returns.
That means we need to teach every upstream about US-ASCII all the time.
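
To make the problem concrete, here is a rough sketch of the kind of table
matching described above. It is not taken from python, perl or tcl; the
function names and the table contents are made up for illustration, and the
only spellings used are the ones mentioned in this mail.

```python
import locale

# Hypothetical table of names an application might accept as plain ASCII.
# These are the spellings discussed in this thread, not a real
# application's table.
KNOWN_ASCII_NAMES = {"ANSI_X3.4-1968", "ASCII", "US-ASCII", "C"}


def canonical_encoding(codeset, known_ascii=KNOWN_ASCII_NAMES):
    """Map a raw nl_langinfo(CODESET) string to a canonical encoding name.

    Returns None when the name is not in the table, which is the failure
    mode described above.
    """
    upper = codeset.strip().upper()
    if upper in {name.upper() for name in known_ascii}:
        return "ascii"
    if upper in ("UTF-8", "UTF8"):
        return "utf-8"
    if upper.startswith(("ISO-8859-", "ISO8859-")):
        # Accept both the "ISO-8859-15" and "ISO8859-15" spellings.
        return "iso-8859-" + upper.rsplit("-", 1)[1]
    return None


def current_codeset():
    """Return what the C library reports for the current locale (Unix only)."""
    locale.setlocale(locale.LC_CTYPE, "")
    return locale.nl_langinfo(locale.CODESET)
```

With a table that only knows the Linux spelling, e.g.
canonical_encoding("US-ASCII", known_ascii={"ANSI_X3.4-1968"}), the lookup
returns None, which is roughly the situation an application is in when it has
never been taught about US-ASCII.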
The proposals are:
- Do not change what we have always done.
- Change it to something that makes sense: "C" (we tried this with "POSIX",
which was a very bad idea, but "C" seems to be commonly recognised by
applications as ASCII)
- Let's report the same as Linux; that will simplify portability
- Let's be obvious and report ASCII (also commonly recognised by applications)
The next question is: if we change the above, would it make sense to also report
ASCII for the ASCII locales:
- en_AU.US-ASCII
- en_CA.US-ASCII
- en_GB.US-ASCII
- en_NZ.US-ASCII
- en_US.US-ASCII
- en_ZA.US-ASCII
That would require some work. Or should we make them return ASCII, or even
ANSI_X3.4-1968?
Please share your opinion here.
Best regards,
Bapt