Release: Free recode 3.5

Release: Free recode 3.5

Post by Fran?ois Pinar » Mon, 07 Jun 1999 04:00:00

Hello, everybody.

After nearly five years since the last official release, I'm pleased to
announce that version 3.5 of the Free `recode' program and library is
now available.  The canonical distribution point for this version is either:

An MS-DOS port of a late pretest may be found as file `recode.exe' in the
same directory.  Thanks to all contributors, pretesters, and forum members.

The `recode' library converts files between character sets and usages.
The library recognises or produces 215 different character sets and surfaces
under 673 names and is able to transliterate files between almost any pair.
When exact transliteration are not possible, it may get rid of the offending
characters or fall back on approximations.  Most RFC 1345 character sets
are supported.  The `recode' program is a handy front-end to the library.

Most reported problems are solved.  Some bugs or suggestions may be pending.

under construction at `'.

Here are the user visible changes since the last official version:

| NEWS |

* Incompatible changes
 + A double dot `..' should now be used instead of a colon `:'.
 + Option --force (-f) is needed to pursue recoding despite errors.
 + There is no more quoting for special characters within charsets names.
 + Auto check (`-a') and popen (`-o') options have been withdrawn.
 + Some charsets and aliases were deleted, see `Charsets & aliases' below.

* Extended features
 + Program messages are available in localised form for many languages.
 + Long character names are available in French, if LANGUAGE is set to `fr'.
 + A new request syntax allows for recode chaining, and for surfaces.
 + Option --header-file (-h) accepts a language parameter, and Perl is new.
 + Full charset listings now show the UCS-2 value for characters.
 + Option --known=PAIRS (-k) also accepts octal and hexadecimal numbers.
 + Option --list (-l) better sorts charsets and aliases, also fully written.
 + Charset `RFC1345' implements mnemonic+ascii+38, and is now reversible.
 + HTML is not limited anymore to Latin-1, HTML 4.0 entities are supported.

* New features
 + Euro support.
 + Updated RFC 1345 set of tables, from Keld Simonsen.
 + Some African charsets and transliterated forms.
 + Conversions for ISO 10646 and Unicode.
 + Combining or explosion of UCS-2 diacriticized characters and ligatures.
 + Implementation of surfaces, see `Surfaces & aliases' below.
 + Mixed mode for recoding only comments and strings in C sources or PO files.
 + A stand-alone recoding library gets installed, often as a shared library.
 + Option --find-subsets (-T) lists charsets which are subsets of another.
 + The library may generate testing data, and study character frequencies.

* Charsets & aliases

 + New ISO 10646 and Unicode charsets
  - combined-UCS-2: pseudo-charset.
  - count-characters: pseudo-charset.
  - dump-with-names: pseudo-charset.
  - ISO-10646-UCS-2: aliases are UNICODE-1-1, BMP, rune and u2.
  - ISO-10646-UCS-4: aliases are 10646, ISO-10646, UCS-4 and u4.
  - UNICODE-1-1-UTF-7: aliases are TF-7 and u7.
  - UTF-8: aliases are UTF-2, UTF-FSS, FSS_UTF, TF-8 and u8.
  - UTF-16: aliases are Unicode, TF-16 and u6.

 + RFC 1345.bis matters
  - Deleted charsets
     dk-us, us-dk (because of &duplicate which `recode' does not handle yet).
  - New charsets
     baltic (alias is iso-ir-179); CP1250 (1250, ms-ee, windows-1250);
     CP1251 (1251, ms-cyrl, windows-1251); CP1252 (1252, ms-ansi, windows-1252);
     CP1253 (1253, ms-greek, windows-1253);
     CP1254 (1254, ms-turk, windows-1254); CP1255 (1255, ms-hebr, windows-1255);
     CP1256 (1256, ms-arab, windows-1256);
     CP1257 (1257, WinBaltRim, windows-1257);
     CWI (CWI-2, cp-hu); EBCDIC-IS-FRISS (friss);
     GOST_19768-87 with aliases of previous GOST_19768-74;
     IBM256 (256, CP256, EBCDIC-INT1); IBM875 (875, CP875, EBCDIC-Greek);
     IBM1004 (1004, CP1004, os2latin1); IBM1047 (1047, CP1047);
     ISO-8859-13 (ISO_8859-13:1998, iso-baltic, iso-ir-179a, l7, latin7);
     ISO-8859-14 (ISO_8859-14:1998, iso-celtic, iso-ir-199, l8, latin8);
     ISO-8859-15 (ISO_8859-15:1998, iso-ir-203, l9, latin9);
     KOI-7; KOI-8 (GOST_19768-74); KOI8-R; KOI8-RU; KOI8-U;
     macintosh_ce (macce); mac-is;
     NeXTSTEP (next) yet previous `recode' had it outside RFC 1345.
  - Alias promoted to charset (with previous charset becoming alias)
     ISO-646.basic (with ISO-646.basic:1983); ISO-646.irv (ISO-646.irv:1983);
     ISO_5427-ext (ISO_5427:1981); ISO_5428 (ISO_5428:1980);
     ISO-8859-1 (ISO_8859-1:1987); ISO-8859-2 (ISO_8859-2:1987);
     ISO-8859-3 (ISO_8859-3:1988); ISO-8859-4 (ISO_8859-4:1988);
     ISO-8859-5 (ISO_8859-5:1988); ISO-8859-6 (ISO_8859-6:1987);
     ISO-8859-7 (ISO_8859-7:1987); ISO-8859-8 (ISO_8859-8:1988);
     ISO-8859-9 (ISO_8859-9:1989); ISO-8859-10 (latin6);
     NC_NC00-10 (NC_NC00-10:81); sami (latin-lap).
  - New aliases
     037 (for charset IBM037); 038 (IBM038); 273 (IBM273); 274 (IBM274);
     275 (IBM275); 278 (IBM278); 280 (IBM280); 281 (IBM281); 284 (IBM284);
     285 (IBM285); 290 (IBM290); 297 (IBM297); 367 (ANSI_X3.4-1968);
     420 (IBM420); 423 (IBM423); 424 (IBM424); 500, 500V1 (IBM500);
     819 (ISO-8859-1); 864 (IBM864); 868 (IBM868); 870 (IBM870);
     871 (IBM871); 880 (IBM880); 891 (IBM891); 903 (IBM903); 905 (IBM905);
     912, CP912, IBM912 (ISO-8859-2); 918 (IBM918); 1026 (IBM1026);
     ECMA-113, ECMA-113:1986 (ECMA-Cyrillic); GOST_19768-74 (KOI8);
     ISO_8859-N (ISO-8859-N) for N = 1 through 10 and 13 through 15;
     ISO_8859-10:1993 (ISO-8869-10); iso-ir-170 (INVARIANT);
     KOI8_L2 (CSN_369103); pclatin2, pcl2 (IBM852); SS636127 (SEN_850200_B).

 + New African charsets
  - AFRL1-101-BPI_OCIL: aliases are t-francais and t-fra.
  - AFRFUL-102-BPI_OCIL: aliases are bambara, bra, ewondo and fulfulde.
  - AFRFUL-103-BPI_OCIL: aliases are t-bambara, t-bra, t-ewondo and t-fulfulde.
  - AFRLIN-104-BPI_OCIL: aliases are lingala, lin, sango and wolof.
  - AFRLIN-105-BPI_OCIL: aliases are t-lingala, t-lin, t-sango and t-wolof.

 + Extra miscellaneous charsets
  - KEYBCS2, Kamenicky.
  - CORK, T1.
  - KOI-8_CS2.

 + New HTML pseudo-charsets
  - HTML_1.1: alias is h1.
  - HTML_2.0: aliases are RFC 1866, 1866 and h2.
  - HTML-i18n: alias is RFC 2070.
  - HTML_3.2: reimplemented; alias is h3.
  - HTML_4.0: aliases are h4, HTML and h.
  - Deleted aliases: HTF, 8859, ISO 8859, Entities, SGML, WWW, w3.

* Surfaces & aliases

 + New MIME encoding surfaces
  - Base64: aliases are 64 and b64.
  - Quoted-Printable: aliases are qp and Quote-Printable.

 + New permutation surfaces
  - 21-Permutation: alias is swabytes.
  - 4321-Permutation.

 + New end of line surfaces
  - CR.
  - CR-LF: alias is cl.

 + New (fully reversible) dump surfaces
  - Decimal-1: aliases are d and d1.
  - Decimal-2: alias is d2.
  - Decimal-4: alias is d4.
  - Hexadecimal-1: aliases are x and x1.
  - Hexadecimal-2: alias is x2.
  - Hexadecimal-4: alias is x4.
  - Octal-1: aliases are o and o1.
  - Octal-2: alias is o2.
  - Octal-4: alias is o4.

 + New miscellaneous surfaces.
  - data, test7, test8, test15, test16.

Fran?ois Pinard


1. upgrading 3.5 Release to 4.0 Release error

i am trying to upgrade my freebsd 3.5 to 4.0.  i downloaded the source
using cvsup and have all the files.

but when i go and do

make buildworld

it stops with an error code.  it says:

install: /usr/src/gnu/lib/libobjc/../../../contrib/gcc/objc/hash.h: No
such file or directory

do i need to get this separately?

peter choe

2. Multi-boot 95, 98, NT4 & Linux on separate partitions???

3. Ch (C compatible) shell 3.5 released

4. how to use a port below 1024 from user (not root)

5. What happen to 3.5-RELEASE?

6. unTerminal VGAN Plus under ODT 1.1

7. Need help on mail( SUN OS4 Release 3.5)

8. Just try this, it will work

9. FreeBSD 3.5 RELEASE notes??

10. CD-Rom Control 3.5 Released

11. ANNOUNCE: FREE Eval of NetTracker 3.5 Web Log Analysis Software for Linux Web Servers

12. Free Windows NT Server 3.5 evaluation

13. problems compiling ppp2.3.5/BLOCK ON FREELIST AT nnnnnn ISN'T FREE