List of accepted encodings that can be passed to mb_convert_encoding as arguments
I was looking at the documentation for PHP’s mb_convert_encoding function.
I was having difficulty finding the full list of strings that I could pass to represent encodings in its arguments - string $to_encoding [, mixed $from_encoding ]
If I wanted to pass a Latin-1 encoding as the first argument, what, I wondered, was the exact string to use? The issue of character encoding is tricky enough, so when things are not working, it’s good to know that you are at least using the correct letters and numbers to name the encoding that you mean.
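As a minimal sketch of the kind of call I had in mind, assuming the mbstring extension is loaded and that ISO-8859-1 (which does appear in the list further down) is the name to use for Latin-1, this converts a UTF-8 string and dumps the resulting bytes. The sample string is made up for illustration.

```php
<?php
// "café" written out as UTF-8 bytes, so the example does not depend
// on how this file itself is saved.
$utf8 = "caf\xC3\xA9";

// First argument is the target encoding, third is the source encoding.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// In Latin-1 the "é" is the single byte 0xE9.
echo bin2hex($latin1), "\n"; // prints 636166e9
```

Looking at the raw bytes with bin2hex is a handy check, because echoing the converted string to a UTF-8 terminal or page would just show a broken character.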
UPDATE: Well, I’ve just taken a piece of text from the list at the URL above - “Windows-1251 (CP1251)” - passed it as an argument to the mb_convert_encoding function, and got an error!
“Warning: mb_convert_encoding() [function.mb-convert-encoding]: Illegal character encoding specified in…”
So, now I’ve learned two things -
- Even though it says “Any of those Character encodings can be specified in the encoding parameter of mbstring functions”, the list is not one syntactically valid encoding name per list item.
- If I get it wrong, at least I get an error back, so trial and error is not going to be as laborious as I first thought.
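Rather than waiting for the warning, a name can also be checked up front. The sketch below assumes PHP 7 or later with mbstring loaded; is_known_encoding is a hypothetical helper name of my own, built on the real mb_list_encodings() and mb_encoding_aliases() functions. Note that newer PHP versions (8.0 and up) throw a ValueError for an unknown encoding instead of raising the warning quoted above.

```php
<?php
// Hypothetical helper: true if $name matches a canonical encoding name
// or one of its aliases, compared case-insensitively.
function is_known_encoding(string $name): bool
{
    foreach (mb_list_encodings() as $canonical) {
        if (strcasecmp($canonical, $name) === 0) {
            return true;
        }
        $aliases = mb_encoding_aliases($canonical);
        if (!is_array($aliases)) {
            continue; // older PHP versions may return false here
        }
        foreach ($aliases as $alias) {
            if (strcasecmp($alias, $name) === 0) {
                return true;
            }
        }
    }
    return false;
}

var_dump(is_known_encoding('ISO-8859-1'));            // bool(true)
var_dump(is_known_encoding('Windows-1251 (CP1251)')); // bool(false)
```

The second call is the exact string that produced the warning for me: the whole list item, parentheses included, is not a name mbstring recognises.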
[That page also refers to http://www.iana.org, which holds a “character registry of names”.]
This page may be of interest too: http://www2.uiah.fi/~joorava/charset/preferred.php
This page shows the list I was searching for:
http://uk3.php.net/mbstring (scroll down the page a bit)
Here is a copy of the list as it stands in January 2008:
- UCS-4
- UCS-4BE
- UCS-4LE
- UCS-2
- UCS-2BE
- UCS-2LE
- UTF-32
- UTF-32BE
- UTF-32LE
- UTF-16
- UTF-16BE
- UTF-16LE
- UTF-7
- UTF7-IMAP
- UTF-8
- ASCII
- EUC-JP
- SJIS
- eucJP-win
- SJIS-win
- ISO-2022-JP
- JIS
- ISO-8859-1
- ISO-8859-2
- ISO-8859-3
- ISO-8859-4
- ISO-8859-5
- ISO-8859-6
- ISO-8859-7
- ISO-8859-8
- ISO-8859-9
- ISO-8859-10
- ISO-8859-13
- ISO-8859-14
- ISO-8859-15
- byte2be
- byte2le
- byte4be
- byte4le
- BASE64
- HTML-ENTITIES
- 7bit
- 8bit
- EUC-CN
- CP936
- HZ
- EUC-TW
- CP950
- BIG-5
- EUC-KR
- UHC (CP949)
- ISO-2022-KR
- Windows-1251 (CP1251)
- Windows-1252 (CP1252)
- CP866 (IBM866)
- KOI8-R
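A few entries in this list, such as “UHC (CP949)” and “Windows-1251 (CP1251)”, carry a second name in parentheses. My reading, which is an assumption on my part, is that the part before the parentheses is the main name and the part inside is an alternative name, and that either should be passed on its own. A small sketch that splits an entry into those candidates (split_list_entry is a hypothetical helper name, not part of mbstring):

```php
<?php
// Hypothetical helper: split "Name (Alias)" into ['Name', 'Alias'];
// entries without parentheses come back as a one-element array.
function split_list_entry(string $entry): array
{
    if (preg_match('/^\s*([^\s(]+)\s*\(([^)]+)\)\s*$/', $entry, $m)) {
        return [$m[1], $m[2]];
    }
    return [trim($entry)];
}

print_r(split_list_entry('Windows-1251 (CP1251)')); // Windows-1251, CP1251
print_r(split_list_entry('UTF-8'));                 // UTF-8
```

Each candidate can then be tried separately as the encoding argument, which keeps the trial and error short.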