Character encoding: the light comes on...

Apr 27, 2004

Well, after being given a Thai language file for the Forum, a Bulgarian language file for the Ringmaker, and throwing myself at character sets I couldn't read at all... I think I'm finally starting to get the hang of this character encoding business.

I mean, I understood it before, just not how the different character sets worked together, or why text only displays properly when the current document uses the right encoding.

For instance, why a character would display correctly or incorrectly in UTF-8 was voodoo to me. :) Now I understand!
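To make that concrete, here is a minimal sketch (in Python, purely for illustration) of why the same character can look right or wrong. It takes one Thai letter, shows the bytes that windows-874 and UTF-8 each use for it, and then reads those bytes back under the wrong encoding. The helper name `decode_or_replace` is made up for this example.

```python
# U+0E01 is THAI CHARACTER KO KAI; written as an escape so the source stays ASCII.
text = "\u0e01"

as_cp874 = text.encode("cp874")  # windows-874 stores it as a single byte
as_utf8 = text.encode("utf-8")   # UTF-8 stores it as three bytes


def decode_or_replace(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, substituting U+FFFD wherever the bytes are invalid."""
    return raw.decode(encoding, errors="replace")


# Same character, different bytes.
print(as_cp874.hex(), as_utf8.hex())

# Read back with the encoding it was written in: the original character.
print(ascii(decode_or_replace(as_cp874, "cp874")))

# Read the windows-874 byte as if it were UTF-8: a lone 0xA1 is not valid
# UTF-8, so the reader gets the replacement character instead.
print(ascii(decode_or_replace(as_cp874, "utf-8")))

# Read the UTF-8 bytes as if they were windows-874: each byte is treated
# as a separate character, which is the classic garbled display.
print(ascii(decode_or_replace(as_utf8, "cp874")))
```

The bytes never change; only the encoding the reader assumes does, and that is what decides whether the character shows up correctly.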

Anyway, I found a nice script to recode Bulgarian to UTF-8 at PHP.net, but I couldn't find the same for windows-874 (Thai).

So after much searching I asked on Usenet, and what do you know? I got pointed to a Perl script that recoded Thai, and from there it was a simple matter to translate it to PHP. If you'd like the function I came up with, you can download it from my PHP page.
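For a rough idea of what such a recoding step does, here is a sketch in Python. It is not the PHP function mentioned above and not the Perl script; it simply relies on Python's built-in `cp874` and `cp1251` codecs. The function name `recode_to_utf8` is hypothetical, and treating the Bulgarian text as windows-1251 is an assumption, since that encoding isn't named here.

```python
def recode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from a legacy encoding and re-encode them as UTF-8.

    Raises UnicodeDecodeError if raw contains bytes that the source
    encoding does not define, rather than silently guessing.
    """
    return raw.decode(source_encoding).encode("utf-8")


# A Thai greeting as it would be stored in a windows-874 file.
thai_legacy = bytes([0xCA, 0xC7, 0xD1, 0xCA, 0xB4, 0xD5])
thai_utf8 = recode_to_utf8(thai_legacy, "cp874")

# A Bulgarian greeting, ASSUMING the file is windows-1251.
bulgarian_legacy = bytes([0xC7, 0xE4, 0xF0, 0xE0, 0xE2, 0xE5, 0xE9])
bulgarian_utf8 = recode_to_utf8(bulgarian_legacy, "cp1251")

# Single-byte legacy text grows when recoded: each Thai character takes
# three bytes in UTF-8, each Cyrillic character takes two.
print(len(thai_legacy), len(thai_utf8))
print(len(bulgarian_legacy), len(bulgarian_utf8))
```

Both legacy encodings are single-byte, so a hand-written version (as a language without a built-in codec would need) comes down to a lookup table from each byte value to its Unicode code point, followed by UTF-8 encoding of that code point.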

