Hmmm... what I think is happening is that when you just type the character, your browser sometimes sends it in ISO-8859-1 encoding (which my script correctly transforms into the appropriate 7-bit clean HTML encoding), whereas other times it decides to go down the two-byte UTF-8 route. It's not clear why it would choose one rather than the other. I shall try to get this server to ask your browser always to use ISO-8859-1 (which will mean people typing in Mandarin will come unstuck); if that doesn't work, I shall investigate PHP's abilities to look at what you send and see if it can do the transformation properly.
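
For anyone curious what "look at what you send and do the transformation properly" might involve, here is a rough sketch of the idea. It is written in Python rather than PHP purely for illustration, and it is not the script running on this server; the function name `to_html_entities` is made up for the example. The assumption is that the raw bytes are either valid UTF-8 or ISO-8859-1: try UTF-8 first, fall back to ISO-8859-1 if that fails, then write every non-ASCII character as a numeric HTML entity.

```python
def to_html_entities(raw: bytes) -> str:
    """Guess the encoding of submitted bytes and return 7-bit clean HTML text.

    Hypothetical helper: assumes the input is either UTF-8 or ISO-8859-1.
    """
    try:
        # A lone ISO-8859-1 accented byte (e.g. 0xE9) is not valid UTF-8,
        # so a successful decode is a reasonable hint that this is UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Every byte sequence is valid ISO-8859-1, so this cannot fail.
        text = raw.decode("iso-8859-1")
    # Replace each non-ASCII character with a numeric reference like &#233;
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_html_entities(b"\xe9"))      # ISO-8859-1 "é" -> &#233;
print(to_html_entities(b"\xc3\xa9"))  # UTF-8 "é"      -> &#233;
```

Two caveats. This is only a heuristic: some ISO-8859-1 byte pairs happen to be valid UTF-8 too, so a deliberately typed "Ã©" would be misread as "é". And it only deals with non-ASCII characters; escaping `<`, `>` and `&` is a separate job.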