If you have the dream of writing the next big Web application that takes the world by storm, I have a tip for you. Use Unicode. Make sure that every layer of your application’s Web infrastructure supports the free flow of Unicode data.
What is Unicode? It is a character standard that covers the widest variety of characters, and UTF-8 is the encoding of it you will see most often on the Web. If you want your Web app to have a chance of correctly taking in English, Chinese, and Arabic characters, you want to use Unicode.
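To make that concrete, here is a minimal sketch in Python (picked only for illustration; the helper name utf8_round_trip is made up for this example). It encodes English, Chinese, and Arabic text to UTF-8 bytes and decodes it back, which is roughly what has to happen at every hop between browser, server, and database.

```python
def utf8_round_trip(text):
    """Encode text to UTF-8 and back; return (characters, bytes).

    Hypothetical helper for illustration only.
    """
    data = text.encode("utf-8")          # str -> bytes
    restored = data.decode("utf-8")      # bytes -> str
    if restored != text:
        raise ValueError("round trip changed the text")
    return len(text), len(data)


if __name__ == "__main__":
    for sample in ["Hello", "你好", "مرحبا"]:
        chars, size = utf8_round_trip(sample)
        print(sample, "->", chars, "characters,", size, "bytes")
```

Note that the byte count grows for non-English text while the character count does not, which is why code that assumes one byte per character tends to break on international input.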
You may have seen a meta tag in the head of your HTML with a reference to UTF-8. Yep, that’s setting the character encoding for your page.
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Unfortunately, there is soooooooo much more to the process and I don’t have the time to write about it all right now.
For now, just trust me: make sure that you’re using Unicode (UTF-8, an encoding of Unicode) everywhere. When I say everywhere, I mean your browser, HTML, PHP, server, your database… everywhere. Ultimately, doing this now will save you a whole lot of time down the road.
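Here is a rough sketch of why "everywhere" matters, again in Python and purely as an illustration. The function simulate_mismatch is a hypothetical name; it imitates one layer writing UTF-8 bytes while another layer reads them as Latin-1, which is one common way the mismatch shows up.

```python
def simulate_mismatch(text, wrong_encoding="latin-1"):
    """Write text as UTF-8, then read the bytes back with another encoding.

    Hypothetical helper: it stands in for, say, a page that sends UTF-8
    while a database connection assumes Latin-1.
    """
    stored = text.encode("utf-8")
    return stored.decode(wrong_encoding)


if __name__ == "__main__":
    original = "café"
    print("sent:    ", original)
    print("received:", simulate_mismatch(original))           # garbled
    print("matching:", simulate_mismatch(original, "utf-8"))  # intact
```

With mismatched layers, "café" comes back as "cafÃ©". When every layer agrees on UTF-8, the text survives unchanged.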
Seconded! I’m in the process now of converting our homebrew CMS to UTF-8 with international support, and it is proving to be a MUCH bigger headache than anyone had thought. Definitely wish we had gone to UTF-8 years ago, but alas, hindsight is 20/20.
Third’ed, Unicode is the way to go. I have been digging deep into the Zend Framework, which is awesome! It has Unicode support, along with all kinds of localization and other neat features. Hopefully going to be publishing a Search Tutorial here soon!
I am currently using Zend Search Lucene at http://www.swfl-news.com/search
Oh yeah, I have also been sending a PHP header to set the UTF-8 character set:
header("Content-type: text/html; charset=utf-8");
Every time I hear that someone is working on adding Unicode support to an existing app, I hear about what a headache it is. This (poor Unicode support) is probably one of the biggest strikes, for me, against Ruby on Rails. Good work spreading the word!