Character encoding maps the characters and symbols of human writing to numbers so that they can be stored and transmitted digitally. A defined collection of encoded characters is called a character set, and many different character sets exist. For HTML documents, choosing the appropriate character set is essential so that web page content renders and displays correctly.
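As a small illustration of that mapping, the Python sketch below prints the Unicode code point of a few characters and the bytes UTF-8 uses to store them. The sample characters and the helper name `describe` are arbitrary choices made for this example.

```python
# Map a few sample characters to their Unicode code points and UTF-8 bytes.
samples = ["A", "é", "€"]


def describe(ch):
    """Return (code point, UTF-8 byte sequence) for a single character."""
    return ord(ch), ch.encode("utf-8")


for ch in samples:
    code_point, raw = describe(ch)
    # Show the code point in U+XXXX form and the stored bytes in hex.
    print(f"{ch!r}: U+{code_point:04X} -> {raw.hex()}")
```

Note that a single character can take one byte ("A") or several ("é", "€") once encoded.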
When you see random squares and symbols in a web page's content, the page is usually being decoded with the wrong character set. This hurts user experience and text readability, and it can also affect your SEO: search engine crawlers cannot index your content for the right keywords if they cannot correctly interpret the text on your page.
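The garbling described above can be reproduced in a few lines. This is a minimal sketch, assuming the page was saved as UTF-8 but is read back as Windows-1252 (`cp1252`), which is one common mismatch among many. The function names are made up for the example.

```python
def garble(text):
    """Encode as UTF-8, then decode with the wrong character set (cp1252)."""
    return text.encode("utf-8").decode("cp1252")


def repair(garbled):
    """Undo the mistake by reversing the wrong decode, then decoding as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


original = "café"
broken = garble(original)
print(broken)          # the two UTF-8 bytes of "é" show up as two separate characters
print(repair(broken))  # round-trips back to the original text
```

The repair step only works here because every byte involved happens to be valid in `cp1252`. In general, declaring the right character set up front is far more reliable than trying to fix text afterwards.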
What character set should I use?
In most cases, UTF-8 is the right choice for your HTML document. It can encode every character defined in Unicode, which covers nearly all of the writing systems and symbols in use today.
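To see why a broad character set matters, the sketch below checks which sample strings can be encoded with a legacy single-byte encoding (`latin-1`, chosen here purely for comparison) and which can be encoded with UTF-8. The sample strings and the helper `can_encode` are illustrative assumptions.

```python
def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "hello",
    "French": "café",
    "Greek": "γειά",
    "Japanese": "こんにちは",
}

for name, text in samples.items():
    print(f"{name}: latin-1={can_encode(text, 'latin-1')}, utf-8={can_encode(text, 'utf-8')}")
```

UTF-8 handles all four samples, while `latin-1` only manages the Western European ones.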
How do I tell my browser to use UTF-8 character encoding?
There are two common ways to specify the character set in HTML documents:
- HTTP headers – HTTP is the protocol that transfers information on the web. When you browse the internet, your browser sends a request to a web server. The web server sends back a response made up of headers and a body (the content). The headers carry additional information about the response, including the character encoding used. To declare UTF-8 content, have the server send this header:
Content-Type: text/html; charset=utf-8
- HTML code – another way to specify a character set is through the <meta> HTML tag. Add one of the following inside the <head> tag to use UTF-8:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or
<meta charset="utf-8">
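For the HTTP header option, how you set the header depends on your web server. As one hedged illustration, the sketch below uses Python's built-in `http.server` module to serve a page with the `Content-Type` header from the list above, then requests it once over the loopback interface to confirm what a client receives. The handler name, the sample page and the helper `fetch_content_type` are assumptions made for this example, not a recommended production setup.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Sample page; it also carries the <meta> declaration as a fallback.
PAGE = '<!DOCTYPE html><html><head><meta charset="utf-8"><title>Demo</title></head><body><p>café €</p></body></html>'


class Utf8Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode("utf-8")  # the bytes must match the declared charset
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch_content_type():
    """Serve the page on a free local port, request it once, return the header."""
    server = HTTPServer(("127.0.0.1", 0), Utf8Handler)  # port 0 = pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        content_type = response.getheader("Content-Type")
        conn.close()
        return content_type
    finally:
        server.shutdown()
        server.server_close()


print(fetch_content_type())
```

The key point is that the declared charset and the bytes actually sent must agree: the body is encoded with UTF-8 before it is written.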
Please note, however, that the character set declared in the HTTP header takes precedence over the one declared in the HTML code, so it is best to configure your web server correctly.
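That precedence can be sketched as a small lookup: check the HTTP header first, and fall back to the <meta> tag only if the header names no charset. This is a deliberately simplified model, not the full detection algorithm browsers implement (which considers further signals such as a byte order mark). The function names and the regular expression are assumptions for this example.

```python
import re
from email.message import Message

# Simplified pattern that matches both <meta charset="..."> and the
# http-equiv form; real HTML parsing is more involved.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def charset_from_header(content_type):
    """Extract the charset parameter from a Content-Type header value."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def charset_from_meta(html):
    """Look for a charset declaration near the start of the document bytes."""
    match = META_CHARSET.search(html[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def effective_charset(content_type, html):
    """Header wins; the <meta> tag is only consulted when the header is silent."""
    return charset_from_header(content_type) or charset_from_meta(html)


page = b'<html><head><meta charset="utf-8"></head><body></body></html>'
print(effective_charset("text/html; charset=ISO-8859-1", page))
print(effective_charset("text/html", page))
```

In the first call the header overrides the page's own declaration, which is exactly the situation where a misconfigured server produces garbled text despite a correct <meta> tag.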