If you use anything other than the most basic English text, people may not be able to read the content you create unless you say what character encoding you used. Web pages must also be able to communicate seamlessly with back-end scripts, databases, and similar systems.
Character entity references are case-sensitive. In this case, they are proposing that the HTTP header say nothing about the document encoding.
Any character encoding declaration in the HTTP header will override declarations inside the page.
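As a rough illustration of that precedence, the following Python sketch takes the encoding from the HTTP Content-Type header when one is declared there and falls back to an in-page meta declaration otherwise. The function names and the 1024-byte scan window are assumptions made for this example, and the sketch deliberately ignores byte-order marks.

```python
import re
from email.message import Message

# Matches both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
_META_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE
)

def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    if not header_value:
        return None
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()  # lower-cased, or None if absent

def charset_from_meta(body):
    """Return the charset named in a meta element near the start of body."""
    match = _META_RE.search(body[:1024])  # scan window is an assumption
    return match.group(1).decode("ascii").lower() if match else None

def effective_charset(content_type, body):
    """The HTTP header wins; the in-page declaration is only a fallback."""
    return charset_from_content_type(content_type) or charset_from_meta(body)
```

With a header of `text/html; charset=ISO-8859-1` and a page containing `<meta charset="utf-8">`, this sketch reports `iso-8859-1`, which is why a wrong server setting cannot be corrected from inside the page.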
This means that you couldn't use this to correct incorrect declarations either. If no BOM can be found, the routine returns undef in scalar context and an empty list in list context.
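The BOM check described above can be sketched in Python as follows. This is not the routine the text refers to; the name `detect_bom` is hypothetical, and returning `None` stands in for the "nothing found" result.

```python
import codecs

# UTF-32 must be tested before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data):
    """Return (encoding_name, bom_length) or None if data has no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None
```

A caller would typically strip the reported number of bytes before decoding the rest of the input with the reported encoding.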
Unfortunately, there are many different character sets and character encodings. Basically, you can visualise this by assuming that all characters are stored in computers using a special code, like the ciphers used in espionage.
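Four character entity references deserve special mention since they are frequently used to escape special characters: "&lt;" for the < sign, "&gt;" for the > sign, "&amp;" for the & sign, and "&quot;" for the double quote mark. For HTML4: This routine should not be used with strings with the UTF-8 flag turned on.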
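To make the escaping references concrete (&lt;, &gt;, &amp; and &quot;), here is a small Python sketch using the standard library's `html` module. It is only an illustration; the sample strings are made up for this example.

```python
import html

raw = '<a href="x?a=1&b=2">'

# html.escape replaces &, < and > and, by default, quote characters too.
escaped = html.escape(raw)
print(escaped)  # &lt;a href=&quot;x?a=1&amp;b=2&quot;&gt;

# html.unescape reverses the process.
assert html.unescape(escaped) == raw

# Entity names are case-sensitive: these are two different characters.
upper = html.unescape("&Aacute;")  # capital A with acute accent
lower = html.unescape("&aacute;")  # small a with acute accent
assert upper != lower
```

The last two lines show why case matters: changing only the case of the entity name yields a different character.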
On the other hand, because of the disadvantages listed above, we recommend that you always declare the encoding information inside the document as well.
Using UTF-8 not only simplifies the authoring of pages, it also avoids unexpected results in form submissions and URL encoding, both of which use the document's character encoding by default. For encodings that cannot represent every character, or when hardware or software configurations do not allow users to input some document characters directly, authors may use SGML character references.
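The effect of the page encoding on URLs can be seen in a short Python sketch. The helper name `percent_encode` is hypothetical; it simply forwards to `urllib.parse.quote`, whose `encoding` argument plays the role of the document's character encoding here.

```python
from urllib.parse import quote, urlencode

def percent_encode(text, page_encoding):
    """Percent-encode text as a browser would for a page in page_encoding."""
    return quote(text, encoding=page_encoding)

# The same word produces different bytes on the wire.
print(percent_encode("café", "utf-8"))    # caf%C3%A9
print(percent_encode("café", "latin-1"))  # caf%E9

# Form data behaves the same way.
print(urlencode({"q": "café"}, encoding="utf-8"))  # q=caf%C3%A9
```

A server that assumes one encoding while the page used another will decode such values incorrectly, which is the kind of unexpected result that using UTF-8 throughout avoids.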
Feel free to just skip to the section Further reading. Whether the routine should look for an encoding declaration in the XML declaration of the document (if any); defaults to 1.
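Looking for an encoding in the XML declaration might be sketched in Python as below. This is an assumption-laden illustration, not the routine described above: the function name and the `check_xml_decl` flag are hypothetical, and the regular expression only handles input in an ASCII-compatible encoding.

```python
import re

# Matches an XML declaration at the very start of the input and
# captures the value of its encoding pseudo-attribute.
_XML_DECL_RE = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._\-]*)["\']'
)

def xml_declared_encoding(data, check_xml_decl=True):
    """Return the encoding named in the XML declaration, or None."""
    if not check_xml_decl:  # hypothetical switch, on by default
        return None
    match = _XML_DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else None
```

For a document starting with `<?xml version="1.0" encoding="ISO-8859-1"?>` the sketch returns `ISO-8859-1`; for a document without a declaration, or without an encoding in it, it returns `None`.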
The new Encoding specification now provides a list that has been tested against actual browser implementations. Content authors should declare the character encoding of their pages using one of the methods described in Declaring character encodings in HTML.
Which character encoding should I use for my content, and how do I apply it to my content? If you use the meta element with a charset attribute, this is not something you need to consider. If, for some reason, you have no choice, here are some rules for declaring the encoding.