Previously, in this post, I recommended that webmasters tweak their IIS MIME type settings so that static HTML files would specify a charset of UTF-8. It turns out that this causes some problems.
The issue is that some web crawlers and search engines send an HTTP request header like the following:
Accept: text/html
If you’ve configured the IIS MIME type for HTML files to send “text/html; charset=utf-8”, IIS does not treat that as a match for “text/html”, so it returns an error (a 4xx or 5xx HTTP status code) and the crawler never receives the page contents.
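To make the mismatch concrete, here is a minimal Python sketch of the kind of strict comparison that would produce this behavior. It is only a model of what I observed, not IIS's actual code, and the function names `strictly_acceptable` and `media_type_only` are made up for illustration.

```python
def strictly_acceptable(accept_header: str, content_type: str) -> bool:
    """Naive check: the configured type must equal an Accept entry verbatim.

    Hypothetical model of the observed behavior, not IIS's real logic.
    """
    accepted = [item.strip().lower() for item in accept_header.split(",")]
    return content_type.strip().lower() in accepted


def media_type_only(content_type: str) -> str:
    """Drop parameters such as charset, leaving only type/subtype."""
    return content_type.split(";", 1)[0].strip().lower()


# The configured MIME type carries a charset parameter, so a verbatim
# comparison against the crawler's Accept header fails.
print(strictly_acceptable("text/html", "text/html; charset=utf-8"))

# Comparing only the type/subtype part would have matched.
print(strictly_acceptable("text/html", media_type_only("text/html; charset=utf-8")))
```

A comparison that ignored parameters would match, which is presumably what one would expect a server to do, but that is not the behavior I saw.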
So don’t do what I described in this post. Oh well, it looks like setting the charset via a meta tag is the way to go for static HTML pages.
I happened to discover this issue by running LogParser on my IIS log files, looking for any unexpected HTTP status codes.
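If you don't have LogParser handy, the same kind of check can be sketched in a few lines of Python. This is not the query I ran. It assumes the logs are in the W3C extended format, with a `#Fields:` directive that includes `sc-status`, and the function names and the list of "expected" codes are my own choices for illustration.

```python
from collections import Counter
from typing import Iterable


def count_status_codes(lines: Iterable[str]) -> Counter:
    """Tally the sc-status column of W3C extended log lines."""
    counts: Counter = Counter()
    fields: list = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "sc-status" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed or truncated entries
        counts[values[fields.index("sc-status")]] += 1
    return counts


def unexpected_statuses(counts: Counter, expected=("200", "304")) -> dict:
    """Return only the status codes outside the expected set (an assumption)."""
    return {status: n for status, n in counts.items() if status not in expected}
```

To use it, open a log file and pass the file object to `count_status_codes`, then feed the result to `unexpected_statuses` to see which codes deserve a closer look.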