According to the HTTPArchive, the average transfer size of all HTML responses for a single website is 50kB. That's not much -- especially when compared to the size of JS, CSS, image, or even font resources -- but those 50kB nonetheless add to the ever-increasing bloat of webpages. Bloated webpages make websites dreadfully slow, and even a minor increase in load time can cause a significant decrease in sales, according to both Google and Amazon. So HTML transfer size matters; it's an important metric to pay attention to and to reduce. That's why, in this article, I'll be focusing on several ways to reduce that 50kB number.
To begin, let's take a look at a few websites. A cursory check of the top 500 websites shows that they contain dozens of useless tags that add value neither to end-users nor to the browsers tasked with rendering them. A case in point is the Engadget website, whose markup can be inspected by right-clicking on the site's homepage and selecting View Source from the resulting context menu. Doing so reveals a multitude of tags that can be safely deleted without negatively impacting the website: the keywords, favicon, pragma, and Twitter tags (yes, I'll explain shortly), as well as comments and numerous whitespace characters. Don't believe me? Let's take a closer look at each of them.
The keywords tag
The keywords meta tag was once an important signal for search engines in determining the contents of a webpage; they used it in their ranking and classification algorithms. Over time, however, websites started to abuse the keywords tag by stuffing it with irrelevant content so as to improve their SEO. As a consequence, search engines like Google, Bing, and Yahoo stopped using the tag. So why are some websites still using it? The short answer is habit; that is to say, marketers and developers (and to a large extent Content Management Systems) continue to include it as a matter of course -- either automatically or as part of a routine or policy. The long answer is that there's still a lack of knowledge about the tag's deprecation -- and about SEO best practices in general -- within the online marketing community. So even though it's been almost a decade since search engines first announced that they'd stop using the tag, many websites continue to use it to this day.
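To get a feel for what removing the tag involves, here is a minimal Python sketch that strips the keywords meta tag from an HTML string and reports the bytes saved. The function name strip_keywords_meta and the sample markup are hypothetical, and the regular expression assumes that no attribute value contains a literal ">"; a real build step would more likely rely on a proper HTML parser.

```python
import re

# Matches <meta ... name="keywords" ...> regardless of attribute order or case.
# Assumption: no attribute value contains a literal ">".
KEYWORDS_META = re.compile(
    r"""<meta\b[^>]*\bname\s*=\s*["']?keywords["']?[^>]*>\s*""",
    re.IGNORECASE,
)


def strip_keywords_meta(html):
    """Return html with every keywords meta tag (and trailing whitespace) removed."""
    return KEYWORDS_META.sub("", html)


# Hypothetical sample markup, not taken from any real site.
sample = (
    "<head>\n"
    "<title>Example</title>\n"
    '<meta name="keywords" content="news, gadgets, reviews">\n'
    '<meta name="description" content="An example page">\n'
    "</head>"
)

cleaned = strip_keywords_meta(sample)
saved = len(sample.encode("utf-8")) - len(cleaned.encode("utf-8"))
print(f"bytes saved: {saved}")
```

The description meta tag is left untouched, since only a name attribute equal to "keywords" is matched.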
The favicon tags
Unlike the keywords tag, the favicon is still useful; it's the small image (typically 16x16 pixels) that shows up in the upper left-hand corner of a browser's tab or window for a specific website. To indicate to the browser which favicon it should show, many websites use a link tag -- or several link tags, for that matter -- as is the case with the Engadget website. However, few marketers and developers realize that the link tag isn't necessary for displaying a favicon. That's because a browser will automatically check for a favicon at the website's root; in fact, if no favicon exists there, the result can be artificially slow page load times and unnecessary 404 entries in the server's log file. Even on Apple devices, the apple-touch-icon-precomposed.png and apple-touch-icon.png icons are retrieved from the website's root by default, according to Wikipedia. So, when is using a link tag actually necessary for serving a favicon? The answer is whenever the favicon needs to change (a consequence of browsers aggressively caching the previous favicon) or whenever multiple favicon sizes are preferred. Ironically, for most websites neither of these two cases applies; that's because it's extremely rare that a website's favicon changes (it typically happens after a re-branding), and there's little evidence that end-users save websites to their phone's Home screen (as opposed to just bookmarking the website or, better still, downloading its App) often enough to warrant using multiple favicon sizes. Unless you lack access to the website's root directory, I'd recommend placing the icons at the root and avoiding the favicon and so-called "touch" icon link tags altogether.
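The check described above can be sketched in Python with the standard library's HTML parser: it lists icon link tags whose href already points at one of the default root locations, which makes them candidates for removal. The class name IconLinkFinder and the sample markup are hypothetical, and the sketch assumes every href refers to the page's own host.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Paths the text above says are requested from the site root by default.
DEFAULT_ICON_PATHS = {
    "/favicon.ico",
    "/apple-touch-icon.png",
    "/apple-touch-icon-precomposed.png",
}
# rel tokens that mark a <link> as an icon declaration.
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkFinder(HTMLParser):
    """Collects hrefs of icon <link> tags that point at a default root path."""

    def __init__(self):
        super().__init__()
        self.redundant = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = set(attr.get("rel", "").lower().split())
        href = attr.get("href", "")
        # Assumption: href is on the same host, so only the path is compared.
        if rels & ICON_RELS and urlparse(href).path in DEFAULT_ICON_PATHS:
            self.redundant.append(href)


# Hypothetical sample markup.
sample = """
<link rel="shortcut icon" href="/favicon.ico">
<link rel="apple-touch-icon" href="/apple-touch-icon.png">
<link rel="icon" sizes="32x32" href="/img/favicon-32.png">
<link rel="stylesheet" href="/main.css">
"""

finder = IconLinkFinder()
finder.feed(sample)
print(finder.redundant)  # ['/favicon.ico', '/apple-touch-icon.png']
```

The 32x32 icon is not reported because it lives outside the root, which is one of the cases where a link tag is still needed.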
The Twitter tags
Wait, wait, wait... Let me explain! I have nothing against Twitter's meta tags; it's just that they're often used in conjunction with Facebook's OpenGraph meta tags (which makes total sense, as they're both very popular social networks), and doing so creates a lot of redundancy. If you read Twitter's documentation (something that marketers and developers rarely do), it clearly describes specific "OpenGraph fallback behavior for each Twitter tag". That is to say, in the absence of the twitter:card meta tag, the OpenGraph og:type tag will be used. The same is true for the twitter:description meta tag, which falls back to the OpenGraph og:description tag. And then there are the twitter:title and twitter:image meta tags, which fall back to the og:title and og:image tags, respectively. In fact, "if an og:type, og:title and og:description exist in the markup but twitter:card is absent, then a summary card may [still] be rendered", according to Twitter. So as it turns out, you can indeed kill two birds with one stone (or one meta tag, in this case) by relying on Facebook's OpenGraph meta tags as substitutes for Twitter's.
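The fallback pairs listed above can be expressed as a small lookup table. The following Python sketch drops a Twitter tag only when its OpenGraph counterpart is actually present on the page; the function name drop_redundant_twitter_tags and the sample values are hypothetical, and dropping twitter:card assumes that the default summary card is acceptable.

```python
# Twitter tag -> OpenGraph tag it falls back to, as listed in the text above.
OG_FALLBACKS = {
    "twitter:card": "og:type",
    "twitter:title": "og:title",
    "twitter:description": "og:description",
    "twitter:image": "og:image",
}


def drop_redundant_twitter_tags(meta):
    """Return a copy of meta without Twitter tags whose OpenGraph fallback exists."""
    return {
        name: content
        for name, content in meta.items()
        if not (name in OG_FALLBACKS and OG_FALLBACKS[name] in meta)
    }


# Hypothetical page metadata: note that og:image is missing.
page_meta = {
    "og:type": "article",
    "og:title": "Example headline",
    "og:description": "Example summary",
    "twitter:card": "summary",
    "twitter:title": "Example headline",
    "twitter:description": "Example summary",
    "twitter:image": "https://example.com/lead.jpg",
    "twitter:site": "@example",
}

slimmed = drop_redundant_twitter_tags(page_meta)
print(sorted(slimmed))
```

Here twitter:image survives because no og:image is present to fall back on, and twitter:site survives because the text above lists no OpenGraph equivalent for it.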
The pragma, expires, and cache-control tags
Rather than being hard-coded in HTML, the pragma, expires, and cache-control meta tags should be specified as headers on the server side -- typically as part of the HTTP response (the exception being websites without direct server access). Then again, if a website is so concerned with the caching of its contents, I think it should invest in a dedicated server in order to properly manage said caching. But I digress. Let's hear what Microsoft has to say about the aforementioned tags. According to Microsoft, the "Pragma: No-cache" tag might not even prevent a webpage from being cached, at least not in Internet Explorer; and then there's the MDN web docs, which state that Pragma is not a reliable replacement for the general-purpose Cache-Control header. Even Expires is ignored if there's a Cache-Control header with the "max-age" or "s-maxage" directive in the response, says MDN. So if the pragma and expires tags are useless, why are websites like Engadget using them? I suspect that this is again due to habit, and to both marketers and developers not being fully informed on the subject.
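As an illustration of moving the caching policy out of the markup, here is a minimal Python sketch of a WSGI application that sends Cache-Control as a response header. The application, its body, and the one-hour max-age value are all assumptions chosen for the example; the right policy depends on the website.

```python
def app(environ, start_response):
    """Minimal WSGI app that sets caching policy via headers, not meta tags."""
    body = b"<!doctype html><title>Example</title><p>Hello</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Takes the place of the pragma, expires, and cache-control meta tags.
        # The one-hour lifetime is an arbitrary value for illustration.
        ("Cache-Control", "public, max-age=3600"),
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly with a stand-in start_response to inspect the headers.
captured = {}


def fake_start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


response_body = b"".join(app({}, fake_start_response))
print(captured["headers"]["Cache-Control"])

# To serve it locally instead, the standard library offers:
#   from wsgiref.simple_server import make_server
#   make_server("127.0.0.1", 8000, app).serve_forever()
```

With the header in place, the HTML body carries no caching-related tags at all, and the policy can be changed without touching the markup.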
The comments and whitespace
All of the comments and whitespace on websites are there solely for the benefit of marketers and developers; they serve no practical purpose to end-users or to browsers. And while both comments and whitespace aid in the understanding and debugging of code, in so doing they actually do a disservice to users -- especially when deployed to production. I argue that they're not needed, especially in the year 2017, when there are a multitude of other options available to developers, such as server-side comments (those that aren't output in the HTML code), impressive browser-based debuggers like Chrome's DevTools, or even feature flags based on URL parameters that enable a debug mode. Frankly, there's no need to penalize all of a website's users today with unnecessarily bloated webpages.
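A rough Python sketch of such a clean-up step follows. It removes HTML comments and collapses whitespace between tags to a single space; the function name minify_html and the sample markup are hypothetical, and the sketch assumes the page contains no conditional comments and no pre, textarea, or inline script content whose whitespace matters.

```python
import re

# Assumption: the page has no conditional comments that must be preserved.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Whitespace runs between two tags; a single space is kept so that
# inline elements do not run together when rendered.
BETWEEN_TAGS = re.compile(r">\s+<")


def minify_html(html):
    """Strip comments and collapse inter-tag whitespace to one space."""
    without_comments = COMMENT.sub("", html)
    return BETWEEN_TAGS.sub("> <", without_comments).strip()


# Hypothetical sample markup.
sample = """<!-- header starts -->
<div>
    <p>Hello</p>   <!-- greeting -->
    <p>World</p>
</div>"""

minified = minify_html(sample)
print(minified)  # <div> <p>Hello</p> <p>World</p> </div>
print(len(sample), "->", len(minified), "characters")
```

Dedicated minifiers handle many more edge cases, so this is only meant to show how little of the original markup the browser actually needs.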
In conclusion, I've only touched upon a few of the many ways in which marketers and developers can reduce HTML transfer size by deleting useless code from their websites. There are many others, such as removing quotes from specific HTML attributes, not specifying default attributes for HTML forms, and removing unnecessary prefixes from URLs. A great resource to learn more about reducing HTML transfer size is Google's PageSpeed. I hope you found this article helpful.