Reduce the transfer size of HTML responses

According to the HTTP Archive, the average transfer size of all HTML responses for a single website is 50kB. That's not much -- especially when compared to the size of JS, CSS, image, or even font resources -- but those 50kB nonetheless add to the ever-increasing bloat of webpages. Bloated webpages make websites dreadfully slow, and even a minor increase in load time can cause a significant decrease in sales, according to both Google and Amazon. So HTML transfer size matters; it's an important metric to watch and to reduce. That's why this article focuses on several ways to bring that 50kB number down.

To begin, let's take a look at a few websites. A cursory check of the top 500 websites shows that they contain dozens of useless tags that add value neither to end users nor to the browsers tasked with rendering them. A case in point is the Engadget website, whose markup can be inspected by right-clicking on the site's homepage and selecting View Source from the resulting context menu. Doing so reveals a multitude of tags that can be safely deleted without negatively impacting the website: the keywords, favicon, pragma, and Twitter (yes, I'll explain shortly) tags, along with comments and numerous whitespace characters. Don't believe me? Let's take a closer look at each of them.

The keywords tag

The keywords meta tag was once an important signal for search engines in determining the contents of a webpage; they used it in their ranking and classification algorithms. Over time, however, websites started to abuse the keywords tag by stuffing it with irrelevant content in order to improve their search rankings. As a consequence, search engines like Google, Bing, and Yahoo stopped using the tag. So why are some websites still using it? The short answer is habit; that is to say, marketers and developers (and to a large extent Content Management Systems) continue to include it as a matter of course -- either automatically or as part of a routine or policy. The long answer is that there's still a lack of knowledge about the tag's deprecation -- and about SEO best practices in general -- within the online marketing community. So even though it's been almost a decade since search engines first announced that they'd stop using the tag, many websites continue to use it to this day.

The favicon tags

Unlike the keywords tag, the favicon is still useful; it's the small image (typically 16x16 pixels) that shows up in the upper left-hand corner of a browser's tab or window for a specific website. To tell the browser which favicon to show, many websites use a link tag -- or several link tags, for that matter -- as is the case with the Engadget website. However, few marketers and developers realize that the link tag isn't necessary for displaying a favicon. That's because a browser will automatically check for a favicon at the website's root; in fact, if no favicon exists there, the failed request can slow page loads and leave unnecessary 404 entries in the server's log file. Even on Apple devices, the apple-touch-icon-precomposed.png and apple-touch-icon.png icons are retrieved from the website's root by default, according to Wikipedia. So, when is a link tag actually necessary for serving a favicon? The answer is whenever the favicon needs to change (because browsers aggressively cache the previous favicon) or whenever multiple favicon sizes are preferred. Ironically, for most websites neither case applies: it's extremely rare for a website's favicon to change (it typically happens after a rebranding), and there's little evidence that end users save websites to their phone's Home screen (as opposed to just bookmarking the website or, better still, downloading its app) often enough to warrant multiple favicon sizes. Unless you lack access to a website's server root, I'd recommend avoiding the favicon and so-called "touch" icon link tags altogether.
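
To make the point concrete, here's a minimal Python sketch that lists the icon link tags a page could drop because they point at the default root locations anyway. The function name redundant_icon_links and the list of default paths are my own assumptions for illustration; it only handles root-relative href values and is not a substitute for testing in real browsers.

```python
from html.parser import HTMLParser

# Root-level icon paths mentioned above that are requested by default
# (assumption: only exact root-relative paths are treated as defaults).
DEFAULT_ICON_PATHS = {
    "/favicon.ico",
    "/apple-touch-icon.png",
    "/apple-touch-icon-precomposed.png",
}

# rel tokens that mark a <link> as an icon declaration.
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkFinder(HTMLParser):
    """Collects hrefs of icon <link> tags that point at a default root path."""

    def __init__(self):
        super().__init__()
        self.redundant = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = set((attributes.get("rel") or "").lower().split())
        href = attributes.get("href") or ""
        if rel_tokens & ICON_RELS and href in DEFAULT_ICON_PATHS:
            self.redundant.append(href)


def redundant_icon_links(html: str) -> list:
    """Return the hrefs of icon link tags that only restate a default path."""
    finder = IconLinkFinder()
    finder.feed(html)
    return finder.redundant
```

Icon links that carry a non-default path (say, a 32x32 PNG in an images folder) are left alone, since those are exactly the cases where the link tag still does useful work.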

The Twitter tags

Wait, wait, wait... Let me explain! I have nothing against Twitter's meta tags; it's just that they're often used in conjunction with Facebook's OpenGraph meta tags (which makes total sense, as both are very popular social networks), and doing so creates a lot of redundancy. If you read Twitter's documentation (something that marketers and developers rarely do), it clearly describes the "OpenGraph fallback behavior for each Twitter tag". That is to say, in the absence of the twitter:card meta tag, the OpenGraph og:type tag will be used. The same is true for the twitter:description meta tag, which falls back to the OpenGraph og:description tag. And then there are the twitter:title and twitter:image meta tags, which fall back to the og:title and og:image tags, respectively. In fact, "if an og:type, og:title and og:description exist in the markup but twitter:card is absent, then a summary card may [still] be rendered", according to Twitter. So as it turns out, you can indeed kill two birds with one stone (or one meta tag, in this case) by relying on Facebook's OpenGraph meta tags as substitutes for Twitter's.
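
As a rough illustration of that fallback mapping, the Python sketch below reports which Twitter tags on a page already have their OpenGraph counterpart present and are therefore candidates for removal. The names FALLBACKS and redundant_twitter_tags are hypothetical, and the mapping is simply the one listed above; whether the fallback produces the card you actually want is still worth checking with Twitter's own tools.

```python
from html.parser import HTMLParser

# Twitter tag -> OpenGraph tag it falls back to, as listed in this article.
FALLBACKS = {
    "twitter:card": "og:type",
    "twitter:description": "og:description",
    "twitter:title": "og:title",
    "twitter:image": "og:image",
}


class MetaCollector(HTMLParser):
    """Collects the name/property keys of every <meta> tag in a document."""

    def __init__(self):
        super().__init__()
        self.keys = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Twitter tags usually use name=, OpenGraph tags use property=.
        key = attributes.get("name") or attributes.get("property")
        if key:
            self.keys.add(key.lower())


def redundant_twitter_tags(html: str) -> list:
    """Return Twitter tags whose OpenGraph fallback is also present."""
    collector = MetaCollector()
    collector.feed(html)
    return sorted(
        twitter_tag
        for twitter_tag, og_tag in FALLBACKS.items()
        if twitter_tag in collector.keys and og_tag in collector.keys
    )
```

A Twitter tag with no matching OpenGraph tag is not reported, because deleting it would leave Twitter with nothing to fall back on.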

The pragma, expires, and cache-control tags

Instead of being hard-coded in HTML, the pragma, expires, and cache-control meta tags should be specified as headers on the server side -- typically as part of the HTTP response object (the exception being websites without direct server access). Then again, if a website is that concerned with the caching of its contents, I think it should invest in a dedicated server in order to properly manage said caching. But I digress. Let's hear what Microsoft has to say about the aforementioned tags. According to Microsoft, the "Pragma: No-cache" tag might not even prevent a webpage from being cached, at least not in Internet Explorer; and then there are the MDN web docs, which state that pragma is not a reliable replacement for the general-purpose cache-control header. Even expires can be ignored if there's a cache-control header with the "max-age" or "s-maxage" directive in the response, says MDN. So if the pragma and expires tags are useless, why are websites like Engadget using them? I suspect that this is again due to habit, and to both marketers and developers not being fully informed about how these tags behave.
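
For anyone wondering what "specified as headers on the server side" looks like in practice, here's a minimal sketch using Python's built-in http.server module. The handler name CachedPageHandler, the helper cache_headers, the sample page, and the one-hour max-age are all assumptions chosen for illustration; a real website would set the equivalent header in its web server or framework configuration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def cache_headers(max_age_seconds: int) -> dict:
    """Build the caching header that replaces the HTML meta tags."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}


class CachedPageHandler(BaseHTTPRequestHandler):
    # Hypothetical page body; note that it contains no caching meta tags.
    PAGE = b"<!doctype html><title>Hello</title><p>Hello</p>"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(self.PAGE)))
        # Caching policy travels in the HTTP response, not in the markup.
        for name, value in cache_headers(3600).items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(self.PAGE)


def serve(port: int = 8000) -> None:
    """Serve the sample page on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), CachedPageHandler).serve_forever()
```

Calling serve() and requesting http://127.0.0.1:8000/ should show the Cache-Control header in the browser's network panel, with nothing cache-related left in the HTML itself.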

The comments and whitespace

All of the comments and whitespace on websites are there solely for the benefit of marketers and developers; they serve no practical purpose to end users or to browsers. And while both comments and whitespace aid in the understanding and debugging of code, they actually do a disservice to users -- especially when deployed to Production. I argue that they're not needed, especially in the year 2017, when there are a multitude of other options available to developers, such as server-side comments (those that aren't output in the HTML), impressive browser-based debuggers like Chrome's DevTools, or even URL-parameter-based feature flags that enable a debug mode. Frankly, there's no need to penalize all of a website's users today with unnecessarily bloated webpages.
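
Here's a deliberately naive Python sketch of the idea: strip HTML comments and collapse runs of whitespace before the page is sent to users. The function name strip_comments_and_whitespace is my own, and the regular expressions assume the page has no whitespace-sensitive content (pre, textarea, or inline script blocks) and no conditional comments that must be kept; a production build step would use a proper HTML minifier instead.

```python
import re

# Matches an HTML comment, including comments that span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

# Matches any run of spaces, tabs, and newlines.
WHITESPACE_RUN = re.compile(r"\s+")


def strip_comments_and_whitespace(html: str) -> str:
    """Remove comments, then collapse each whitespace run to one space."""
    without_comments = COMMENT.sub("", html)
    return WHITESPACE_RUN.sub(" ", without_comments).strip()
```

Collapsing to a single space, rather than deleting whitespace outright, keeps the gaps between inline elements so that the rendered text still reads the same.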

In conclusion, I've only touched upon a few of the many ways in which marketers and developers can reduce HTML transfer size by deleting useless code from their websites. There are many others: for example, removing quotes from specific HTML attributes, not specifying default attributes for HTML forms, and removing unnecessary prefixes from URLs. A great resource to learn more about reducing HTML transfer size is Google's PageSpeed. I hope you found this article helpful.

