
Reduce the transfer size of HTML responses

According to the HTTP Archive, the average transfer size of all HTML responses for a single website is 50kB.  That's not much -- especially when compared to the size of JS, CSS, image, or even font resources -- but those 50kB nonetheless add to the ever-increasing bloat of webpages.  Bloated webpages make websites dreadfully slow, and even a minor increase in load time can cause a significant decrease in sales, according to both Google and Amazon.  So HTML transfer size matters; it's an important metric to pay attention to and to reduce.  That's why, in this article, I'll be focusing on several ways to reduce that 50kB number.

To begin, let's take a look at a few websites.  A cursory check of the top 500 websites shows that they contain dozens of useless tags that add value neither to end-users nor to the browsers tasked with rendering them.  A case in point is the Engadget website, whose markup can be inspected by right-clicking on the site's homepage and selecting View Source from the resultant context menu.  Doing so reveals a multitude of tags that can be safely deleted without negatively impacting the website.  The keywords, favicon, pragma, and Twitter tags (yes, I'll explain shortly), along with comments and numerous whitespace characters, can all be safely deleted.  Don't believe me?  Let's take a closer look at each of them.

The keywords tag

The keywords meta tag was once an important signal for search engines in determining the contents of a webpage; they used it in their ranking and classification algorithms.  Over time, however, websites started to abuse the keywords tag by stuffing it with irrelevant content so as to improve their SEO.  As a consequence, search engines like Google, Bing, and Yahoo stopped using the tag.  So why are some websites still using it?  The short answer is habit; that is to say, marketers and developers (and to a large extent Content Management Systems) continue to include it as a matter of course -- either automatically or as part of a routine or policy.  The long answer is that there's still a lack of knowledge about the tag's deprecation -- and about SEO best practices in general -- within the online marketing community.  So even though it's been almost a decade since search engines first announced that they'd stop using the tag, many websites continue to use it to this day.

The favicon tags

Unlike the keywords tag, the favicon is still useful; it's the small image (typically 16x16 pixels in size) that shows up in the upper left-hand corner of a browser's tab or window for a specific website.  To indicate to the browser which favicon it should show, many websites use a link tag -- or several link tags, for that matter -- as is the case with the Engadget website.  However, few websites (or more appropriately, their marketers and developers) realize that the link tag isn't necessary for displaying a favicon.  That's because a browser will automatically check for a favicon.ico file at the website's root; in fact, if the file is missing there, those automatic requests can lead to artificially slow page load times and unnecessary 404 entries in a server's log file.  Even on Apple devices, the apple-touch-icon-precomposed.png and apple-touch-icon.png icons are retrieved from the website's root by default, according to Wikipedia.  So, when is using a link tag actually necessary for serving a favicon?  The answer is whenever the favicon needs to change (a consequence of browsers aggressively caching the previous favicon) or whenever multiple favicon sizes are preferred.  Ironically, for most websites neither of these two cases applies; it's extremely rare that a website's favicon changes (it typically happens after a re-branding), and there's little evidence that end-users save websites to their phone's Home screen (as opposed to just bookmarking the website or, better still, downloading its app) often enough to warrant multiple favicon sizes.  Barring a lack of access to a website's server, I'd recommend avoiding the favicon and so-called "touch" icon link tags altogether, and serving the icons from the website's root instead.
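To make the point concrete, here's a minimal Python sketch that flags icon link tags which only repeat what a browser would request anyway.  The names (DEFAULT_ICON_PATHS, IconLinkFinder, redundant_icon_links) are my own and purely illustrative, and the sketch assumes that a link is redundant whenever its href is exactly one of the root paths mentioned above.

```python
from html.parser import HTMLParser

# Root-level paths that are requested by default (per the paragraph above).
# Treating an exact match on these paths as "redundant" is an assumption of this sketch.
DEFAULT_ICON_PATHS = {
    "/favicon.ico",
    "/apple-touch-icon.png",
    "/apple-touch-icon-precomposed.png",
}


class IconLinkFinder(HTMLParser):
    """Collects hrefs of <link> icon tags that point at a default root path."""

    def __init__(self):
        super().__init__()
        self.redundant = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_tokens = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href") or ""
        # Matches rel values such as "icon", "shortcut icon", "apple-touch-icon".
        is_icon = any("icon" in token for token in rel_tokens)
        if is_icon and href in DEFAULT_ICON_PATHS:
            self.redundant.append(href)


def redundant_icon_links(html):
    """Return the hrefs of icon link tags that could likely be deleted."""
    finder = IconLinkFinder()
    finder.feed(html)
    return finder.redundant
```

Calling redundant_icon_links() on a page's source returns the candidate hrefs; link tags that point elsewhere (a versioned file name, or a specific size) are left alone, since those are exactly the two cases where the tag is still needed.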

The Twitter tags

Wait, wait, wait... Let me explain!  I have nothing against Twitter's meta tags; it's just that they're often used in conjunction with Facebook's OpenGraph meta tags (and that makes total sense, as they're both very popular social networks), but doing so creates a lot of redundancy.  If you read Twitter's documentation (something that marketers and developers rarely do), it clearly states that there is specific "OpenGraph fallback behavior for each Twitter tag".  That is to say, in the absence of the twitter:card meta tag, the OpenGraph og:type tag will be used.  The same is true for the twitter:description meta tag, which falls back to the OpenGraph og:description tag.  And then there are the twitter:title and twitter:image meta tags, which fall back to the og:title and og:image tags, respectively.  In fact, "if an og:type, og:title and og:description exist in the markup but twitter:card is absent, then a summary card may [still] be rendered", according to Twitter.  So as it turns out, you can indeed kill two birds with one stone (or one meta tag, in this case) by relying on Facebook's OpenGraph meta tags as substitutes for Twitter's.
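Here's a rough Python sketch of that idea: it drops a Twitter meta tag only when the OpenGraph tag it falls back to is present on the same page.  The function and constant names are hypothetical, the regular expressions assume reasonably tidy markup, and a real build step would probably want a proper HTML parser.

```python
import re
from typing import Optional

# Twitter tag -> OpenGraph tag it falls back to (the pairs named in the text above).
FALLBACKS = {
    "twitter:card": "og:type",
    "twitter:description": "og:description",
    "twitter:title": "og:title",
    "twitter:image": "og:image",
}

# Assumption: meta tags are written on one line and contain no ">" in their values.
META_RE = re.compile(r"<meta\b[^>]*>\s*", re.IGNORECASE)
KEY_RE = re.compile(r"\b(?:name|property)\s*=\s*[\"']?([\w:.-]+)", re.IGNORECASE)


def meta_key(tag: str) -> Optional[str]:
    """Return the name/property value of a meta tag, lowercased, if any."""
    match = KEY_RE.search(tag)
    return match.group(1).lower() if match else None


def drop_redundant_twitter_tags(html: str) -> str:
    """Remove Twitter tags whose OpenGraph fallback exists in the same markup."""
    present = {meta_key(m.group(0)) for m in META_RE.finditer(html)}

    def replace(match):
        fallback = FALLBACKS.get(meta_key(match.group(0)))
        if fallback is not None and fallback in present:
            return ""  # the OpenGraph tag covers this one
        return match.group(0)

    return META_RE.sub(replace, html)
```

One caveat on the design: a twitter:card value other than the plain summary card (a large-image card, say) would be lost if the tag were dropped, so it may be worth removing "twitter:card" from FALLBACKS on pages that rely on it.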

The pragma, expires, and cache-control tags

Rather than being hard-coded in HTML, the pragma, expires, and cache-control meta tags should be specified as headers on the server-side -- typically as part of the HTTP response object (the exception being for websites without direct server access).  Then again, if a website is so concerned with the caching of its contents, I think it should invest in a dedicated server in order to properly manage said caching.  But I digress.  Let's hear what Microsoft has to say about the aforementioned tags.  According to Microsoft, the "Pragma: No-cache" tag might not even prevent a webpage from being cached, at least not in Internet Explorer; and then there are the MDN web docs, which state that pragma is not a reliable replacement for the general-purpose cache-control header.  Even the expires header is ignored if there's a cache-control header with the "max-age" or "s-maxage" directive in the response, says MDN.  So if the pragma and expires tags are useless, why are websites like Engadget using them?  I suspect that this is again due to habit, and to both marketers and developers not being fully informed about how caching actually works.
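As an illustration, here's a minimal sketch of setting the caching policy as a response header rather than as a meta tag, using Python's built-in wsgiref module.  The application name, the one-hour max-age value, and the port are all assumptions made for the example; the right values depend entirely on the website.

```python
from wsgiref.simple_server import make_server

# Hypothetical policy for this sketch: cacheable by anyone for one hour.
CACHE_HEADERS = [("Cache-Control", "public, max-age=3600")]


def app(environ, start_response):
    """A tiny WSGI application whose HTML carries no caching meta tags."""
    body = b"<!doctype html><title>Hello</title><p>Hello</p>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ] + CACHE_HEADERS
    start_response("200 OK", headers)
    return [body]


def serve(port=8000):
    """Serve the application locally; call this by hand to try it out."""
    with make_server("127.0.0.1", port, app) as server:
        server.serve_forever()
```

With the header in place, the pragma, expires, and cache-control meta tags can simply be deleted from the markup; the same policy could just as well be set in a web server's or CDN's configuration.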

The comments and whitespace

All of the comments and whitespace on websites are there solely for the benefit of marketers and developers; they serve no practical purpose to end-users or to browsers.  And while both comments and whitespace aid in the understanding and debugging of code, they actually do a disservice to users -- especially when deployed to production.  I argue that they're not needed, especially in the year 2017, when there is a multitude of other options available to developers, such as server-side comments (those that aren't output in the HTML code), impressive browser-based debuggers like Chrome's DevTools, or even URL-parameter-based feature flags that enable a debug mode.  Frankly, there's no need to penalize all of a website's users today with unnecessarily bloated webpages.
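For what it's worth, stripping both at build time takes very little code.  The following Python sketch is deliberately naive: the function name is my own, it keeps Internet Explorer conditional comments, and it assumes the page has no pre, textarea, or inline script content where whitespace or comment-like text matters.

```python
import re

# Matches HTML comments, but leaves "<!--[if ...]>" conditional comments alone.
COMMENT_RE = re.compile(r"<!--(?!\[if).*?-->", re.DOTALL)
# Runs of whitespace between two tags.
BETWEEN_TAGS_RE = re.compile(r">\s+<")


def minify_html(html: str) -> str:
    """Remove comments and collapse whitespace between tags to one space."""
    html = COMMENT_RE.sub("", html)
    # A single space is kept so that adjacent inline elements don't run together.
    html = BETWEEN_TAGS_RE.sub("> <", html)
    return html.strip()
```

Whitespace inside text content is left untouched on purpose; a dedicated minifier can squeeze out more, but even this much removes the indentation and comments that were only ever there for developers.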

In conclusion, I've only touched upon a few of the many ways in which marketers and developers can reduce HTML transfer size by deleting useless code from their websites.  There are many others; namely, removing quotes from specific HTML attributes; not specifying default attributes for HTML forms; removing unnecessary prefixes from URLs; etc.  A great resource to learn more about reducing HTML transfer size is Google's PageSpeed.  I hope you found this article helpful.

