Advanced SEO Textbook
6

JavaScript SEO

Almost all websites these days are built and powered by JavaScript. In this chapter, we will equip you with the skills and knowledge to optimise the JavaScript on your site by following the best practices as recommended by Google.

Topic Details
Time: 43
Difficulty: Hard

Most websites are built using and powered by JavaScript, which means that JavaScript SEO is an important skill for every modern SEO to have in their arsenal. Ensuring that your content is discoverable by Google (and other search engines) is one of the main concerns for an SEO when it comes to JavaScript-powered websites. In this chapter, we’ll discuss the basics of JavaScript before taking a closer look at how Googlebot processes JS. Finally, we’ll highlight some of the best practices and common mistakes for JavaScript SEO.

The Fundamentals of JavaScript

Although HTML (which defines the content of web pages) and CSS (which defines the design and layout of that content) are vital components of web development, it is JavaScript that brings pages to life. As mentioned before, the vast majority of websites today use JavaScript in some form.

What is JavaScript?

JavaScript (often abbreviated as JS) is a programming language created by Brendan Eich in 1995 that is used to program the behaviour of web pages. JavaScript code is embedded within the HTML document using <script> tags, or saved as separate .js files that are then linked or referenced.

Unlike HTML, where the code is static, JavaScript allows you to dynamically update the contents of a page which allows for a richer and more sophisticated user experience.

For example, one of the most common use cases for JavaScript is client-side validation of form elements, where input can be checked in the browser instead of having to send the data to the web server every time the form is submitted.

Building a website using only HTML and CSS is fine, but JavaScript is what transforms a website into being dynamic and interactive.
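To make the validation example above concrete, here is a minimal sketch of client-side validation in plain JavaScript (the field names and rules below are hypothetical):

```javascript
// Minimal client-side validation sketch: inputs are checked in the
// browser before anything is sent to the web server.
// Field names and rules are hypothetical examples.
function validateSignupForm(fields) {
  const errors = [];
  if (!fields.email || !fields.email.includes('@')) {
    errors.push('Please enter a valid email address.');
  }
  if (!fields.password || fields.password.length < 8) {
    errors.push('Password must be at least 8 characters long.');
  }
  return errors;
}

// In the browser, this would run on the form's "submit" event and call
// event.preventDefault() if any errors are returned, so data only
// reaches the server once it is valid.
```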

What Are JavaScript Frameworks?

In computer programming, a software framework is an abstraction in which software that provides generic functionality can be selectively extended by developers with additional user-written code.

A JavaScript framework, therefore, is an application framework written in JavaScript that programmers use to structure and control their web application’s functionality and design. JS frameworks make working with JavaScript smoother, easier and more organised.

The main advantage of JavaScript frameworks is their efficiency. Predefined functions that apply to most web applications do not need to be custom-built, which speeds up the development process because less code has to be written manually.

Below we have outlined the three most popular JavaScript frameworks around.

Angular

Angular is a framework developed by Google, with work beginning in 2009, making it one of the longest-established JavaScript frameworks around. With Google’s backing, it’s one of the most powerful, efficient and widely used frameworks for “creating efficient and sophisticated single-page apps”.

Advantages

  • Regular updates and developments
  • Code is easy to reuse, maintain and understand
  • TypeScript provides a familiar language for those with a background in object-oriented programming, e.g. Java and C#
  • Backed and developed by Google

Disadvantages

  • Learning curve can be steep for beginners
  • TypeScript may be a barrier to adoption
  • Poor startup metrics in benchmarks

React

The React library was created by Facebook engineer Jordan Walke and released in May 2013. Unlike Angular, which is a full JS framework, React is a “JavaScript library for building user interfaces”. React is used to develop and power dynamic UIs for web pages that see a lot of traffic; for example, it is used by Netflix and Airbnb.

Advantages

  • Backed and developed by Facebook
  • Perfect for building dynamic, responsive and interactive user interfaces.
  • Great for cross-platform websites (desktop, mobile, tablet etc.) as it is based on reusable components.
  • Syntax is easy to learn if you already know JavaScript.

Disadvantages

  • Not a full framework like Angular, so features such as routing, state management and data fetching are not built in and are left to third-party libraries.

Vue.js

Created and released by independent software developer Evan You in 2014, Vue.js is arguably one of the easiest frameworks to pick up for building user interfaces and single-page applications. For this reason, it’s quickly become one of the leaders in JS frameworks.

Advantages

  • Easy to learn, especially if you already have experience with HTML, CSS and JavaScript
  • Performs simple things incredibly well, but it can also be extended and combined with other tools and libraries, making it flexible
  • Reliable for building cross-platform applications.
  • Versatile; scales “between a library and a full-featured framework”.

Disadvantages

  • Relatively young framework – resources and community support are more limited.

Choosing A JavaScript Framework

Although Angular, React and Vue.js are the three market leaders, there are many other frameworks to choose from, as seen in this extensive guide from JavaScript Report. The main factors to consider when choosing a JavaScript framework to work with are:

  • Your coding experience – is it easy to learn?
  • Flexibility of the framework – does it work with other third party libraries and tools?
  • Reusability – does it offer code that can be reused for various platforms and/or components?
  • Versatility of the framework – does it work on multiple platforms?

The DOM

Google uses the DOM, or Document Object Model to analyse and understand web pages. Therefore, as an SEO, it’s incredibly important that you understand what the DOM is and how it works.

The DOM is an application programming interface (API) that web browsers use to represent structured HTML and XML documents as a tree of objects. It also defines how the logical structure of these documents is accessed and manipulated.

The Critical Rendering Path

The critical rendering path is the process of how the web browser goes from the source code of a raw HTML document to displaying an interactive web page in the viewport to the user.

Below is a breakdown of the critical rendering path:

1. Construction of the DOM Tree – the DOM tree is an object representation of the fully parsed HTML document, where each element of the HTML code is treated as a node and nested elements are treated as child nodes.

2. Construction of the CSSOM Tree – the CSS Object Model (or CSSOM) is the object representation of the styles that are associated with the DOM.

3. Running JavaScript – JavaScript is considered a parser-blocking resource: whenever the parser discovers a <script> tag, it stops, then fetches and executes the JavaScript code before resuming parsing of the HTML document.

4. Creating the Render Tree – the Render Tree is a combination of the DOM and the CSSOM. It encapsulates exactly what the browser should render.

5. Generating the Layout – this is where the browser determines how the content should be laid out based on the viewport, i.e. the dimensions of the screen that the page is going to be displayed on. The viewport is generally defined to fit the device’s screen.

6. Painting – this is where the visible content is displayed (painted) onto the screen.
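To make step 1 concrete, the DOM tree can be modelled as nested objects, where each element is a node and nested elements are its children. This is a toy model for illustration only, not the browser’s real DOM API:

```javascript
// Toy model of a DOM tree: each element is a node; nested elements
// are child nodes (an illustration, not the real DOM API).
function node(tag, children = [], text = '') {
  return { tag, children, text };
}

// Represents: <body><h1>Hello</h1><p>World</p></body>
const domTree = node('body', [
  node('h1', [], 'Hello'),
  node('p', [], 'World'),
]);

// Walk the tree the way a renderer would, collecting visible text.
function collectText(n) {
  return [n.text, ...n.children.map(collectText)]
    .filter(Boolean)
    .join(' ');
}
```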

DOM and JavaScript

The DOM also allows programs and scripts, which are usually written in JavaScript, to dynamically access and update the content, structure and style of documents even after the page has initially loaded.

Think of it as the intermediary between the raw HTML code and what is actually displayed to the user (or rendered by a search engine).

Therefore, the DOM is an essential factor to consider for any JavaScript-driven website because this is what the JS code interacts with.

From an SEO standpoint, thanks to a test run by Search Engine Land, we can infer that Google is able to analyse and understand the DOM as well as interpret dynamically inserted content like page titles, headings and even canonicals, which further emphasises its importance.

You can see what the DOM looks like (and even change it yourself) by using the “Inspect Element” function on most web browsers.

How Googlebot Processes JavaScript

JavaScript web apps are processed by Google in three main phases:

1. Crawling

2. Rendering

3. Indexing

When it comes to crawling, rendering and indexing traditional HTML content, Googlebot (and other search engine crawlers) have little trouble; it’s when you introduce JavaScript that things start to get a little more complicated.

Below is a breakdown of how Googlebot crawls, renders and indexes JavaScript content:

1. Googlebot fetches a URL from its crawling queue via an HTTP request and checks whether it is allowed to crawl it (by reading the robots.txt file).

2. The URL is skipped if the web page is disallowed (blocked) on the robots.txt file.

3. Googlebot parses the page and adds the URLs of other pages that it links to into its crawl queue.

4. If the original URL is allowed (not blocked), Googlebot downloads the raw HTML file and parses it – this is the “Crawler” stage in the below diagram.

5. At this stage, Googlebot is unable to find any links that only appear after the JavaScript is executed, as they are not present in the raw source code.

6. CSS and JS files are downloaded by Googlebot.

7. Unless a robots meta tag or header tells Googlebot not to index the page, the web page is added to Googlebot’s rendering queue.

8. Once Googlebot’s resources allow (the page may stay in this queue for a few seconds), it uses its Web Rendering Service (a headless Chromium) to parse, compile and execute the JS code – this is the “Renderer” stage in the below diagram.

9. The rendered HTML is indexed.

10. Googlebot is now able to parse the rendered HTML and follow new links so that they can be added to the crawling queue.
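The two-phase link discovery described above can be sketched as a toy simulation. The page data is entirely hypothetical, and rendering happens immediately here rather than being queued until resources allow, as it is in the real pipeline:

```javascript
// Toy simulation of Googlebot's two-phase link discovery:
// links in the raw HTML are found at crawl time, while links injected
// by JavaScript are only discovered after rendering.
// Pages and links are hypothetical examples.
const pages = {
  '/': { rawLinks: ['/about'], jsLinks: ['/products'] },
  '/about': { rawLinks: [], jsLinks: [] },
  '/products': { rawLinks: [], jsLinks: [] },
};

function crawl(startUrl) {
  const crawlQueue = [startUrl];
  const indexed = [];
  while (crawlQueue.length > 0) {
    const url = crawlQueue.shift();
    if (indexed.includes(url)) continue;
    // Crawler stage: parse the raw HTML and queue the links found there.
    crawlQueue.push(...pages[url].rawLinks);
    // Renderer stage: execute JS, index the rendered HTML, and queue
    // any links that only exist after rendering.
    indexed.push(url);
    crawlQueue.push(...pages[url].jsLinks);
  }
  return indexed;
}
```

Note that `/products` is only reachable through a JavaScript-inserted link, so it is discovered later than `/about`, which sits in the raw HTML.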

Google’s Web Rendering Service

It’s important to understand that Googlebot doesn’t behave in the same way that a web browser does when it crawls your website. When search engine crawlers visit your website, they use a technique called headless browsing.

Headless browsing is the process of fetching and visiting websites without “seeing” the user interface. Googlebot uses a headless Chromium (based on the latest version of Chrome) to render the web pages and execute JavaScript code.

Likewise, it’s worth mentioning that Googlebot’s core purpose is to crawl the world wide web, therefore it tries to determine whether it’s necessary to render a resource. Google says “Googlebot and its Web Rendering Service (WRS) component continuously analyze and identify resources that don’t contribute essential page content and may not fetch such resources”.

This essentially means that it’s not guaranteed that Googlebot will parse and execute all of your JavaScript code as it may have deemed it as unimportant from a rendering perspective.

Below are some important points to note on how Google processes JavaScript thanks to tests run by Search Engine Land and Built Visible.

  • Googlebot follows JavaScript links
  • Googlebot follows JavaScript redirects
  • Googlebot crawls and indexes dynamically created content when it is inserted within the HTML source and when it is inserted as an external JavaScript file.
  • Googlebot crawls and indexes dynamically created metadata and page elements such as page titles, canonical tags and headings.
  • Google’s renderer has timeouts – JS that takes too long to render will be skipped.

JavaScript Rendering

Successfully processing and understanding JavaScript is more difficult for some search engine crawlers than others. If search engines aren’t able to “see” the content on a web page the same way that a web browser might be able to, it can lead to many indexing issues.

Webmasters can control how content is delivered and presented to both users and search engines by adopting different rendering methods. For example, you can choose how much of your content is rendered by the client (i.e. the web browser or search engine crawler that is requesting the page) and how much is rendered by the web server that is hosting the web page.

In this section, we’ll explore how some of these methods work as well as highlight their advantages and disadvantages.

Pre-Rendering

Pre-rendering is where the server sends the client (web browser or search engine bot) a pre-rendered version of the web page’s DOM (Document Object Model), i.e. what users see. This allows the bot to see the core structure of the page’s layout before the final content is rendered: a static, but incomplete, snapshot of the web page is served immediately, so the content can be viewed by search engines without them having to do any of the rendering themselves. In other words, the web page is pre-rendered.

When implementing this type of rendering, it’s important to ensure that search engines are served a valid and accurate representation of the web page.

Advantages of Pre-rendering

  • Search engines can view some content before indexing

Disadvantages of Pre-rendering

  • Only a snapshot of the content is visible to the search engine.
  • Pre-rendered content doesn’t allow for interactivity with the end user

Client-side Rendering

With client-side rendering, the process of rendering the JavaScript code on a web page is left entirely to the client making the request. As we’ve established, most search engines have trouble processing JavaScript, so this method is not recommended.

Advantages of Client-side Rendering

  • Less strain on the server.

Disadvantages of Client-side Rendering

  • More strain is put on the client.
  • This can potentially lead to slower page load times.
  • The most important drawback is that most search engine bots have difficulty when it comes to rendering JS, which leads to indexing issues as they can’t “see” the web page.

Server-side Rendering

Implementing server-side rendering means that the web server that is hosting the web resource will render the JS code before sending it to the client. In this scenario, the client simply displays the rendered content.
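As a simplified illustration of the idea (not a production SSR framework), a Node.js server might build the finished HTML itself before sending it, so the client simply displays it. The page structure and product fields here are hypothetical:

```javascript
// Server-side rendering sketch: the server produces complete HTML,
// so the client (or a crawler) receives ready-to-display content.
// The data shape and markup are hypothetical examples.
function renderProductPage(product) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    `<head><title>${product.name}</title></head>`,
    `<body><h1>${product.name}</h1><p>${product.description}</p></body>`,
    '</html>',
  ].join('\n');
}
```

Because the `<h1>` and body copy are already present in the response, a crawler does not need to execute any JavaScript to see them.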

Advantages of Server-side rendering

  • Search engines receive a fully rendered page which improves indexability.
  • Increases page speed as the content is available almost immediately.
  • Generally produces a fast First Paint (FP) and First Contentful Paint (FCP).
  • Running page logic and rendering on the server side produces a faster Time to Interactive (TTI).

Disadvantages of Server-side rendering

  • More strain on the server can potentially lead to slower rendering
  • This can also lead to a slower Time to First Byte (TTFB).

Hybrid Rendering

Hybrid rendering shares the rendering load between the client and the server. Here, the main content on the page is rendered on the server-side before being sent to the client along with additional JS code which is then rendered on the client-side. By sending the additional code to the client, the user is able to interact with the page.

Advantages of Hybrid rendering

  • Content is available to the client faster.
  • Search engines are able to access and index the content much quicker.
  • Users are able to interact with the content because of the additional JS code that is rendered on the client-side.

Disadvantages of Hybrid rendering

  • Without client-side rendering, users aren’t able to access the whole page.
  • Rendering happens twice, which may result in slower page speeds.

Dynamic Rendering

What is Dynamic Rendering?

As a workaround for some of the problems that we have highlighted with other rendering techniques, Google announced dynamic rendering in May 2018. Simply put, dynamic rendering is a way to help Google index JavaScript websites by “switching between client-side rendered and pre-rendered content for specific user agents”.

How Does Dynamic Rendering Work?

Below is a breakdown of how dynamic rendering works:

1. The web server must detect web crawlers by checking the user agent in the HTTP header request.

2. The request from the crawler is routed to a renderer (requests from users i.e. web browsers are served normally).

3. The dynamic renderer then serves a different version of the content that is more suitable for the crawler to understand and process – this is usually in the form of a static HTML page, which Google will have no problem processing.
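Step 1 above – detecting crawlers from the User-Agent header – can be sketched with a simple substring check. The bot tokens below are illustrative only; in practice you would use your renderer’s full list:

```javascript
// Sketch of step 1: decide whether a request should receive the
// pre-rendered version, based on the User-Agent request header.
// The bot tokens below are illustrative, not an exhaustive list.
const botUserAgents = ['googlebot', 'bingbot', 'baiduspider', 'linkedinbot'];

function shouldPreRender(userAgentHeader) {
  const ua = (userAgentHeader || '').toLowerCase();
  return botUserAgents.some((bot) => ua.includes(bot));
}
```

Requests for which `shouldPreRender` returns true would be routed to the dynamic renderer (step 2); all other requests are served normally.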

According to Google, dynamic rendering should be used for websites using JavaScript-generated content that:

  • Changes rapidly
  • Uses modern JavaScript features that aren’t supported by the crawlers you care about
  • Has a strong social media presence

Implementing Dynamic Rendering

In order to set up dynamic rendering for your website, you need to:

1. Install and configure a dynamic renderer, e.g. Puppeteer, Rendertron or prerender.io.

2. Select the user agents that you want to receive static HTML content. Here’s an example provided by Google of common bot user agents, using Rendertron:

export const botUserAgents = [
  'googlebot',
  'google-structured-data-testing-tool',
  'bingbot',
  'linkedinbot',
  'mediapartners-google',
];

You may also wish to add other crawlers, such as Baiduspider (Baidu).

3. Add a cache to serve the static content if you find that the renderer is slowing down your server due to a high number of pre-rendering requests. Likewise, you may need to add verification to ensure that the requests are being made from valid crawlers.

4. Determine whether or not the user agents need desktop or mobile content and use dynamic serving to serve the appropriate content based on whether it is a mobile or desktop agent.

5. Configure the web server to deliver the static HTML to the crawlers that you have selected.

The above is a brief outline of the process; you can find more detailed information on how to implement dynamic rendering here.

6. Verify that your implementation of dynamic rendering is successful by:

a. Testing your mobile content using Google’s Mobile-Friendly Testing tool – this verifies whether Google is able to see the mobile version of your content.

b. Testing your desktop content using Google’s URL Inspection tool – this verifies whether Google is able to see the desktop version of your content.

c. Testing your structured data using Google’s Structured Data Testing tool – this verifies that the structured data on your website renders properly.

Dynamic rendering provides a suitable workaround for websites whose architecture is heavily driven by client-side JavaScript.

But it’s important to remember that not every website needs to adopt this methodology – in some cases, the other techniques mentioned may be more suitable. For example, server-side or pre-rendering is still useful because it makes your website faster for users and crawlers.

JavaScript SEO Best Practices & Common Pitfalls

The bottom line when it comes to JavaScript SEO is whether Google is able to crawl, render and index it.

With that in mind, here are some of the best practices and common pitfalls to avoid to ensure that Google is able to process the JavaScript content on your website.

Blocked JavaScript

Ensure that search engines are able to access your JavaScript files, otherwise, your content will not be crawled, rendered or indexed properly. This is because the search engine will not be able to get a full experience of the web page which can impact your website’s search presence.

To check that your JavaScript is not blocked, use Google’s URL Inspection Tool to see how Google views your web page, and review your robots.txt file to identify any rules that may be preventing Googlebot from accessing your JS resources.

Meta Robots Tag

Ensure you have implemented meta robots tags correctly. For example, adding the following code to your web page will block Google from indexing it.

<!-- Googlebot won't index this page or follow links on this page -->
<meta name="robots" content="noindex, nofollow">

Note that whenever Googlebot encounters a noindex directive in the robots meta tag before running JavaScript, it skips rendering and JavaScript execution entirely, so the page will not be rendered or indexed – and any JavaScript that would change or remove the tag never runs.

URL Structure

JavaScript websites often generate URLs containing a hash (fragment). Note that this is detrimental to your SEO because Googlebot generally ignores the fragment, so such URLs may not be crawled as separate pages.

Bad URL: example.com#page-1
Good URL: example.com/page-1

This subtle difference is crucial in ensuring that your pages are crawled, rendered and indexed by Google.
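If a site already uses fragment URLs, one approach is to map them onto path-based URLs (in the browser, the History API’s `history.pushState` is the standard way to update the path without a full page reload). Here is a toy conversion helper, offered as a sketch rather than a routing solution:

```javascript
// Toy helper: convert a fragment-based URL into a crawlable path URL.
// e.g. https://example.com#page-1 -> https://example.com/page-1
function toCrawlableUrl(url) {
  const u = new URL(url);
  if (u.hash) {
    const fragment = u.hash.slice(1).replace(/^\/+/, '');
    u.pathname = u.pathname.replace(/\/+$/, '') + '/' + fragment;
    u.hash = '';
  }
  return u.toString();
}
```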

Renderability

Apart from not being able to render your JavaScript because of blocked resources, Googlebot may have encountered a timeout or other internal error whilst trying to process the code. You can access these errors that Googlebot has encountered by clicking the “More Info” tab within the URL Inspection Tool.

Likewise, if content only appears once a user interacts with the web page, it’s likely that Googlebot will not be able to “see” that content.

This is because Google is not able to click or scroll through a web page, which is an issue if, for example, you haven’t implemented pagination correctly.

Most eCommerce websites will use pagination to break up categories that contain lots of products – this is usually indicated by a “Load More” or “Next” button as seen in the example below.

More often than not, paginated pages past the first page are implemented in a way that Googlebot cannot access – this is a missed opportunity, as it means valuable URLs that are linked from these pages will not be discovered.

In order to ensure that Googlebot is able to follow these links, use <a href> links as opposed to buttons that users have to click.
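One way to do this is to render a real, crawlable link and progressively enhance it: the href gives Googlebot a followable URL, while client-side JavaScript can still intercept the click to load content in place. The `/category/page-N` URL pattern below is a hypothetical example:

```javascript
// Build a crawlable "Load more" link for paginated listings.
// Googlebot can follow the href; in the browser, a click handler can
// intercept the click and fetch the next page without a reload.
// The /category/page-N URL pattern is a hypothetical example.
function nextPageLink(currentPage) {
  const next = currentPage + 1;
  return `<a href="/category/page-${next}" class="load-more">Load more</a>`;
}
```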

Optimise the Critical Rendering Path

Google recommends optimising the critical rendering path so that content relating to the current user action is prioritised during the rendering process. By doing this, you will create a much better experience for the user, as the time taken to render pages is significantly improved.

Below is a graphic from Google which illustrates the difference between an optimised and unoptimised rendering path.

You can find Google’s recommendations on optimising the critical rendering path here.

Render-blocking JavaScript

Render-blocking JavaScript is where JavaScript resources hold up the rendering of the page, because the browser has to fetch and execute them before it can display any content.

This is problematic because Googlebot may timeout and be unable to finish rendering the page, and users may have a poor experience as the content takes too long to load.

Here are some solutions to help avoid this:

1. Inline critical JavaScript directly in the HTML document so that it can be parsed sooner, without an extra network request.

2. Add the async attribute to the <script> tag so that the JavaScript is fetched asynchronously while the rest of the content is parsed (the related defer attribute goes further and delays execution until parsing has finished).

For example, use:

<script async src="my-javascript-file.js"></script>

Instead of:

<script src="my-javascript-file.js"></script>

3. Move JavaScript code that isn’t as important or relevant for the initial page render towards the bottom of the HTML document to help improve performance and minimise resource contention.

Internal Linking

Ensure that internal links are implemented as regular anchor tags within your HTML code, i.e. <a href="https://www.example.com">anchor text</a> in the DOM, instead of using JavaScript onclick events. This is because web crawlers will not associate onclick-based links with the main architecture and navigation of your website.

Indexability

There are two quick ways to check whether your JavaScript (or any other) content is being indexed by Google.

The quickest way to do this is via a site search.

To see whether Google has indexed a particular web page, type: “site:URL”.

To see whether Google has indexed particular content on a web page, type: “[content you want to check] site:URL”

We can see in the above example that the text “Looper pedals” is highlighted in bold by Google. If the snippet does not show up, then it means that this particular part of the content has not been indexed by Google.

If you want to be absolutely sure, use the URL Inspection Tool, which is a more accurate method.

Your JavaScript content may not have been indexed for the following reasons:

  • Googlebot encountered timeouts – it took too long for Googlebot to render the entire page, so some of the content may not have been rendered and indexed.
  • Google had rendering issues – check the URL Inspection tool to ensure there weren’t any issues when Google tried to render the web page.
  • Content is low quality – Google may have skipped the content entirely if it deemed it as offering little value.
  • Blocked resources – it is possible the resource was simply blocked which meant that Googlebot was unable to access it, let alone render and index it.

Fix Images and Lazy-Loaded Content

User experience is key to Google. One of the core factors that can drastically hinder UX is images – they can be costly on both bandwidth and performance. Google recommends using lazy-loading.

Lazy-loading is the process of only loading the images (or other resource-heavy content) when the user is about to see them.

Below are some guidelines on how to implement lazy-loading in a search-friendly way so as not to inadvertently hide content from Google:

  • Load content only when it’s visible in the viewport – make sure that your lazy-loading implementation loads the relevant content when it is visible in the viewport. This will allow Googlebot to see all of the content on the webpage. It also puts less strain on the browser and improves the user experience.
  • Infinite Scroll – if you have adopted an infinite scroll experience on your website, ensure that paginated loading is supported. This is recommended as it “allows them (users) to share and reengage with your content”, but also allows Googlebot to “show a link to a specific point in the content, rather than the top of an infinite scrolling page”. To achieve this, Google recommends that you provide a unique link to each section (i.e. each paginated page) that users can share and load directly.
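A search-friendly starting point for images is the browser’s native lazy-loading attribute, which modern Chrome (the basis of Googlebot’s renderer) supports; the file name and dimensions below are hypothetical:

```html
<!-- Native lazy loading: the image is only fetched as it approaches
     the viewport. Explicit width/height reserve space on the page
     and reduce layout shift while the image loads. -->
<img src="looper-pedal.jpg" alt="Looper pedal" loading="lazy"
     width="640" height="480">
```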