SPA and SEO: Google (Googlebot) properly renders Single Page Applications and executes Ajax calls

Lucamug
10 min read · Aug 19, 2017


I ran some tests to understand how the Google search engine handles a Single Page Application. I built the test website in Elm, but the results should also apply to React, Angular, or any other language/framework.

Findings Overview

  1. Googlebot runs the JavaScript on the page, and Ajax calls are properly executed
  2. Googlebot waits between 5 and 20 seconds before taking a snapshot of each page
  3. The fetching done on request from the Search Console (I call these “T5”) and the “natural” fetching done by Google (I call these “T20”) are different
  4. T5 fetches take a snapshot after around 5 seconds, T20 fetches after around 20 seconds
  5. Different sections of the page are snapshotted at different times. For example, in the T20 case, the title is always at T19 and the meta-description at T20
  6. There are mysterious situations where snapshots capture impossible states. For example, a snapshot taken after 5 seconds already shows the result of an Ajax call that arrived after 10 seconds

Test methodology

The website used for this test is a Single Page Application that:

  • is built in Elm 0.18
  • uses pushState to navigate across pages
  • uses forward slashes for the URL structure

This website has 5 pages:

  1. http://elm-spa-seo-testing.guupa.com/
  2. http://elm-spa-seo-testing.guupa.com/section1
  3. http://elm-spa-seo-testing.guupa.com/section2
  4. http://elm-spa-seo-testing.guupa.com/section3
  5. http://elm-spa-seo-testing.guupa.com/sitemap

Pages automatically update the title and the meta-description, so when Googlebot indexes them, it is possible to verify what state they were in. Three events change the state:

  1. Time: every second
  2. Type A Ajax calls: these calls are initiated with several delays
  3. Type B Ajax calls: these calls are all initiated at the beginning and they reply with several delays

The delays for both types of Ajax calls are set at 0, 1, 3, 6, and 10 seconds.
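The test site itself is written in Elm, but the two scheduling patterns can be sketched in plain JavaScript. The endpoint names `/api/a` and `/api/b` below are made up for illustration; they are not the test site's real API:

```javascript
// Sketch of the two Ajax patterns, with hypothetical endpoint names.
const DELAYS = [0, 1, 3, 6, 10]; // seconds

// Type A: the client waits for each delay, then fires the request.
function scheduleTypeA(call) {
  DELAYS.forEach((d) => setTimeout(() => call(`/api/a?d=${d}`), d * 1000));
}

// Type B: all requests fire immediately; the server holds each reply
// for the requested number of seconds before answering.
function scheduleTypeB(call) {
  DELAYS.forEach((d) => call(`/api/b?delay=${d}`));
}
```

In both cases the replies arrive at roughly the same wall-clock times; the difference is whether the delay happens client-side (Type A) or server-side (Type B).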

This is the sequence of the calls:

The sequence of Ajax calls

This is the history of the title changes. As you can see, reading from bottom to top, it starts from “No JS, No Ajax”, which is what search engines would index if they didn't execute JavaScript.

Model of the application, where it is possible to follow the history of the title changes

Note: this screen is from the Elm debugger. Click on the button in the lower right corner of the screen to activate it.

Results

These are some of the first results that I got:

╒═══════╤══════╤═════════╤════════╤════════╤══════════════════════╤═══════════╕
│ Vers. │ Time │ History │ Type A │ Type B │ Date                 │ Page      │
╞═══════╪══════╪═════════╪════════╪════════╪══════════════════════╪═══════════╡
│ 1     │      │ 1       │ 0      │ NaN    │ 2017-08-18           │ /         │
│ 1     │      │ 1       │ 10     │ 6      │ 2017-08-19           │ /         │
│ 3     │      │ 1       │ 0      │ 3      │ 2017-08-20           │ /         │
│ 4     │      │ 1       │ 0      │ NaN    │ 2017-08-21T19:35:57Z │ /         │
│ 4     │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:53Z │ /section1 │
│ 4     │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:57Z │ /         │
│ 7     │ 5    │ 1       │ 0      │ 6      │ 2017-08-28T07:44:57Z │ /         │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-08-28T00:00:00Z │ /         │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-08-30T00:00:00Z │ /sitemap  │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-09-03T00:00:00Z │ /sitemap  │
│ 7     │ 20   │ 1       │ 10     │ 10     │ 2017-09-06T00:00:00Z │ /section1 │
│ 7     │ 20   │ 1       │ 10     │ 10     │ 2017-09-07T00:00:00Z │ /section2 │
│ 7     │ 5    │ 1       │ 0      │ 1      │ 2017-09-09T13:33:50Z │ /section3 │
│ 7     │ 20   │ 1       │ 10     │ 11     │ 2017-09-10T00:00:00Z │ /section2 │
╘═══════╧══════╧═════════╧════════╧════════╧══════════════════════╧═══════════╛

Googlebot waits between 5 and 20 seconds before taking a snapshot of the page. The Ajax call results, both Type A and Type B, do not seem to agree with this assertion when the waiting time is in the 5-second range.

This is an example of the search result on 24 August 2017:

You can extrapolate the data from the title or description of the page:

V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1
  • V5 The version of the code
  • T5 The seconds that passed before Googlebot took a snapshot of the page
  • H7 The number of clicks (or items in the history). This number increases while browsing the site. Googlebot would probably always get “1” as a value because it doesn't “click” on links but sends new HTTP requests.
  • A0 The Type A Ajax call got only the first reply, at 0 seconds
  • B3 The Type B Ajax call got the reply at 3 seconds
  • 2017–08–24T16:46:04Z Date and time when the page was indexed
  • /section1 The path of the page
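As a sketch, the encoded title can be unpacked with a few lines of JavaScript. The field names here are my own, introduced for illustration; the test site only produces the string:

```javascript
// Unpack a title such as "V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1".
// Field names are illustrative, not part of the test site's code.
function parseTitle(title) {
  const [v, t, h, a, b, date, path] = title.split(",");
  return {
    version: Number(v.slice(1)),     // V5 -> 5
    snapshotSec: Number(t.slice(1)), // T5 -> 5
    history: Number(h.slice(1)),     // H7 -> 7
    typeA: a.slice(1),               // "0", "10", or "NaN"
    typeB: b.slice(1),
    date,                            // ISO timestamp of indexing
    path,                            // page path
  };
}
```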

I also replicated the title in the body of the page, in a large font, so it is possible to read it in the small previews of “Fetch as Google” in the Google Search Console. The values shown on this page are not the same as in the search results. Google probably uses a different program to create these snapshots than the one used for the search engine.

Screenshot of the “Fetch as Google” section of the Google Search Console

Mysterious impossible state

Another thing to note in the screenshot above is that the time of the snapshot is T5 but the time of the second Ajax answer is B10. This should be an impossible state of the page: it means that the snapshot was taken after 5 seconds, but the Type B API had 10 seconds to answer. This is a typical title history, and you can see that there is no such thing as T5 and B10 at the same time:

V7,T6,H1,A6,B6,2017-08-28T12:57:20Z,/
V7,T6,H1,A6,B3,2017-08-28T12:57:20Z,/
V7,T6,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T5,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T4,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B1,2017-08-28T12:57:20Z,/
V7,T3,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T2,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B0,2017-08-28T12:57:20Z,/
V7,T1,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B[NaN],2017-08-28T12:57:20Z,/
V7,T0,H1,A[NaN],B[NaN],2017-08-28T12:57:20Z,/
No JS, No Ajax

When the snapshot is at T5, the typical values are A3 and B3.

Fetching done on request and the “natural” fetching are different

The fetching done on request (from the Search Console) and the “natural” fetching done by Google are different.

  • The fetch done on request doesn't reach the timeout of 10 seconds but usually waits around 5 seconds (hence the name “T5”) before taking a snapshot. The timestamp in this case is an exact time, for example 13:55:50
  • The “natural” fetch waits longer (19~20 seconds, hence the name “T20”) before taking the snapshot of the page. It always has a timestamp of 00:00:00

Search result on 30 August 2017

Search result on 10 September 2017

Titles are mixed. Some of them reflect the HTML title element, others are extracted from the page content.

Search result on 26 September 2017

For the first time, all pages are indexed at T20. Note that the title is always at T19 while the description is at T20; both A and B are 10. I also noted that while the title is at T19, the content of the page is at T20. This is weird behavior because these two values should be the same.

Search result on 2 October 2017

There are two updated entries compared to 26 September. One entry has a date of 25 September but was not yet showing on 26 September. It seems that there is some delay between crawling and publishing.

Search result on 19 October 2017

Search result on November 8th

It seems that the sitemap and other pages got blocked by a wrong robots.txt. Maybe my account was hacked? [Edit: it turned out that I was not hacked; surge.sh changed their policy, read below.] I restored the original robots.txt; let's see what happens over the next few days.

Original robots.txt

User-agent: *
Disallow:
Sitemap: http://elm-spa-seo-testing.surge.sh/sitemap.txt

Wrong robots.txt

User-agent: *
Disallow: /

Update 27 November 2017

Unfortunately, it seems that surge.sh changed its policy about robots.txt, so all the pages under the surge.sh domain are not indexed anymore. I moved this site to https://elm-spa-seo-testing.guupa.com/ for the moment. Google has not indexed it yet. I just created a new account in the Search Console and submitted the new URL.

Within a few minutes, Google had already indexed the new site:

Search result on 28 November 2017

After 24 hours, the first “T20” entries started to appear:

Search result on 30 November 2017

Google updated two other pages. For one of them, it decided to create its own title. I believe that Google generates a new title when it considers the original title not meaningful. In this case, I think it got confused by all these letters and numbers.

It is also interesting to see how consistently Google renders T19 in the title and T20 in the meta-description. It seems that different sections of the page are rendered at different points in time.

Updates 5 December 2017

Because Google was not reindexing my new version V9, I decided to request indexing from the Search Console.

Again, from the screenshots of “Fetch as Google”, I get an impossible state where the snapshot seems to have been taken after 5 seconds (T5) but the second Ajax call already got the 10-second result (B10).

I wonder how this impossible state, “T5,B10”, could happen. If you have any ideas, leave a comment below.

The mystery of the impossible state “T5,B10”

After a few seconds, Google updated the search result. The new version of the page is there, in the first position.

The content of the TITLE element in the HEAD section should be something like “SPA and SEO Testing — V9,T5,H1,A0,B[NaN],2017–12–05T08:07:51Z,/” but Google decided to go with a simpler title, probably coming from the H1 element.

Note also that the impossible state “T5,B10” has been replaced with a possible (but strange) state, “T5,B[NaN]”. NaN means that Elm created the page but no Ajax call had returned yet. At T5, both A and B should already have received the 3-second result (A3,B3).

Updates 13 December 2017

A page has been indexed with two different versions. This is another impossible state. It seems that Google renders different parts of the page at different moments. This is the same as the T19 vs. T20 issue that I mentioned earlier.

Updates 15 November 2021

After almost 4 years, I checked how Google is indexing the page and I noticed that the result is not good: Googlebot is not running JavaScript on the pages.

From a quick check of the Search Console, it seems that 4 years ago I was still using HTTP instead of HTTPS, so:

  • Googlebot says that it “Couldn't fetch” my sitemap.txt.
  • sitemap.txt contains all URLs with HTTP instead of HTTPS.

So today I changed the sitemap and all links to use HTTPS. I also submitted the new sitemap; let's see if Google starts indexing the website properly again.
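Assuming the sitemap keeps the plain-text format implied by the sitemap.txt filename, the fix amounts to listing the five pages named earlier with the https scheme:

```
https://elm-spa-seo-testing.guupa.com/
https://elm-spa-seo-testing.guupa.com/section1
https://elm-spa-seo-testing.guupa.com/section2
https://elm-spa-seo-testing.guupa.com/section3
https://elm-spa-seo-testing.guupa.com/sitemap
```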

Updates 6 July 2022

Everything seems to be indexed properly again, except for one page.

Updates 14 February 2024

I haven't done any maintenance at all, just taking this screenshot after 1.5 years. Only 4 results are now available. A couple of pages were indexed in January 2024.

Thank you for reading!
