SPA and SEO: Google (Googlebot) properly renders Single Page Applications and executes Ajax calls

I ran some tests to understand how Google Search handles a Single Page Application. I built the test website in Elm, but the same results should also apply to React, Angular, or any other language or framework.

  1. Googlebot runs the JavaScript on the page, and the Ajax calls are executed properly
  2. Googlebot waits between 5 and 20 seconds before taking a snapshot of each page
  3. The fetching done on request from the Search Console (I call these “T5”) and the “natural” fetching done by Google (I call these “T20”) are different
  4. T5 takes a snapshot after around 5 seconds, T20 after around 20 seconds
  5. Different sections of the page are snapshotted at different times. For example, in the T20 case the title always shows T19 and the meta-description shows T20
  6. There are mysterious situations where snapshots are taken in impossible states. For example, the snapshot is taken after 5 seconds, but the page already shows the result of an Ajax call that arrived after 10 seconds

The website used for this test is a Single Page Application that:

  • is built in Elm 0.18
  • uses pushState to navigate across pages
  • uses forward slashes for the URL structure

This website has 5 pages:

  1. http://elm-spa-seo-testing.guupa.com/
  2. http://elm-spa-seo-testing.guupa.com/section1
  3. http://elm-spa-seo-testing.guupa.com/section2
  4. http://elm-spa-seo-testing.guupa.com/section3
  5. http://elm-spa-seo-testing.guupa.com/sitemap

Pages automatically update their title and meta-description, so when Googlebot indexes them it is possible to verify which state they were in. Three events change the state:

  1. Time: every second
  2. Type A Ajax calls: these calls are initiated after several delays
  3. Type B Ajax calls: these calls are all initiated at the beginning, and the replies arrive after several delays

The delays for both types of Ajax calls are set at 0, 1, 3, 6, and 10 seconds.

This is the sequence of the calls:

The sequence of Ajax calls
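Assuming network latency is negligible, the value each call type should show in the title after t seconds is simply the largest delay that has already elapsed, so the ideal timeline can be computed in advance. The TypeScript sketch below does that; `AJAX_DELAYS` and `expectedAjaxValue` are hypothetical names, not code from the Elm test site.

```typescript
// Delays used by the test site for both Type A and Type B calls (seconds).
const AJAX_DELAYS: readonly number[] = [0, 1, 3, 6, 10];

// Value the title should show for one call type after `t` seconds, assuming
// negligible network latency: the largest delay that has already elapsed,
// or null (displayed as "[NaN]") if no reply can have arrived yet.
function expectedAjaxValue(t: number, delays: readonly number[] = AJAX_DELAYS): number | null {
  let latest: number | null = null;
  for (const d of delays) {
    if (d <= t && (latest === null || d > latest)) {
      latest = d;
    }
  }
  return latest;
}

// Print the ideal A/B values for each whole second from T0 to T12.
for (let t = 0; t <= 12; t++) {
  const v = expectedAjaxValue(t);
  const shown = v === null ? "[NaN]" : String(v);
  console.log(`T${t}: A${shown}, B${shown}`);
}
```

Under this model both types show the same value at every second (A3,B3 at T5, for example). A real title history also contains short intermediate states, such as A[NaN],B[NaN] at T0, because even the 0-second replies take a few milliseconds to arrive.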

This is the history of the title changes. Reading from bottom to top, it starts from “No JS, No Ajax”, which is what search engines would index if they didn’t execute JavaScript.

The application model, where it is possible to follow the history of the title changes

Note: this screen is from the Elm debugger. Click on the button in the lower right corner of the screen to activate it.

Results

These are some of the first results that I got:

╒═════════╤══════╤═════════╤════════╤════════╤══════════════════════╤═══════════╕
│ Version │ Time │ History │ Type A │ Type B │ Date                 │ Page      │
╞═════════╪══════╪═════════╪════════╪════════╪══════════════════════╪═══════════╡
│ 1       │      │ 1       │ 0      │ NaN    │ 2017-08-18           │ /         │
│ 1       │      │ 1       │ 10     │ 6      │ 2017-08-19           │ /         │
│ 3       │      │ 1       │ 0      │ 3      │ 2017-08-20           │ /         │
│ 4       │      │ 1       │ 0      │ NaN    │ 2017-08-21T19:35:57Z │ /         │
│ 4       │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:53Z │ /section1 │
│ 4       │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:57Z │ /         │
│ 7       │ 5    │ 1       │ 0      │ 6      │ 2017-08-28T07:44:57Z │ /         │
│ 7       │ 19   │ 1       │ 10     │ 10     │ 2017-08-28T00:00:00Z │ /         │
│ 7       │ 19   │ 1       │ 10     │ 10     │ 2017-08-30T00:00:00Z │ /sitemap  │
│ 7       │ 19   │ 1       │ 10     │ 10     │ 2017-09-03T00:00:00Z │ /sitemap  │
│ 7       │ 20   │ 1       │ 10     │ 10     │ 2017-09-06T00:00:00Z │ /section1 │
│ 7       │ 20   │ 1       │ 10     │ 10     │ 2017-09-07T00:00:00Z │ /section2 │
│ 7       │ 5    │ 1       │ 0      │ 1      │ 2017-09-09T13:33:50Z │ /section3 │
│ 7       │ 20   │ 1       │ 10     │ 11     │ 2017-09-10T00:00:00Z │ /section2 │
╘═════════╧══════╧═════════╧════════╧════════╧══════════════════════╧═══════════╛

Googlebot waits between 5 and 20 seconds before taking a snapshot of the page. The results of the Ajax calls, both Type A and Type B, seem to contradict this when the waiting time is around 5 seconds.

This is an example of the search result on 24 August 2017:

You can extract the data from the title or the description of the page:

V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1
  • V5 The version of the code
  • T5 The seconds that passed before Googlebot took a snapshot of the page
  • H7 The number of clicks (or items in the History). This number increases while browsing the site. Googlebot would probably always get “1” as a value, because it doesn’t “click” on links but sends new HTTP requests.
  • A0 The Type A Ajax call got only the first reply, at 0 seconds
  • B3 The Type B Ajax call got its latest reply at 3 seconds
  • 2017-08-24T16:46:04Z Date and time when the page was indexed
  • /section1 The path of the page
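To analyse many results at once, the title or description can be parsed back into these fields. A minimal TypeScript sketch, assuming the exact comma-separated format above and a path that contains no commas; `Snapshot` and `parseSnapshot` are hypothetical names:

```typescript
// Fields encoded in a title such as "V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1".
interface Snapshot {
  version: number;       // V: version of the code
  seconds: number;       // T: seconds elapsed before the snapshot
  history: number;       // H: items in the browser history
  typeA: number | null;  // A: latest Type A reply, null when shown as "[NaN]"
  typeB: number | null;  // B: latest Type B reply, null when shown as "[NaN]"
  timestamp: string;     // date and time when the page was indexed
  path: string;          // path of the page
}

// Matches the data part anywhere in the string, so a leading page name is skipped.
const SNAPSHOT_PATTERN =
  /V(\d+),T(\d+),H(\d+),A(\d+|\[NaN\]),B(\d+|\[NaN\]),([^,]+),(\/[^,]*)$/;

// "[NaN]" (no reply yet) becomes null; digits become a number.
function toValue(raw: string | undefined): number | null {
  return raw === undefined || raw === "[NaN]" ? null : Number(raw);
}

function parseSnapshot(title: string): Snapshot {
  const m = SNAPSHOT_PATTERN.exec(title);
  if (m === null) throw new Error(`Not a snapshot title: "${title}"`);
  return {
    version: Number(m[1]),
    seconds: Number(m[2]),
    history: Number(m[3]),
    typeA: toValue(m[4]),
    typeB: toValue(m[5]),
    timestamp: m[6] ?? "",
    path: m[7] ?? "",
  };
}

console.log(parseSnapshot("V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1"));
```

Mapping “[NaN]” to null keeps “no reply yet” distinct from a real 0-second reply.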

I also replicated the title in the body of the page in a large font, so it can be read even in the small previews of “Fetch as Google” in the Google Search Console. The values shown there are not the same as those in the search results. Google probably uses a different program to create these snapshots than the one used for the search engine.

Screenshot of the “Fetch as Google” section of the Google Search Console

Another thing to note in the screenshot above is that the snapshot time is T5, but the second Ajax answer is B10. This should be an impossible state: the screenshot was taken after 5 seconds, yet the Type B API had 10 seconds to answer. This is a typical title history, and you can see that T5 and B10 never appear at the same time:

V7,T6,H1,A6,B6,2017-08-28T12:57:20Z,/
V7,T6,H1,A6,B3,2017-08-28T12:57:20Z,/
V7,T6,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T5,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T4,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B1,2017-08-28T12:57:20Z,/
V7,T3,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T2,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B0,2017-08-28T12:57:20Z,/
V7,T1,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B[NaN],2017-08-28T12:57:20Z,/
V7,T0,H1,A[NaN],B[NaN],2017-08-28T12:57:20Z,/
No JS, No Ajax

At T5, the title typically shows A3 and B3.
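The rule behind “impossible” can be stated precisely: a reply value can never be larger than the seconds elapsed (A > T or B > T), while a value that merely differs from the ideal timeline, such as B[NaN] at T5, is possible but strange. A hypothetical TypeScript check, assuming the delays 0, 1, 3, 6 and 10 seconds and negligible network latency:

```typescript
// Subset of the title fields needed for the check (null = "[NaN]").
interface TitleState {
  seconds: number;       // T
  typeA: number | null;  // A
  typeB: number | null;  // B
}

type Verdict = "consistent" | "mismatch" | "impossible";

const DELAYS: readonly number[] = [0, 1, 3, 6, 10];

// Largest delay already elapsed after t seconds, or null if none.
function latestDelayBy(t: number): number | null {
  const elapsed = DELAYS.filter((d) => d <= t);
  return elapsed.length === 0 ? null : Math.max(...elapsed);
}

// "impossible": a reply newer than the time elapsed (e.g. T5,B10).
// "mismatch":   possible, but not the value the ideal timeline predicts.
function classify(state: TitleState): Verdict {
  const expected = latestDelayBy(state.seconds);
  let verdict: Verdict = "consistent";
  for (const value of [state.typeA, state.typeB]) {
    if (value !== null && value > state.seconds) return "impossible";
    if (value !== expected) verdict = "mismatch";
  }
  return verdict;
}

console.log(classify({ seconds: 5, typeA: 0, typeB: 10 }));
```

The check only looks at values that cannot exist yet; it cannot tell whether a mismatch comes from slow replies or from parts of the page being rendered at different moments.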

The fetching done on request (from the Search Console) and the “natural” fetching done by Google are different.

  • The on-request fetch doesn’t wait long enough for the 10-second replies; it usually waits around 5 seconds (hence the name “T5”) before taking a snapshot. In this case, the timestamp is an exact time, for example 13:55:50
  • The “natural” fetch waits longer (19~20 seconds, hence the name “T20”) before taking the snapshot of the page. It always has a timestamp of 00:00:00
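These two observations suggest a simple heuristic for telling the two kinds of fetch apart from the timestamp alone. The sketch below is only that, a heuristic based on this test; `fetchKind` is a hypothetical name, and an on-request fetch that happened exactly at midnight UTC would be misclassified.

```typescript
type FetchKind = "on-request (T5)" | "natural (T20)";

// Heuristic from this test only: natural crawls carry a midnight (00:00:00)
// timestamp, while on-request fetches carry the exact time of the fetch.
function fetchKind(timestamp: string): FetchKind {
  const time = new Date(timestamp);
  if (Number.isNaN(time.getTime())) {
    throw new Error(`Invalid timestamp: ${timestamp}`);
  }
  const midnight =
    time.getUTCHours() === 0 &&
    time.getUTCMinutes() === 0 &&
    time.getUTCSeconds() === 0;
  return midnight ? "natural (T20)" : "on-request (T5)";
}

console.log(fetchKind("2017-09-09T13:33:50Z"));
```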

The titles in the search results are mixed: some reflect the HTML title element, while others are extracted from the page content.

For the first time, all pages are indexed at T20. Note that the title always shows T19 while the description shows T20; both A and B are 10. I also noticed that while the title is at T19, the content of the page is at T20. This is weird behavior, because the two values should be the same.

There are two updated entries compared to 26 September. One entry is dated 25 September, but it was not yet showing on 26 September. It seems that there is some delay between crawling and publishing.

It seems that the sitemap and other pages got blocked by a weird robots.txt. Maybe my account has been hacked? [Edit: it turned out that I was not hacked; surge.sh had changed its policy, read below.] I restored the original robots.txt; let’s see what happens over the next few days.

Original robots.txt

User-agent: *
Disallow:
Sitemap: http://elm-spa-seo-testing.surge.sh/sitemap.txt

Wrong robots.txt

User-agent: *
Disallow: /
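The difference between the two files is a single character: an empty `Disallow:` allows every path, while `Disallow: /` blocks every path. A simplified TypeScript sketch of that matching, which only handles `User-agent: *` groups and plain path prefixes (it ignores `Allow` rules, wildcards and per-bot groups, so it is not a full robots.txt parser); `isDisallowed` is a hypothetical name:

```typescript
// Simplified robots.txt check: only "User-agent: *" and plain "Disallow"
// prefixes. An empty Disallow value means nothing is disallowed.
function isDisallowed(robotsTxt: string, path: string): boolean {
  let inStarGroup = false;
  const prefixes: string[] = [];
  for (const rawLine of robotsTxt.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*/, "").trim(); // strip comments
    const colon = line.indexOf(":");
    if (colon === -1) continue;
    const key = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    if (key === "user-agent") {
      inStarGroup = value === "*";
    } else if (key === "disallow" && inStarGroup && value !== "") {
      prefixes.push(value);
    }
    // Other keys, such as "Sitemap", do not affect access.
  }
  return prefixes.some((prefix) => path.startsWith(prefix));
}

const wrongRobots = "User-agent: *\nDisallow: /";
console.log(isDisallowed(wrongRobots, "/section1"));
```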

Unfortunately, it seems that surge.sh changed its policy about robots.txt, so all pages under the surge.sh domain are no longer indexed. For the moment, I moved this site to https://elm-spa-seo-testing.guupa.com/. Google has not indexed it yet. I just created a new account in the Search Console and submitted the new URL.

Within a few minutes, Google had already indexed the new site:

After 24 hours, the first “T20” results started to appear:

Google updated two other pages. For one of them, it decided to create its own title. I believe Google generates new titles when it considers the original ones not meaningful. In this case, I think it got confused by all these letters and numbers.

It is also interesting to see how consistently Google renders T19 in the title and T20 in the meta-description. It seems that different sections of the page are rendered at different points in time.
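When the title and the meta-description of the same result come from different moments, comparing their V and T fields makes the gap explicit. A minimal sketch, assuming both strings were copied by hand from the search result and follow the format described earlier; `compareSnapshots` and `RenderGap` are hypothetical names:

```typescript
// Extracts V<n> or T<n> from a snapshot string such as
// "V7,T19,H1,A10,B10,2017-09-06T00:00:00Z,/section1".
// The trailing comma keeps "T00:00:00" in the timestamp from matching.
function field(snapshot: string, letter: "V" | "T"): number | null {
  const match = new RegExp(`\\b${letter}(\\d+),`).exec(snapshot);
  return match === null ? null : Number(match[1]);
}

interface RenderGap {
  sameVersion: boolean;         // false: parts of the page come from different versions
  secondsApart: number | null;  // difference between the two T values
}

function compareSnapshots(title: string, description: string): RenderGap {
  const titleVersion = field(title, "V");
  const descVersion = field(description, "V");
  const titleTime = field(title, "T");
  const descTime = field(description, "T");
  return {
    sameVersion: titleVersion !== null && titleVersion === descVersion,
    secondsApart:
      titleTime === null || descTime === null ? null : Math.abs(descTime - titleTime),
  };
}

console.log(
  compareSnapshots(
    "V7,T19,H1,A10,B10,2017-09-06T00:00:00Z,/section1",
    "V7,T20,H1,A10,B10,2017-09-06T00:00:00Z,/section1",
  ),
);
```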

Because Google was not re-indexing my new version, V9, I decided to request indexing from the Search Console.

Again, the screenshots from “Fetch as Google” show an impossible state: the snapshot seems to have been taken after 5 seconds (T5), but the second Ajax call had already received its 10-second result (B10).

I wonder how this impossible state, “T5,B10”, could happen. If you have any ideas, leave a comment below.

The mystery of the impossible state “T5,B10”

After a few seconds, Google updated the search result. The new version of the page is there, in the first position.

The content of the TITLE element in the HEAD section should be something like “SPA and SEO Testing — V9,T5,H1,A0,B[NaN],2017-12-05T08:07:51Z,/”, but Google decided to go with a simpler title, probably taken from the H1 element.

Note also that the impossible state “T5,B10” has been replaced with a possible (but strange) state, “T5,B[NaN]”. NaN means that Elm rendered the page but no Ajax reply had arrived yet. At T5, both A and B should already have received the 3-second result (A3,B3).

One page has been indexed with two different versions at the same time. This is another impossible state. It seems that Google renders different parts of the page at different moments. This is the same as the T19 vs T20 issue that I mentioned earlier.

After almost 4 years, I checked how Google is indexing the page and noticed that the result is not good. Googlebot is not running JavaScript on the pages:

From a quick check of the Search Console, it seems that 4 years ago I was still using HTTP instead of HTTPS, so:

  • Googlebot says that it “Couldn’t fetch” my sitemap.txt
  • sitemap.txt contains all URLs with HTTP instead of HTTPS

So today I changed the sitemap and all links to use HTTPS. I also submitted the new sitemap; let’s see whether Google starts indexing the website properly again.
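Google accepts a plain-text sitemap with one absolute URL per line, so switching the scheme is a simple line-by-line rewrite. A minimal TypeScript sketch, assuming every entry of sitemap.txt starts with http:// or https://; `upgradeSitemap` is a hypothetical name:

```typescript
// Rewrites a plain-text sitemap (one absolute URL per line) so that every
// http:// entry uses https:// instead. Blank lines are dropped and
// https:// entries are kept as they are.
function upgradeSitemap(sitemapTxt: string): string {
  return sitemapTxt
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line !== "")
    .map((line) =>
      line.startsWith("http://") ? "https://" + line.slice("http://".length) : line,
    )
    .join("\n");
}

console.log(
  upgradeSitemap(
    "http://elm-spa-seo-testing.guupa.com/\nhttp://elm-spa-seo-testing.guupa.com/sitemap",
  ),
);
```

Only the scheme changes; the host and path of each entry are left untouched.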