SPA and SEO: Google (Googlebot) properly renders Single Page Applications and executes Ajax calls
I ran some tests to understand how the Google search engine handles a Single Page Application. I built the website for these tests in Elm, but the results should also apply to React, Angular, or any other language/framework.
- Googlebot waits between 5 and 20 seconds before taking a snapshot of each page
- Fetches done on request from the Search Console (I call these “T5”) differ from the “natural” fetches done by Google (I call these “T20”)
- T5 fetches take a snapshot after around 5 seconds, T20 fetches after around 20 seconds
- Different sections of the page are snapshotted at different times. For example, in the T20 case, the title always shows T19 while the meta description shows T20
- There are mysterious situations where snapshots capture impossible states. For example, a snapshot taken after 5 seconds already shows the result of an Ajax call that arrived after 10 seconds
The website used for this test is a Single Page Application that:
- is built in Elm 0.18
- uses pushState to navigate across pages
- uses forward slashes for the Url structure
This website has 5 pages:
Pages automatically update the title and the meta description, so when Googlebot indexes them it is possible to verify which state they were in. Three events change the state. These are:
- Time: a tick every second
- Type A Ajax calls: these calls are initiated after various delays
- Type B Ajax calls: these calls are all initiated at the beginning and reply after various delays
The delays for both types of Ajax calls are set at 0, 1, 3, 6, and 10 seconds.
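The state-encoding idea can be sketched in plain JavaScript (the original site is Elm; the function and field names here are hypothetical, but the output format matches the titles shown later in this article):

```javascript
// Sketch (not the original Elm code): encode the page state into the
// document title so that Googlebot's snapshot reveals when it was taken.
function encodeState(state) {
  // A missing Ajax reply is rendered as "[NaN]", as seen in the titles below.
  const fmt = (prefix, value) =>
    prefix + (value === null ? "[NaN]" : value);
  return [
    "V" + state.version,        // version of the code
    "T" + state.seconds,        // seconds elapsed since page load
    "H" + state.historyLength,  // entries in the browser history
    fmt("A", state.ajaxA),      // delay of the last Type A reply, if any
    fmt("B", state.ajaxB),      // delay of the last Type B reply, if any
    state.now,                  // current timestamp
    state.path,                 // current path
  ].join(",");
}

const title = "SPA and SEO Testing — " + encodeState({
  version: 9,
  seconds: 5,
  historyLength: 1,
  ajaxA: 0,
  ajaxB: null, // no Type B reply received yet
  now: "2017-12-05T08:07:51Z",
  path: "/",
});
console.log(title);
// In the browser, the page would then set: document.title = title;
```

Every second, and on every Ajax reply, the page rebuilds this string and writes it into the title and the meta description.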
This is the sequence of the calls:
Note: this screenshot is from the Elm debugger. Click the button in the lower right corner of the screen to activate it.
These are some of the first results that I got:
| Vers. | Time | History | Type A | Type B | Date | Page |
|---|---|---|---|---|---|---|
| 1 | | 1 | 0 | NaN | 2017-08-18 | / |
| 1 | | 1 | 10 | 6 | 2017-08-19 | / |
| 3 | | 1 | 0 | 3 | 2017-08-20 | / |
| 4 | | 1 | 0 | NaN | 2017-08-21T19:35:57Z | / |
| 4 | 5 | 1 | 0 | 1 | 2017-08-24T15:32:53Z | /section1 |
| 4 | 5 | 1 | 0 | 1 | 2017-08-24T15:32:57Z | / |
| 7 | 5 | 1 | 0 | 6 | 2017-08-28T07:44:57Z | / |
| 7 | 19 | 1 | 10 | 10 | 2017-08-28T00:00:00Z | / |
| 7 | 19 | 1 | 10 | 10 | 2017-08-30T00:00:00Z | /sitemap |
| 7 | 19 | 1 | 10 | 10 | 2017-09-03T00:00:00Z | /sitemap |
| 7 | 20 | 1 | 10 | 10 | 2017-09-06T00:00:00Z | /section1 |
| 7 | 20 | 1 | 10 | 10 | 2017-09-07T00:00:00Z | /section2 |
| 7 | 5 | 1 | 0 | 1 | 2017-09-09T13:33:50Z | /section3 |
| 7 | 20 | 1 | 10 | 11 | 2017-09-10T00:00:00Z | /section2 |
Googlebot waits between 5 and 20 seconds before taking a snapshot of the page. The Ajax call results, both Type A and Type B, seem to contradict this assertion when the waiting time is in the 5-second range.
This is an example of the search result on 24 August 2017:
You can extrapolate the data from the title or description of the page:
- V5: the version of the code
- T5: the seconds that passed before Googlebot took a snapshot of the page
- H7: the number of clicks (items in the History). This number increases while browsing the site. Googlebot probably always gets “1” as a value because it doesn’t “click” on links but sends new HTTP requests.
- A0: the Type A Ajax call got only the first reply, at 0 seconds
- B3: the Type B Ajax call got the reply at 3 seconds
- 2017–08–24T16:46:04Z: the date and time when the page was indexed
- /section1: the path of the page
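Reading the legend above back out of a search result can be sketched as a small parser (a hypothetical helper, not part of the original Elm app):

```javascript
// Sketch: parse the encoded portion of an indexed title back into its
// fields. The field order follows the legend above.
function decodeTitle(encoded) {
  const [version, seconds, history, ajaxA, ajaxB, date, path] =
    encoded.split(",");
  return {
    version: Number(version.slice(1)), // "V5" -> 5
    seconds: Number(seconds.slice(1)), // "T5" -> 5
    history: Number(history.slice(1)), // "H1" -> 1
    ajaxA: ajaxA.slice(1),             // "A0" -> "0"
    ajaxB: ajaxB.slice(1),             // "B3" -> "3"
    date,                              // timestamp when the page was indexed
    path,                              // path of the page
  };
}

console.log(decodeTitle("V5,T5,H1,A0,B3,2017-08-24T16:46:04Z,/section1"));
```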
I replicated the title in the body of the page in a large font so that it is also readable in the small previews of “Fetch as Google” in the Google Search Console. The values shown in these previews are not the same as in the search results. Google probably uses a different renderer for these snapshots than the one used for the search engine.
Mysterious impossible state
Another thing to note in the screenshot above is that the snapshot time is T5 but the second Ajax answer arrived at B10. This should be an impossible state of the page: it means the screenshot was taken after 5 seconds, but the Type B API had 10 seconds to answer. This is a typical title history, and you can see that there is no such thing as T5 and B10 at the same time:
The history starts with “No JS, No Ajax”; at T5, typically both Ajax calls have already received the 3-second result (A3,B3).
Fetching done on request and the “natural” fetching are different
The fetching done on request (from the Search Console) and the “natural” fetching done by Google are different.
- The fetch done on request doesn’t reach the timeout of 10 seconds; it usually waits around 5 seconds (hence the name “T5”) before taking a snapshot. The timestamp in this case is an exact time, for example 13:55:50
- The “natural” fetch waits longer (19~20 seconds, hence the name “T20”) before taking the snapshot of the page. It always has a timestamp of 00:00:00
Search result on 30 August 2017
Search result on 10 September 2017
Titles are mixed. Some of them reflect the HTML title element, others are extracted from the page content.
Search result on 26 September 2017
For the first time, all pages are indexed at T20. Note that the title is always at T19 while the description and the content of the page are at T20, and both A and B are 10. This is weird behavior because these values should all come from the same moment.
Search result on 2 October 2017
There are two updated entries compared to 26 September. One entry has a date of 25 September but only started showing on 26 September. It seems that there is some delay between crawling and publishing.
Search result on 19 October 2017
Search result on November 8th
It seems that the sitemap and other pages got blocked by a weird robots.txt. Maybe my account was hacked? [Edited: it turned out that I was not hacked; surge.sh changed their policy, read below.] I restored the original robots.txt; let’s see what happens over the next few days.
Update 27 November 2017
Unfortunately, it seems that surge.sh changed its policy about robots.txt, so all the pages under the surge.sh domain are not indexed anymore. I moved this site to https://elm-spa-seo-testing.guupa.com/ for the moment. Google has not indexed it yet. I created a new account in the Search Console and submitted the new URL.
Within a few minutes, Google had already indexed the new site:
Search result on 28 November 2017
After 24 hours, the first “T20” entries started to appear:
Search result on 30 November 2017
Google updated two other pages. For one of them, it decided to create its own title. I believe that Google generates new titles when it considers the original titles not meaningful. In this case, I think it got confused by all these letters and numbers.
It is also interesting to see how consistent Google is in rendering T19 in the title and T20 in the meta description. It seems that different sections of the page are rendered at different points in time.
Updates 5 December 2017
Because Google was not reindexing my new version V9, I decided to request indexing from the Search Console.
Again, in the screenshots of “Fetch as Google” I get an impossible state where the snapshot seems to be taken after 5 seconds (T5) but the second Ajax call had already received the 10-second result (B10).
I wonder how this impossible state, “T5,B10”, could happen. If you have any ideas, leave a comment below.
After a few seconds, Google updated the search result. The new version of the page is there, in the first position.
The content of the TITLE element in the HEAD section should be something like “SPA and SEO Testing — V9,T5,H1,A0,B[NaN],2017–12–05T08:07:51Z,/” but Google decided to go with a simpler title, probably coming from the H1 element.
Note also that the impossible state “T5,B10” has been replaced with a possible (but strange) state “T5,B[NaN]”. NaN means that Elm created the page but no Ajax call has returned yet. At T5, both A and B should already have received the 3-second result (A3,B3).
Updates 13 December 2017
A page has been indexed with two different versions. This is another impossible state. It seems that Google renders different parts of the page at different moments. This is the same as the T19 vs T20 issue that I mentioned earlier.
Updates 15 November 2021
From a quick check of the Search Console, it seems that 4 years ago I was still using HTTP instead of HTTPS, so:
- Googlebot says that it “Couldn’t fetch” my sitemap.txt
- sitemap.txt contains all URLs with HTTP instead of HTTPS
So today I changed the sitemap and all links to use HTTPS. I also submitted the new sitemap; let’s see if Google starts indexing the website properly again.
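The sitemap fix itself is mechanical. A minimal sketch, assuming a plain-text sitemap with one URL per line (the helper name and the sample URLs are illustrative):

```javascript
// Sketch: rewrite a sitemap.txt (one URL per line) from HTTP to HTTPS.
// Only the scheme at the start of each line is replaced.
function upgradeSitemap(text) {
  return text
    .split("\n")
    .map((line) => line.replace(/^http:\/\//, "https://"))
    .join("\n");
}

const sitemap = [
  "http://elm-spa-seo-testing.guupa.com/",
  "http://elm-spa-seo-testing.guupa.com/section1",
].join("\n");

console.log(upgradeSitemap(sitemap));
```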
Updates 6 July 2022
Everything seems to be indexed properly again, except for one page.