Does indexifembedded Work as Expected?

What We're Testing Here

There's a new(ish) kid on the robots directive block: indexifembedded. Currently supported only by Google, it aims to let content be indexed as part of the page it's embedded on, but not on its own. To achieve that, it needs to be paired with noindex.
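To make that pairing concrete, here's a simplified Python model of the semantics described in Google's announcement. It's a sketch under my own assumptions, not Google's implementation. It takes the directive list from something like `<meta name="googlebot" content="noindex, indexifembedded">` on the embedded document and answers two questions: can the URL be indexed on its own, and can its content be indexed as part of a page that embeds it? The function name `embed_indexing` is purely illustrative.

```python
def embed_indexing(directives: str) -> dict:
    """Simplified model of noindex + indexifembedded for one embedded URL.

    `directives` is a comma-separated robots value, e.g. "noindex, indexifembedded".
    """
    tokens = {t.strip().lower() for t in directives.split(",") if t.strip()}
    noindex = "noindex" in tokens
    # Without noindex the URL is indexable on its own, so its content is
    # eligible either way; indexifembedded only changes things alongside noindex.
    standalone = not noindex
    when_embedded = standalone or "indexifembedded" in tokens
    return {"standalone": standalone, "when_embedded": when_embedded}


print(embed_indexing("noindex, indexifembedded"))  # {'standalone': False, 'when_embedded': True}
print(embed_indexing("noindex"))                   # {'standalone': False, 'when_embedded': False}
```

Under this model, indexifembedded on its own changes nothing, since content without noindex is indexable anyway; it only matters alongside noindex.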

You can get more in-depth detail about what indexifembedded is meant to do from Google's blog post: New robots tag: indexifembedded.

I'd given it a cursory test without thinking much more about it, but this Twitter thread from @jakebohall and @RichTatum:

Jake Bohall on Twitter: "Ran into this same confusion around indexifembedded recently and couldn't find an answer... Started playing around with it and I'm not sure it functions like anyone thought... or what everyone thought doesn't apply to all embeds.. @JohnMu @methode @googlesearchc"

prompts some interesting questions: does it REALLY work the way we expect? Does it work with PDFs?

Setting Up the Test

I created a very simple site in a subfolder of my testing domain, https://testing.tamethebots.com/indexifembedded/, with a basic index page and six other pages, each using a different combination of robots directives on embedded HTML or PDF content.

They are:

A page with an iframe embedding HTML that contains noindex & indexifembedded
A page with an iframe embedding HTML that contains just noindex
A page with an iframe embedding HTML with no robots meta
A page with an iframe embedding a PDF served with an X-Robots-Tag: googlebot: noindex, indexifembedded HTTP header
A page with an iframe embedding a PDF served with an X-Robots-Tag: googlebot: noindex HTTP header
A page with an iframe embedding a PDF served with no X-Robots-Tag HTTP header

Each of the pages has the first paragraph of a story (if any publishers want to get in touch for a children's book series, I'm available …), with the rest in the embed.
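For anyone wanting to recreate something similar, the structure of the HTML tests can be sketched like this. It's an illustrative Python sketch, not the exact markup I used: the helper names and the file name `chapter-1-embed.html` are hypothetical, and I've assumed the directives go in a `googlebot` meta tag to mirror the PDF headers (`name="robots"` would address all crawlers instead).

```python
from html import escape
from typing import Optional


def embedded_document(story_text: str, robots: Optional[str]) -> str:
    """HTML served inside the iframe, with an optional googlebot meta tag."""
    meta = f'<meta name="googlebot" content="{escape(robots)}">' if robots else ""
    return (
        "<!doctype html><html><head>"
        f"{meta}</head><body><p>{escape(story_text)}</p></body></html>"
    )


def parent_page(first_paragraph: str, embed_src: str) -> str:
    """The test page: first paragraph inline, the rest of the story in an iframe."""
    return (
        "<!doctype html><html><head><title>Test page</title></head><body>"
        f"<p>{escape(first_paragraph)}</p>"
        f'<iframe src="{escape(embed_src)}"></iframe>'
        "</body></html>"
    )


# Hypothetical mapping: Test 1 noindex & indexifembedded, Test 2 noindex, Test 3 none.
VARIANTS = {1: "noindex, indexifembedded", 2: "noindex", 3: None}
```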

I added the URLs to a sitemap and waited for indexing.
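The sitemap itself is nothing special. As a sketch, one could be generated with Python's standard library as below. Judging by the "Submitted" statuses in Search Console, the embedded files were listed alongside the test pages; the page file names in the example are hypothetical placeholders, as only the folder URL is given here.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: Iterable[str]) -> str:
    """Return a minimal sitemaps.org <urlset> document listing `urls`."""
    ET.register_namespace("", SITEMAP_NS)  # serialise as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    base = "https://testing.tamethebots.com/indexifembedded/"
    # Hypothetical file names for one test page and its embed.
    print(build_sitemap([base, base + "chapter-1.html", base + "chapter-1-embed.html"]))
```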

The Outcomes

I monitored the Page indexing status in Search Console for each test page and each embedded file, and used the URL Inspection tool to check each page, looking at both the indexed version and a live test.

Notably, for every page, including the PDF ones, the embedded file was listed as successfully fetched under More Info > Page Resources, which the server logs confirmed.
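Confirming those fetches in the server logs can be as simple as filtering the access log for Googlebot requests under the test folder. Below is a rough Python sketch assuming the common Apache/Nginx "combined" log format; `googlebot_fetches` is an illustrative name, and you'd call it with something like `googlebot_fetches(open("access.log"), "/indexifembedded/")`. Bear in mind the user-agent string can be spoofed: Google's documented way to verify Googlebot is a reverse DNS lookup followed by a forward lookup, which this sketch doesn't do.

```python
import re
from typing import Iterable, List, Tuple

# Combined log format: ip ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_fetches(lines: Iterable[str], path_prefix: str) -> List[Tuple[str, str, str, int]]:
    """(time, method, path, status) for requests claiming to be Googlebot under path_prefix."""
    hits = []
    for line in lines:
        match = COMBINED.match(line)
        if not match:
            continue  # skip lines in other formats
        if "googlebot" in match["agent"].lower() and match["path"].startswith(path_prefix):
            hits.append((match["time"], match["method"], match["path"], int(match["status"])))
    return hits
```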

Test 1. HTML content, noindex & indexifembedded

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted URL marked ‘noindex’

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 1 test page showing the content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> content flattened in the body.

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 1 test page showing the content from the iframe in the body — Output from Search Console's URL Inspector live test, showing <iframe> content flattened in body.

Search Results

Screenshot of a google search showing only the test page, not the embed returning for a search — Only the test page returns for text in the embedded file

Verdict - Works as expected

The test page shows up for content from the embedded file in search, but the embedded file is not indexed.

Test 2. HTML content, noindex

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted URL marked ‘noindex’

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 2 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> content not included in the body

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 2 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector live test, showing <iframe> content not included in the body

Search Results

Screenshot of a google search showing nothing returned for text in the embedded file. — Nothing returns for text in the embedded file

Verdict - Works as expected

The test page doesn't return in search for content from the embedded file, and the embedded file is not indexed. The page did return for a search for text from the first paragraph, which is on the test page itself.

Test 3. HTML content without robots directives

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted and Indexed

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 3 test page showing the content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> content flattened in the body.

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 3 test page showing the content from the iframe in the body — Output from Search Console's URL Inspector live test, showing <iframe> content flattened in body.

Search Results

Screenshot of a google search showing both the test page, and the embedded page returning for a search — Both the page and the embedded content return.

Verdict - Works as expected

Both the test page and the embedded HTML page ended up indexed and returned for an exact-match search for content from the embedded file.

Test 4. PDF content with X-Robots-Tag: googlebot: noindex, indexifembedded

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted URL marked ‘noindex’

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 4 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> PDF content not included in the body

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 4 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector Live Test, showing <iframe> PDF content not included in the body

Search Results

Screenshot of a google search showing nothing returned for text in the embedded PDF file. — Neither the page nor the PDF was found in search.

Verdict - Sort of expected

The PDF content wasn't indexed as part of the page, and the PDF itself wasn't indexed or returned in search either. The page did return for a search for text from the first paragraph, which is on the test page.

Test 5. PDF content with X-Robots-Tag: googlebot: noindex

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted URL marked ‘noindex’

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 5 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> PDF content not included in the body

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 5 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector Live Test, showing <iframe> PDF content not included in the body

Search Results

Verdict - Works as expected

Test 6. PDF content with no X-Robots-Tag header

See: The Test Page | The Embedded File

Search Console Status

Test Page: Submitted and Indexed
Embedded File: Submitted and Indexed

URL Inspector - Crawled HTML

Screenshot of search console's URL inspector test of the crawled and indexed Chapter 6 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector, showing <iframe> PDF content not included in the body

URL Inspector - Live Test HTML

Screenshot of search console's URL inspector live test of the Chapter 6 test page showing no content from the iframe in the body — Output from Search Console's URL Inspector Live Test, showing <iframe> PDF content not included in the body

Search Results

Screenshot of a google search showing only the PDF returned for text in the embedded PDF file. — Only the PDF was found in a search.

Verdict - Sort of expected

The PDF content wasn't indexed as part of the page, but the PDF itself was indexed and returned in a search for content it contained. The page did return for a search for text from the first paragraph, which is on the test page.

Conclusions

The new tag works as I would expect. Notably, it enables what Test 1 covers: content indexed as part of a page, with no risk of it surfacing independently as it did in Test 3.

In the past, there wasn't a reliable way to achieve that. A canonical tag on the embedded content could perhaps signal to a search engine what you wanted, but the embed is only ever part of the page's content, not the whole of it, so I suspect that canonical would often have been ignored. It's a suggestion versus a directive.

Naturally, the tests are a very simplistic use case, but I can see the usefulness of the tag in other situations.

Perhaps something like a jobs board offering an iframe widget of a company's vacancies for them to add to their own pages, or even simplifying things like the Disqus comments I use on this site.
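As a sketch of that jobs board idea, a widget provider could send the directives as an HTTP header on the iframe document rather than relying on each customer's markup. This is a hypothetical WSGI app using only Python's standard library; the app name and vacancy data are placeholders, and whether Google treats a third-party widget exactly like my same-site tests is an assumption I haven't tested.

```python
from html import escape


def jobs_widget_app(environ, start_response):
    """Hypothetical iframe widget listing one company's vacancies."""
    vacancies = ["Junior Developer", "Content Editor"]  # placeholder data
    items = "".join(f"<li>{escape(title)}</li>" for title in vacancies)
    body = f"<!doctype html><html><body><ul>{items}</ul></body></html>".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # Ask Google not to index the widget URL on its own, while still
        # allowing its content to be indexed as part of the embedding page.
        ("X-Robots-Tag", "googlebot: noindex, indexifembedded"),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("", 8000, jobs_widget_app).serve_forever()` will serve it, and the host page would embed it with an ordinary iframe.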

Another thing the tests show is that PDF content isn't flattened into the main document anyway. I wasn't 100% sure what the behaviour would be there, hence the 'Sort of expected' verdicts. But it seems PDFs are never flattened into the main document, and adding indexifembedded doesn't change that, something Jake Bohall focused on more in their indexifembedded testing. While we ultimately ended up with roughly the same findings, we did get different results for the PDF with noindex, indexifembedded: Jake's ended up indexed, which I certainly wouldn't expect, and mine didn't. I suspect that Google isn't seeing the noindex for some reason.
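When a noindex doesn't seem to be picked up, one thing worth ruling out is how the X-Robots-Tag header is scoped, since a `googlebot:` prefix limits the directives to that crawler. The Python sketch below groups header values by user agent and returns the set that would apply to Googlebot. It's my own simplified parser, not Google's: it assumes at most one user-agent prefix per header value and only knows a few directives that carry their own `name:value` syntax, so treat its output as a sanity check rather than a verdict.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Directives with their own "name:value" syntax; a colon after one of these
# is not a user-agent prefix. Deliberately not an exhaustive list.
VALUE_DIRECTIVES = {"max-snippet", "max-image-preview", "max-video-preview", "unavailable_after"}


def parse_x_robots_tag(values: List[str]) -> Dict[str, Set[str]]:
    """Group X-Robots-Tag header values by user agent ("*" means unscoped)."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for raw in values:
        agent, text = "*", raw
        head, sep, rest = raw.partition(":")
        candidate = head.strip().lower()
        # Treat "name: ..." as a user-agent prefix unless it looks like a directive.
        if sep and "," not in candidate and candidate not in VALUE_DIRECTIVES:
            agent, text = candidate, rest
        for token in text.split(","):
            if token.strip():
                groups[agent].add(token.strip().lower())
    return dict(groups)


def directives_for(values: List[str], agent: str = "googlebot") -> Set[str]:
    """Directives applying to `agent`: unscoped ones plus those scoped to it."""
    groups = parse_x_robots_tag(values)
    return groups.get("*", set()) | groups.get(agent.lower(), set())


print(sorted(directives_for(["googlebot: noindex, indexifembedded"])))  # ['indexifembedded', 'noindex']
print(sorted(directives_for(["bingbot: noindex"])))                     # []
```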

Are the testing tools Google provides sufficient?

I would say yes, but with a caveat. The URL Inspection tool accurately reflected what would and wouldn't be flattened into the main document.

I would say that checking a PDF's status is harder than it should be, or at least more limited. For a web page, anyone can check whether it could be indexed using something like the Mobile-Friendly Test or the Rich Results Test. But these are limited to web pages, so testing a PDF leads to a rather unhelpful "Something Went Wrong", which leaves the URL Inspection live test as the only option. The rendered HTML in a live test will make no sense for a PDF, but you can see the response details through the More Info tab.
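Outside Search Console, you can at least confirm what headers your server sends for a PDF. Here's a quick Python sketch using `urllib`: it sends a HEAD request and returns every X-Robots-Tag value (there can be more than one). The function names and User-Agent string are my own, `urlopen` raises `urllib.error.HTTPError` for 4xx/5xx responses, and some servers answer HEAD differently from GET, so this only shows what your server returns, not how Google processed it.

```python
import urllib.request
from email.message import Message
from typing import List


def x_robots_tags(headers: Message) -> List[str]:
    """Every X-Robots-Tag value in a set of response headers (there may be several)."""
    return headers.get_all("X-Robots-Tag") or []


def fetch_x_robots_tags(url: str, timeout: float = 10.0) -> List[str]:
    """Send a HEAD request to `url` and return its X-Robots-Tag values."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "x-robots-check/0.1"}  # arbitrary UA
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers is an http.client.HTTPMessage, a Message subclass.
        return x_robots_tags(response.headers)
```

For example, `fetch_x_robots_tags("https://example.com/some.pdf")` would return something like `["googlebot: noindex"]` for a PDF served with that header.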

For PDFs that are indexed, if you look at the crawled HTML in the initial (not live) URL Inspection test, you can see the HTML representation Google creates from your PDFs so they can be indexed and shown in web search.

non-indexable PDF tested in search console's URL inspection tool live test showing noindex detected. — URL inspection live test for PDF with X-Robots-Tag: googlebot: noindex, indexifembedded HTTP header

Indexable PDF tested in search console's URL inspection tool live test showing it's indexable. — URL inspection live test for PDF with no X-Robots-Tag HTTP header, showing it's indexable.

The crawled HTML from Search Console's URL Inspector test on an indexed PDF, showing the PDF content transformed to HTML. — URL inspection test for an indexed PDF, showing the PDF content transformed to HTML.

TL;DR

indexifembedded works as advertised for HTML content but has no effect on embedded PDFs, as these aren't flattened into the main document anyway.
