Hiding all media assets from Google
I'm going to have thousands of PDFs and Word documents accessible from content nodes, all stored within the Media section. I have Examine working perfectly for searching these documents on my development site.
But would there be a way (perhaps through this package) to prevent these same Media assets from being indexed by major search engines like Google and Bing? I want people to come to my site and find the documents through the Examine search, rather than depend on external search engines.
Thanks for any suggestions or tips on moving forward with this.
You can use a robots.txt file at the root of your site to tell search engine crawlers not to crawl your media folder.
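As a rough sketch, and assuming the media folder is served from /media/ (adjust the path if your site stores media elsewhere), the file could look something like this:

```text
# robots.txt, placed at the site root
# Assumption: media files are served from /media/
User-agent: *
Disallow: /media/
```

Keep in mind that robots.txt is only a request that well-behaved crawlers honor. It stops them from fetching the files, but a document that is linked from another site can still show up in results as a bare URL.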
Hi,
A quick win is to add the media directory to your robots.txt, which asks crawler bots nicely to skip it. You could also try adding rel="nofollow" to all links that point at the documents. Note that blocking the whole media directory would also exclude your images.
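If you want to sanity-check a rule before deploying it, here is a minimal sketch using Python's standard-library robots.txt parser. The /media/ path, the sample file paths, and the `is_crawlable` helper are all assumptions made up for illustration, not anything from your site.

```python
from urllib.robotparser import RobotFileParser

# Assumption: media files are served from /media/
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/
"""


def is_crawlable(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the given robots.txt rules let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Hypothetical paths, for illustration only
print(is_crawlable("/media/1234/report.pdf"))
print(is_crawlable("/about-us/"))
```

With the rule above, the first path should be reported as blocked and the second as allowed. Image files under the same folder would be blocked too, which is the side effect mentioned above.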
Thanks guys! I'll follow up when I get this all set up.