Prevent media from being accessible cross sites (hosts)
Hi,
currently all media is accessible to all sites (hosts) on one Umbraco installation (right?):
www.firstsite.com/media/1000/doc.pdf as well as www.secondsite.com/media/1000/doc.pdf
Is there some way to stop this behaviour?
Thanks
Jonas
Hey Jonas,
Do you mean on the front end or in the back office?
If you mean the front end, then without some type of login I can't see how this would be possible. With a login, perhaps something like Richard's package would help: http://our.umbraco.org/projects/website-utilities/media-protect
Rich
Hi Jonas
As Rich mentioned, if it's for the front end with a login you can use http://our.umbraco.org/projects/website-utilities/media-protect . If you have a different requirement please let me know; there is always a VNext, and maybe you have a requirement that I didn't think about.
Thanks,
Richard
Hi guys, thanks for your answers,
For an unknown reason Google indexed some media files belonging to www.firstsite.com on www.secondsite.com. I still have no clue why; there are no links to them (internally, anyway). For now I've added Disallow: /media/ to robots.txt to stop all media indexing as a temporary workaround.
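For reference, the temporary block is just this in robots.txt, which shuts out all crawlers from everything under /media/ (heavier-handed than I'd like):

```
User-agent: *
Disallow: /media/
```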
The way Umbraco handles this makes it rather cumbersome to stop robots from indexing media on the wrong hosts (if a link appears somewhere for some reason). Alternatives I can think of:
1) make up a robots.txt or sitemap that handles the media files (PDFs) individually.
2) serve all media files through a handler and use a host property (on the media folder) to make sure they are only served from the right domain (rough sketch below).
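Something like this is what I have in mind for alternative 2. It's only a minimal, untested sketch: the handler name, the appSettings lookup and the hard-coded PDF content type are assumptions, not existing Umbraco or Media Protect APIs; a real version would read a host property from the media folder through the Umbraco API instead.

```csharp
using System;
using System.Configuration;
using System.IO;
using System.Web;

// Hypothetical handler that refuses to serve a media file when the request
// comes in on a host other than the one configured for that media folder.
public class HostRestrictedMediaHandler : IHttpHandler
{
    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // e.g. in web.config: <add key="MediaHost:/media/1000/" value="www.firstsite.com" />
        string path = context.Request.Url.AbsolutePath;           // "/media/1000/doc.pdf"
        string folder = VirtualPathUtility.GetDirectory(path);    // "/media/1000/"
        string allowedHost = ConfigurationManager.AppSettings["MediaHost:" + folder];

        if (!string.IsNullOrEmpty(allowedHost) &&
            !string.Equals(context.Request.Url.Host, allowedHost, StringComparison.OrdinalIgnoreCase))
        {
            context.Response.StatusCode = 404;   // pretend the file does not exist on the wrong host
            return;
        }

        string physicalPath = context.Server.MapPath(path);
        if (!File.Exists(physicalPath))
        {
            context.Response.StatusCode = 404;
            return;
        }

        context.Response.ContentType = "application/pdf";         // good enough for the PDFs in question
        context.Response.TransmitFile(physicalPath);
    }
}
```

It would of course also need a handler mapping in web.config/IIS so that requests for PDFs under /media/ actually reach ASP.NET instead of the static file handler.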
Perhaps that second alternative is something Media Protect vNext can help out with?
Regards
Jonas
Hi Jonas,
Strange. Well, in VNext there will be an Authenticating event that you can cancel if the user comes from the wrong domain.
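Roughly speaking it's the standard cancellable-event pattern. This is only an illustration of that pattern, not the final API or signature (the event name, arguments and expected-host lookup here are placeholders):

```csharp
using System;
using System.ComponentModel;
using System.Web;

public static class WrongHostCancellation
{
    // Hypothetical subscriber: cancel serving the media item when the
    // request arrives on a host it does not belong to.
    public static void OnAuthenticating(HttpContext context, CancelEventArgs e)
    {
        string expectedHost = "www.firstsite.com"; // placeholder for the example
        if (!string.Equals(context.Request.Url.Host, expectedHost, StringComparison.OrdinalIgnoreCase))
        {
            e.Cancel = true; // stop the file from being served
        }
    }
}
```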
Cheers,
Richard
Very nice!
Cheers
I found that one can use an X-Robots-Tag HTTP header to prevent individual PDFs from being indexed (a better alternative to robots.txt, afaiu): http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
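For context, it's just an extra response header on the PDF itself, something like:

```
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
```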
To add that, one would need to control the HTTP headers for the individual file. Is that something that could be possible with the help of Media Protect too, Richard?
Thanks
Jonas
Whoa, that's a lot of text ;-) I need to look into this, but if it's supported by the major search engines I might implement it and make it configurable so you can choose whether to use it.
Thanks,
Richard
Hehe, yeah, sorry about that. Short version: I found the link to "Serious Robots.txt Misuse & High Impact Solutions" here on the forum, which basically (afaiu) says to limit the use of robots.txt and use meta tags instead, and that "To restrict robot access to non-HTML documents like PDF files, you can use the x-robots tag in the HTTP Header."
So, if I could add an HTTP header within some event for media, I could add that X-Robots-Tag myself, which would be nice. However, with the coming possibility you mentioned of using an event to stop serving the PDF altogether, it might not even be necessary.
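If nothing else, it looks like this could already be done with a small IHttpModule outside Media Protect. A sketch under assumptions (the module and its names are mine, and it needs the IIS integrated pipeline, or runAllManagedModulesForAllRequests, so that static PDF requests pass through managed modules):

```csharp
using System;
using System.Web;

// Appends X-Robots-Tag: noindex to any PDF served from /media/.
public class XRobotsTagModule : IHttpModule
{
    public void Init(HttpApplication application)
    {
        application.PreSendRequestHeaders += (sender, e) =>
        {
            var app = (HttpApplication)sender;
            string path = app.Request.Url.AbsolutePath;

            if (path.StartsWith("/media/", StringComparison.OrdinalIgnoreCase) &&
                path.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
            {
                app.Response.AppendHeader("X-Robots-Tag", "noindex");
            }
        };
    }

    public void Dispose() { }
}
```

(It would be registered under system.webServer/modules in web.config.)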