  • Simon Dingley 1474 posts 3431 karma points c-trib
    Mar 08, 2011 @ 16:27

    Media Published Status

    This is really just a case of throwing an idea out for discussion at the moment, but here is the scenario.

    I have a client who was recently contacted by a journalist about an out-of-date child protection policy (a file) on their Umbraco site. The file was in the Media section but was no longer linked from any documents. However, the media item had already been indexed by search engines, so the client's response was: "That’s quite worrying that a file in the media library has public access!" Ordinarily this is not an issue for most site owners, but in this case it has caused a problem.

    I am looking to the community for ideas on how to address this and satisfy the client's requirement.

    Anything I can think of would most likely introduce a performance overhead that would be unnecessary in 99% of cases. Any ideas? My initial thought is to ensure that any request for content in the Media section has come via the current host's IP address.

    Thanks, Simon

  • Tom Maton 387 posts 660 karma points
    Mar 08, 2011 @ 16:38

    Hi Simon,

    Couldn't you add a disallow rule for the media folder to robots.txt, so its contents are not indexed by search bots?

    I know this would not resolve the problem immediately, but it would be a start, so that new media items are not indexed.
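    To see what a disallow rule actually does, here is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt contents and the example media path are illustrative assumptions. It only models crawlers that choose to honour robots.txt; it does not stop anyone from requesting the file directly.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block every crawler from the /media/ folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media/
"""


def is_crawlable(url_path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawler that honours robots.txt may fetch url_path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url_path)


if __name__ == "__main__":
    # The id and filename are made up; media URLs follow /media/<id>/<filename>.
    print(is_crawlable("/media/1234/policy.pdf"))  # False
    print(is_crawlable("/about-us.aspx"))          # True
```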

    Also, a snippet from Google: "If you own the site, you'll need to make the changes to your website yourself and then request removal of the problematic page from Google's search results using the URL removal tool in Webmaster Tools"

    http://www.google.com/support/webmasters/bin/answer.py?answer=164734

    Thanks

    Tom

  • Lee Kelleher 4026 posts 15836 karma points MVP 13x admin c-trib
    Mar 08, 2011 @ 16:40

    Hi Simon,

    The only way a search engine could have indexed the media item is if someone linked to it directly, making it crawlable.

    As you know, the URL structure of the media folders isn't easy to "guess" (because it uses property IDs from the database), so Joe Bloggs isn't going to work it out.

    Quick wins...

    • Add "/media" to your robots.txt (for crawlers that honour it).
    • Restrict access to the /media folder at server level (IIS), as you say, via the host's IP address.
    • Move the /media folder outside the web root (somehow) and use a proxy script to handle the requests?
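    The third suggestion, keeping files outside the web root and serving them through a proxy script, could look roughly like the framework-free Python sketch below. It only decides which file (if any) to serve; wiring it into IIS/ASP.NET is not shown. The /media/<id>/<filename> URL shape matches the thread, but `resolve_media`, `media_root` and the `unpublished_ids` set (standing in for some per-item published flag) are hypothetical names, not Umbraco APIs.

```python
import tempfile
from pathlib import Path
from typing import Optional, Set


def resolve_media(request_path: str, media_root: Path, unpublished_ids: Set[str]) -> Optional[Path]:
    """Map /media/<id>/<filename> to a file stored under media_root.

    Returns None (the caller would send a 404) if the URL is malformed, the item
    is marked unpublished, the path escapes media_root, or the file is missing.
    """
    parts = request_path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "media":
        return None
    item_id, filename = parts[1], parts[2]
    if item_id in unpublished_ids:
        return None
    root = media_root.resolve()
    candidate = (root / item_id / filename).resolve()
    # Refuse anything that resolves outside the private folder, e.g. via "..".
    if root not in candidate.parents:
        return None
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "1234").mkdir()
        (root / "1234" / "policy.pdf").write_bytes(b"%PDF-")
        print(resolve_media("/media/1234/policy.pdf", root, set()))     # the stored file
        print(resolve_media("/media/1234/policy.pdf", root, {"1234"}))  # None: unpublished
        print(resolve_media("/media/../secret.txt", root, set()))       # None: escapes root
```

    Because every request goes through this function, it is also the natural place to hang any other check, at the cost of the per-request overhead Simon is worried about.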

    There are probably other ways...

    Cheers, Lee.

  • Simon Dingley 1474 posts 3431 karma points c-trib
    Mar 08, 2011 @ 16:46

    Tom, in this instance the file has been removed. However, adding something to the robots.txt file effectively tells the world the file is there anyway, and I think they are looking for something more secure.

    Lee, it was previously linked from a document that has since been updated and the link removed.

    I am thinking of adding a new property to their media items that would let them protect an item from being linked to directly from anywhere other than content on the same IP address. I would then need to check each request for "protected" media items, which I am sure will add overhead but would likely do the job.
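    One way to approximate the "protected" property is to compare the request's Referer header with the site's own host name, and to run that check only for items flagged as protected so unprotected media pays almost nothing. Checking the host name in the Referer is an assumed stand-in for "content on the same IP address". The sketch is in Python for brevity; `allow_media_request`, `is_protected` and `site_host` are hypothetical names. The Referer header is optional and easily forged, so this deters direct links rather than providing real access control.

```python
from typing import Optional
from urllib.parse import urlsplit


def allow_media_request(is_protected: bool, referer: Optional[str], site_host: str) -> bool:
    """Return True if a media item should be served for this request.

    is_protected stands in for the hypothetical per-item "protected" property;
    site_host is the site's own host name, e.g. "www.example.com".
    """
    if not is_protected:
        # Unprotected items skip the check entirely, so the overhead is limited
        # to the (presumably few) protected items.
        return True
    if not referer:
        # Typed URLs and bookmarks arrive without a Referer header.
        return False
    host = urlsplit(referer).hostname  # urlsplit already lower-cases this
    return host is not None and host == site_host.lower()


if __name__ == "__main__":
    print(allow_media_request(True, "https://www.example.com/policies", "www.example.com"))  # True
    print(allow_media_request(True, None, "www.example.com"))                                # False
```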

  • Douglas Robar 3570 posts 4711 karma points MVP ∞ admin c-trib
    Mar 08, 2011 @ 17:02

    The fundamental issue is a hard one: when a media item is updated or unlinked (that is, no longer used directly by a link in an RTE), should it be removed from the filesystem? In your case it seems the answer might be yes, but in general that would be overkill and could even cause problems. For instance, you might have a photo gallery macro that reads a folder in your media section; there are no direct links, but deleting the images would be a bad idea because it would break the gallery.

    Even so, our friend Tim Gaunt wrote a blog post about this situation, with a tool that might be helpful: http://blogs.thesitedoctor.co.uk/tim/2008/09/03/Clean+Out+Unused+Media+Items+From+Umbraco+Media+Folder.aspx

    Keep the conversation going, this is an important topic!

    cheers,
    doug.

  • Simon Dingley 1474 posts 3431 karma points c-trib
    Mar 08, 2011 @ 17:15

    Doug, I think you may have touched on a potential solution by looking at it from a different angle. They (and I) may have been looking at it from the wrong perspective; what they really need is a way to identify orphaned files, similar to Tim's solution, and then decide whether each media item stays or goes.

    Since I have to do this for the client anyway, I think I will package something up.
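    A rough Python sketch of the "identify, then decide" approach: collect every /media/<id>/<filename> URL referenced in content HTML, compare it with the list of media files, and report the rest for a person to review rather than deleting anything. The `keep_prefixes` argument covers the photo-gallery case, where a macro reads a whole folder without any direct links. How the content HTML and media URLs are gathered is assumed and not shown, and `find_orphaned_media` is a hypothetical name. The regex only recognises URLs of that shape, so media referenced some other way (for example, a property storing a media id rather than a URL) would be reported as orphaned.

```python
import re
from typing import Iterable, List, Set

# Media URLs such as /media/1234/policy.pdf (the shape discussed in the thread);
# a match stops at quotes, whitespace, angle brackets, '?' and '#'.
MEDIA_URL = re.compile(r"""/media/\d+/[^"'\s<>?#]+""", re.IGNORECASE)


def find_orphaned_media(
    content_html: Iterable[str],
    media_urls: Iterable[str],
    keep_prefixes: Iterable[str] = (),
) -> List[str]:
    """Return media URLs not referenced by any content, sorted, for manual review."""
    referenced: Set[str] = set()
    for html in content_html:
        referenced.update(match.lower() for match in MEDIA_URL.findall(html))

    # Folders read wholesale by macros (e.g. a photo gallery) are never reported.
    keep = tuple(prefix.lower() for prefix in keep_prefixes)

    orphans = [
        url for url in media_urls
        if url.lower() not in referenced and not url.lower().startswith(keep)
    ]
    return sorted(orphans)


if __name__ == "__main__":
    pages = [
        '<p><a href="/media/1234/policy-2011.pdf">Policy</a></p>',
        '<img src="/media/2001/team.jpg" />',
    ]
    files = [
        "/media/1234/policy-2011.pdf",
        "/media/1100/policy-2008.pdf",
        "/media/2001/team.jpg",
        "/media/3000/gallery-1.jpg",
    ]
    print(find_orphaned_media(pages, files, keep_prefixes=["/media/3000/"]))
```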

  • Simon Dingley 1474 posts 3431 karma points c-trib
    Mar 08, 2011 @ 17:38

    I have had another discussion with the client on this matter, and they have agreed that the situation is the exception rather than the rule. A solution along the lines you recommended, Doug, removing orphaned items from the media library, would be acceptable, so watch this space!
