deleted documents still available
Hi, this is a similar issue to https://our.umbraco.org/forum/core/general/60197-Deleted-Media-Items-Still-Accessible but I have started a new thread as that one is over 18 months old.
We are using Umbraco 6.2.5.
We have an issue where media items that have been deleted from both the media library and the recycle bin are still being indexed and accessed by Google, and the links to them still work.
Is this a known bug? And is there an easy way to identify and remove all these media items?
Many Thanks, Matt
Hi Matt,
The files for media items in the recycle bin are still on the disk, so they can still be accessed through their HTTP link.
I think they only get deleted from disk when you empty the recycle bin.
Dave
Hi Dave, thanks for the response. The recycle bin has been emptied but the documents are still available, so is something going wrong somewhere?
Any thoughts anyone?
Regards, Matt
Bump...
Anyone else got any ideas?
Thank you
Matt, I am pretty sure that in U6 media items do not get physically deleted from disk when they are deleted in the CMS. I have a high-profile client who sometimes runs into this problem when PDFs are still indexed by Google. I delete those files on the server and create a removal request using Google Webmaster Tools.
Thanks Sjors for your reply.
I can use Webmaster Tools to remove them from the Google index.
I don't have server permissions but can ask someone who does. I assume they are all just in one big media directory?
Many thanks, Matt
Yes, by default they are in the media folder. The link probably has a number followed by the filename; that number is the physical folder on the hard drive containing the file.
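For anyone who wants to identify leftover files in bulk rather than one link at a time, here is a rough sketch in Python, not an Umbraco feature. It assumes someone with server access can supply a list of the media paths the CMS still references (for example, exported from the back office or the database); how that list is produced is not covered here. The function name `find_unreferenced_media` and its parameters are hypothetical. It only reports files and deletes nothing.

```python
from pathlib import Path


def find_unreferenced_media(media_root, referenced_paths):
    """List files under media_root that are not in referenced_paths.

    media_root: the media folder on disk (assumed layout: <number>/<filename>).
    referenced_paths: paths relative to the media folder that are still in
        use, written with forward slashes, e.g. "1234/brochure.pdf".
        This list is an assumption: it has to come from the CMS or database.
    """
    root = Path(media_root)
    # Compare case-insensitively, since Windows file systems usually ignore case.
    known = {p.strip("/").lower() for p in referenced_paths}

    orphans = []
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if rel.lower() not in known:
                orphans.append(rel)
    return sorted(orphans)
```

Calling `find_unreferenced_media(r"D:\sites\example\media", still_used)` (both the path and the `still_used` list are placeholders) returns the relative paths of candidate leftovers. It would be sensible to review that list by hand before anyone deletes files or submits removal requests, since an incomplete reference list would flag files that are still in use.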
Thanks Sjors, much appreciated.
It seems odd that files are not deleted... we have deleted hundreds over the life of the site, but only a few are still being indexed.
Is there any way someone can confirm this? If files really are never removed, we'll need to think about how we maintain that directory, as it could get huge.
Thanks a lot, Matt
Hi Matt, we also had this problem with one of our clients. They used Umbraco for real estate objects with quite large files (up to 70 MB per real estate object). I never implemented a fix since they left as a client, but an example is posted here: https://our.umbraco.org/forum/developers/api-questions/4390-Delete-uploaded-files-upon-node-deletion-for-custom-upload-datatype-properties
That's very useful, thank you :-)