Finding a PDF

Graham Thomson 56 posts 136 karma points

Nov 30, 2019 @ 10:13

0

Finding a PDF

Hi

Is there a way to search by node number in the Umbraco back office, so that I can search for the 6524 in a path like this?

media/6524/filename.pdf

The reason I ask is that Google has indexed a file but the search result returns a 404 in Umbraco.

When I search on the filename it is not appearing. It is not in the recycle bin either.

Thanks Graham

Marc Goodson 2157 posts 14434 karma points MVP 9x c-trib

Nov 30, 2019 @ 11:39

1

Hi Graham

Confusingly, the ID in the media URL isn't the node ID of the media item. If it were, you could just open any media item in the back office, change the ID in the query string to that number, and load the errant item.

So how to find the media item in Umbraco?

If you use the search in the back office, only the 'name' field in Umbraco is searched. So if you know what name the file was given, search for that; if the file was dragged in to upload, the name may be the filename.

If not, then you can search the raw Examine indexes. In the Developer section, choose the Examine tab, then locate and expand the index searcher. You can then search there for the filename, and if it finds a match, the first column of the search result will contain the Umbraco ID of the media item, which you can substitute into the query string to load the item in the back office.

(There is a PR in to allow searching by media filename in the back office, but that won't help you now.)

If you can't find the item in the internal index, then later versions of Umbraco 7 have a database table called 'umbracoMedia' that stores the path, which you could search in.

If the media item has been deleted from the media section, it will be in the recycle bin, but the file will still be accessible to the outside world via the direct URL. So you'd need to delete the item from the recycle bin to remove it from disk.

....happy hunting!

Regards

Marc

Graham Thomson 56 posts 136 karma points

Nov 30, 2019 @ 12:33

0

Hi

Thanks so much! I'll have a look into all of these.

Just to follow up on one thing you said: for some other PDF media files I do find that the node number in the back office is the same as the number in the published URL.

Is there a way that I can create a redirect so that the offending "missing pdf" that Google is finding could just redirect to where I know it is?

I have a certain level of access to Umbraco (I have the Developer tab) but might need to go to the digital agency we work with.

Thanks again Graham

Marc Goodson 2157 posts 14434 karma points MVP 9x c-trib

Nov 30, 2019 @ 14:31

0

Hi Graham

The 'number' in the URL is calculated from the folders that already exist in the media folder on disk. What Umbraco (in V7) does is scan the folders on disk to find the highest-numbered folder, and then the next media upload gets this highest folder number + 1 in its URL.

The unique Umbraco node ID of an item, on the other hand, is shared across content, media, members etc., so the IDs of media items would not be sequential. I know it's not intuitive! I think the separate folder numbering is there to ensure uniqueness and avoid clashes when saving files to disk (in V8 the numbers have been replaced by GUIDs to ensure uniqueness), but it's a pain that there is no correlation.
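To make the folder numbering concrete, here is a minimal Python sketch of the "highest numbered folder + 1" idea. It is not Umbraco's actual code: the function name `next_media_folder_number` and the choice to start from 1 when no numbered folder exists are assumptions made for illustration only.

```python
import os
import tempfile


def next_media_folder_number(media_root: str) -> int:
    """Return the highest purely numeric folder name under media_root, plus one.

    Hypothetical helper: starting from 1 for an empty media folder is an
    assumption for this sketch, not documented Umbraco behaviour.
    """
    numbers = [
        int(entry.name)
        for entry in os.scandir(media_root)
        # Only folders with plain ASCII digit names count towards the numbering.
        if entry.is_dir() and entry.name.isascii() and entry.name.isdigit()
    ]
    return max(numbers, default=0) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as media_root:
        # Simulate a media folder on disk with a few numbered folders.
        for name in ("6522", "6523", "6524", "not-a-number"):
            os.mkdir(os.path.join(media_root, name))
        print(next_media_folder_number(media_root))
```

Because the number depends only on what is on disk, it is independent of the node ID counter in the database, which is why the two can match for some items and drift apart for others.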

With regard to a redirect: if the file exists on disk, the direct request for the file is a static request and will be served by the web server before any redirect package or Umbraco logic can redirect it. So if you can't track down the file or remove it from disk, you could set up an IIS redirect rule in the web.config to handle the redirect and prevent the file being served. If you can delete the file on disk, then an existing redirects package such as Skybrud Redirects could redirect to the new media URL, or you could write a custom IContentFinder, or you might already have something set up on your site to handle redirects like this.
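The order of precedence described above can be sketched as a toy Python model. This is only an illustration of the reasoning, not how IIS or Umbraco is implemented: the function `resolve`, its parameters and the target path `/media/7001/filename.pdf` are all hypothetical.

```python
from typing import Dict, Set, Tuple


def resolve(
    path: str,
    iis_rules: Dict[str, str],
    files_on_disk: Set[str],
    umbraco_redirects: Dict[str, str],
) -> Tuple[str, str]:
    """Toy model of which layer answers a request, in the order described above."""
    # 1. A redirect rule in web.config is applied before the file is served.
    if path in iis_rules:
        return ("iis-redirect", iis_rules[path])
    # 2. A file that exists on disk is served as a static request.
    if path in files_on_disk:
        return ("static-file", path)
    # 3. Only then can a redirects package or custom content finder step in.
    if path in umbraco_redirects:
        return ("umbraco-redirect", umbraco_redirects[path])
    return ("404", "")


if __name__ == "__main__":
    old = "/media/6524/filename.pdf"
    new = "/media/7001/filename.pdf"  # hypothetical new location

    # File still on disk: an Umbraco-level redirect never gets a chance to run.
    print(resolve(old, {}, {old}, {old: new}))
    # File still on disk, but an IIS rule exists: the rule wins.
    print(resolve(old, {old: new}, {old}, {}))
    # File deleted from disk: the Umbraco-level redirect now applies.
    print(resolve(old, {}, set(), {old: new}))
```

The three calls correspond to the three situations in the paragraph above: file present with only an Umbraco-level redirect, file present with an IIS rule, and file removed from disk.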

Finally, if the file contains sensitive information and you need it gone ASAP, you can ask Google via Webmaster Tools to forget the indexed URL: https://support.google.com/webmasters/answer/1663419?hl=en

regards

Marc

Graham Thomson 56 posts 136 karma points

Dec 01, 2019 @ 10:45

0

Thanks so much for the great advice Marc!
