Using upload or media picker?
Hi,
I have an Umbraco V4 website and I am trying to decide which is better to use: the upload datatype or the media picker datatype. I am creating a document type called "PDF Document" and I want the Umbraco user to be able to upload PDFs. I assume the advantage of the media section option is that if you use the document on more than one node, you only have to upload it once. The advantage of the upload datatype is that if you only use the file once, you skip the extra step of uploading it to the media section first. In the past I have always used the media section for images and documents. Are there any other pros and cons for each option?
One issue I have thought of: in the future I may need to implement some kind of Lucene search that can search the contents of PDFs. How would I control what gets shown in the search? I wouldn't want items from the media section to be displayed in the search unless they were referenced from within the website under a PDF Document node.
Using the upload data type allows for better rollback than the media picker. Because the upload data type stores the full file system path, you can use the Umbraco audit trail to go back to previously uploaded files (assuming, obviously, that the file still exists and wasn't overwritten with a newer version).
Also, you don't have to worry about a media item being deleted while nodes still reference it (although you can use this package to find usages of a media item anyway: http://our.umbraco.org/projects/thefarm-media-link-checker/user-forum/8050-MemberType-images).
As to your other question, if you can extract the textual contents of a PDF to a string then Lucene will index it.
Hi Slace,
Thanks for the response. I'm not sure the last sentence resolves my issue, unless I have misunderstood it. What I am worried about is the search only displaying documents that are referenced in a node/page. For example, I may upload a file to the media section but never actually add it to a node using the media picker, so I would not want this media file to be picked up in a search. Is this possible? Or would I have to make sure that only files I want to appear in a search are in the media section? (In that case it would be easier to use the media picker than the upload, because the user can easily delete files that are stored in the media section.)
Hi
It depends on what search "engine"/package you are using. If, for example, you are using XSLT search you can, as far as I know, define where it should search fairly easily.
Another thing you could do is add some sort of "do not show in search" property to the uploaded files in the media library and have the search skip any item where it is set.
It is likely to be something like Umbraco Examine, so that it can search the content of PDFs. I just wondered if it is possible to only find documents that have been selected in a node using the media picker. I'm assuming probably not, in which case the user just has to delete documents they do not want in the search. Or maybe it is possible to exclude a folder in the media section, such as an archive folder?
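To make the filtering ideas raised so far concrete (only media picked on a "PDF Document" node, a "do not show in search" flag, and an excluded archive folder), here is a minimal, language-agnostic sketch in Python. It is not Umbraco or Examine code: `MediaItem`, `ContentNode`, `hide_from_search` and `searchable_media` are hypothetical names made up for illustration, and the data is invented. The point is only that the decision about what is searchable can be made before anything is handed to the index.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    # Hypothetical stand-in for an item in the media section.
    id: int
    name: str
    path: list[int]                  # ids of the ancestor folders, root first
    hide_from_search: bool = False   # the "do not show in search" property


@dataclass
class ContentNode:
    # Hypothetical stand-in for a content node with a media picker property.
    id: int
    doc_type: str
    media_picker_ids: list[int] = field(default_factory=list)


def searchable_media(media, nodes, doc_type="PDF Document",
                     excluded_folders=frozenset()):
    """Return the media items that should be added to the search index."""
    # Ids of media items actually picked on a node of the wanted document type.
    referenced = {
        media_id
        for node in nodes
        if node.doc_type == doc_type
        for media_id in node.media_picker_ids
    }
    return [
        item
        for item in media
        if item.id in referenced
        and not item.hide_from_search
        and excluded_folders.isdisjoint(item.path)
    ]


media = [
    MediaItem(10, "brochure.pdf", path=[1]),
    MediaItem(11, "orphan.pdf", path=[1]),           # never picked on a PDF Document node
    MediaItem(12, "old-prices.pdf", path=[1, 99]),   # lives in the archive folder (id 99)
    MediaItem(13, "internal.pdf", path=[1], hide_from_search=True),
]
nodes = [
    ContentNode(100, "PDF Document", [10]),
    ContentNode(101, "PDF Document", [12, 13]),
    ContentNode(102, "Text Page", [11]),
]

result = searchable_media(media, nodes, excluded_folders={99})
print([item.name for item in result])  # ['brochure.pdf']
```

In a real site the same check would have to run wherever the index is built, and again whenever a node is published or unpublished, so that a file dropped from its last PDF Document node also leaves the search results.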
Lucene does not search inside a PDF document.
Lucene takes a string, adds it to its index and then allows you to search on that string. If you can work out a way to extract the text of the PDF into a string, then you can add it to the index and search on it.
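The split between "extract the text" and "index the string" can be sketched like this. This is a toy inverted index in Python standing in for Lucene purely to illustrate the idea; it is not the Lucene or Examine API. `TinyIndex` and `extract_pdf_text` are hypothetical names, and the extractor returns canned strings where a real site would call a PDF text-extraction library.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase the text and split it into simple alphanumeric terms.
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # The index only ever sees a plain string, never the PDF file itself.
        for term in tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query):
        # Return the ids of documents that contain every term in the query.
        terms = tokenize(query)
        if not terms:
            return set()
        hits = [self._postings.get(term, set()) for term in terms]
        return set.intersection(*hits)


def extract_pdf_text(path):
    # Hypothetical stand-in for a real PDF text extractor; returns canned text.
    canned = {
        "/media/10/brochure.pdf": "Spring brochure with our new price list",
        "/media/12/old-prices.pdf": "Archived price sheet from last year",
    }
    return canned[path]


index = TinyIndex()
for path in ("/media/10/brochure.pdf", "/media/12/old-prices.pdf"):
    index.add(path, extract_pdf_text(path))

print(index.search("price list"))  # {'/media/10/brochure.pdf'}
```

Because extraction happens before indexing, the choice of which files to extract from is also the choice of which files can ever appear in results.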