We're currently having space issues due to a large number of PDFBox files being stored in the Windows temp directory on our web server. The only reference we have to PDFBox is a pdfbox.dll sitting in the Umbraco bin folder.
Ideally we'd like to store these files on a different drive where we can be sure that space will not be an issue. Could you advise as to how we go about this? Or let me know if I'm completely barking up the wrong tree?
PDFBox doesn't ship with Umbraco as part of the default installation. The first place I would look is the SourceForge page for the product, http://sourceforge.net/projects/pdfbox, to see if the issue is mentioned there. It sounds like either a control, macro or other function added to your Umbraco installation uses PDFBox and is not cleaning up its resources when it finishes, or PDFBox itself creates these temporary files and does not delete them.
Check with the development team responsible for your Umbraco installation to find out what they are doing with PDFBox. If it turns out to be a bug in PDFBox itself, ask whether the removal of those files can be scheduled. The development team may not even be aware of the issue yet.
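If scheduled removal turns out to be the stopgap, something along these lines could be run from a scheduled task. This is only a sketch: the class and method names are made up for illustration, and the "pdfbox*" file name pattern in the usage line is an assumption, so check what the leftover files in your temp directory are actually called before using it.

```csharp
using System;
using System.IO;

public static class TempFileCleanup
{
    // Deletes files in 'directory' that match 'pattern' and were last written
    // more than 'maxAge' ago. Returns the number of files deleted.
    public static int DeleteStaleFiles(string directory, string pattern, TimeSpan maxAge)
    {
        int deleted = 0;
        DateTime cutoff = DateTime.UtcNow - maxAge;

        foreach (string path in Directory.GetFiles(directory, pattern))
        {
            try
            {
                if (File.GetLastWriteTimeUtc(path) < cutoff)
                {
                    File.Delete(path);
                    deleted++;
                }
            }
            catch (IOException)
            {
                // The file is still in use or has already gone; skip it.
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to delete this file, or it is read-only; skip it.
            }
        }

        return deleted;
    }
}
```

A call might look like `TempFileCleanup.DeleteStaleFiles(Path.GetTempPath(), "pdfbox*", TimeSpan.FromHours(1));`. The age check is there so that files belonging to documents still being indexed are left alone.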
I came across this issue when I hacked the umbSearch code: when you use PDFBox, it creates a temp file to extract the content. Are you using umbSearch or umbracoUtilities? If so, check any code that works with PDFBox and make sure you close the document once you are done with it, e.g.:
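// Namespaces depend on the PDFBox build in your bin folder, e.g. for the 0.7.x .NET build:
// using org.pdfbox.pdmodel; using org.pdfbox.util;
public string returnText(string FullPathToFile)
{
    PDDocument doc = null;
    string res = "";

    // Try the IFilter first
    res = getTextUsingIFilter(FullPathToFile);

    // IFilter returned nothing, so fall back to PDFBox
    if (res.Length == 0)
    {
        try
        {
            doc = PDDocument.load(FullPathToFile);
            PDFTextStripper stripper = new PDFTextStripper();
            res = stripper.getText(doc);
            logMessage(FullPathToFile + " indexed using pdfbox", umbraco.BusinessLogic.LogTypes.Debug);
        }
        catch (Exception ePdf)
        {
            logMessage("Error indexing pdf '" + FullPathToFile + "': " + ePdf.ToString(), umbraco.BusinessLogic.LogTypes.Error);
        }
        finally
        {
            // Always close the document so PDFBox can clean up after itself.
            // doc is still null if load() threw, so guard against that.
            if (doc != null)
            {
                doc.close();
            }
        }
    }
    return res;
}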
I actually almost brought a host down before I figured out the issue; they were not best pleased :=}
The issue looks to be with the Lucene search used by Umbraco. Lucene uses PDFBox to index PDF docs. The indexes are built in the Windows temp directory by default.
A solution is to explicitly set the directory the indexes are written to by adding the line below to the web.config (in the appSettings section):
<add key="Lucene.Net.lockdir" value="enter directory to write to here" />
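For context, here is a minimal sketch of where that line would sit in web.config. The path D:\LuceneTemp is a hypothetical placeholder, not something from this thread, and the account the site runs under would need write access to it. As far as I know, Lucene.Net uses this key for the directory it writes its lock files to, so it is worth checking afterwards that the files really do end up in the new location.

```xml
<configuration>
  <appSettings>
    <!-- Hypothetical path: replace with a directory on the drive you want to use -->
    <add key="Lucene.Net.lockdir" value="D:\LuceneTemp" />
  </appSettings>
</configuration>
```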
Thanks for the help again - wouldn't have sorted that otherwise!
Windows/temp folder filling up with pdfbox files
Thanks for your help Chris and Ismail!
Chris,
Would that not just give you the same issue except in a different directory?
Regards
Ismail
Ismail, thanks for your post, that was exactly what I needed!
Cheers,
Umair