  • Chris Koiak 700 posts 2626 karma points
    Apr 22, 2010 @ 10:06
    Chris Koiak
    1

    Umbraco Examine - PDF Indexing

    Hi,

    We're looking to incorporate PDF indexing into Umbraco Examine. Has anyone done this in the past and has a suggestion for the best approach?

    Or ideally, an extension/package that is already built? :-D

    I've read the suggestions on http://www.farmcode.org/post/2009/04/20/Umbraco-Examine-v4x-Powerful-Umbraco-Indexing.aspx, but I was wondering if the community has a recommended approach for this.

    Thanks,

    Chris

  • Darren Ferguson 1022 posts 3259 karma points MVP c-trib
    Apr 22, 2010 @ 10:46
    Darren Ferguson
    0

    Chris - I had a package that extracted metadata from PDF files and set the metadata as properties of the media item. It doesn't, however, extract the body text of the PDF.

    Alternatively you could use my XSL FO package to generate your PDFs from Umbraco content nodes and just index the content nodes. Obviously if you have existing PDFs they'd have to be migrated.

    HTH.

     

     

  • Chris Koiak 700 posts 2626 karma points
    Apr 22, 2010 @ 11:02
    Chris Koiak
    0

    Hi Darren,

    These are existing PDF files, so creating the PDFs from content nodes isn't an option. It's really the indexing of PDF text I'm looking for.

    Nice Package though, I can see a number of uses for it in future projects.

    Chris

  • Dirk De Grave 4541 posts 6021 karma points MVP 3x admin c-trib
    Apr 22, 2010 @ 13:27
    Dirk De Grave
    1

    Chris,

    You can use any of a number of IFilter implementations to extract data from documents. Here's the one for PDFs. I've seen some tweets about PDF indexing as well (it was all about PDFBox, albeit Java-based...)

     

    Cheers,

    /Dirk

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 22, 2010 @ 13:29
    Ismail Mayat
    0

    Dirk,

    Have you seen any code, or do you know where to plug the IFilter stuff into Examine? Slace, if you're reading, are you going to do an Examine grok session at CG10?

    Regards

    Ismail

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Apr 22, 2010 @ 14:15
    Aaron Powell
    1

    As for whether Examine (well, Lucene) supports PDF indexing, see this post - http://www.aaron-powell.com/lucene-net-overview

  • Chris Koiak 700 posts 2626 karma points
    Apr 27, 2010 @ 12:24
    Chris Koiak
    0

    Thanks for the feedback, if we build anything that can be packaged I'll make sure to post it.

     

  • Murray Roke 503 posts 966 karma points c-trib
    Apr 28, 2010 @ 05:51
    Murray Roke
    0

    Hi Chris,

    I'm trying to do exactly the same thing, I've found recommendations for iTextSharp to get the text from the PDF, but I'm not sure how to get that text into the index.

    I think I want to combine my node data and my pdf data into one Lucene-Document. Is this easy to do?

    This will mean search results bring up the page that includes the relevant attachment thus providing context, rather than bringing up the attachment directly.

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Apr 28, 2010 @ 06:16
    Aaron Powell
    0

    How are you extracting the text via iTextSharp? According to all the documentation I've read it is not possible to get back blocks of text from a PDF document.

    Quote:

    You can't 'parse' an existing PDF file using iText, you can only 'read' it page per page.
    What does this mean?
    The pdf format is just a canvas where text and graphics are placed without any structure information. As such there aren't any 'iText-objects' in a PDF file. In each page there will probably be a number of 'Strings', but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines. In short: parsing the content of a PDF-file is NOT POSSIBLE with iText. Post your question on the newsgroup news://comp.text.pdf and maybe you will get some answers from people that have built tools that can parse PDF and extract some of its contents, but don't expect tools that will perform a bullet-proof conversion to structured text.
    What iText DOES provide is the possibility to READ a PDF document and copy an entire page of this file into the PDF file you are constructing from scratch. This can be useful if you want to create a new document based on (an) existing document(s). You can add a Watermark, pagenumbers,...

    See: http://itextsharp.sourceforge.net/tutorial/ch01.html

  • Murray Roke 503 posts 966 karma points c-trib
    Apr 28, 2010 @ 06:34
    Murray Roke
    0

    I was looking at this:

    http://stackoverflow.com/questions/83152/reading-pdf-documents-in-net/84410#84410

    I haven't actually got it working yet, so I may well hit those problems :-\

    Do you recommend any other ways to get the text data?

  • Murray Roke 503 posts 966 karma points c-trib
    Apr 28, 2010 @ 06:40
    Murray Roke
    0

    For searching purposes the text doesn't need to be pretty nor well structured, probably doesn't even need to be in the right order?
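
    For readers who want to try the "unstructured text is good enough for search" idea, here is a minimal sketch. It assumes a later iTextSharp 5.x release that ships the PdfTextExtractor helper (not the version quoted in the tutorial above); the class name PdfText is made up for illustration, and the output is raw page text with no guarantee about ordering or layout.

    ```csharp
    using System.Text;
    using iTextSharp.text.pdf;
    using iTextSharp.text.pdf.parser;

    // Sketch only: assumes iTextSharp 5.x is referenced.
    public static class PdfText
    {
        public static string Extract(string path)
        {
            var text = new StringBuilder();
            var reader = new PdfReader(path);
            try
            {
                // iText page numbers start at 1.
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
                }
            }
            finally
            {
                // Release the underlying file.
                reader.Close();
            }
            return text.ToString();
        }
    }
    ```

    Secured or image-only PDFs may still yield little or no text with this approach.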

  • Casey Neehouse 1339 posts 483 karma points MVP 2x admin
    Apr 28, 2010 @ 07:45
    Casey Neehouse
    0

    Several years ago (3 or 4), I had to implement a search that indexed PDFs. I used the Searcharoo code as a starting point (I was indexing pages, not the data source). I ended up using an iFilter implementation that indexed PDF and other binary documents, provided the iFilter was installed on the machine or configured to load directly. At the time, I had to override the Adobe iFilter, as it was known to fail due to some pathing bugs.

    Anyhow, I think it would be wonderful to implement the iFilter parsers where possible, and perhaps have configuration as to which files are parsed, with the ability to parse undefined "*" files with the iFilter by default.

    Case

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Apr 28, 2010 @ 12:55
    Aaron Powell
    5

    Once I get a few projects that have very immediate deadlines out of the way, I'll be creating a built-in PDF indexer for Examine.

  • Chris Koiak 700 posts 2626 karma points
    Apr 28, 2010 @ 14:16
    Chris Koiak
    0

    Great news!

    Any indication of when this would be available? Next couple of months?

    Chris

  • Sebastiaan Janssen 5060 posts 15522 karma points MVP admin hq
    Apr 28, 2010 @ 15:15
    Sebastiaan Janssen
    5

    This is from an e-mail I sent to Aaron recently, if anybody wants to start implementing PDF searching now, without iFilters.

    It gets all of the readable text from a PDF, you could store it in some node in Umbraco and then it's searchable through Examine immediately.

    Just wanted to let you know that I've found a very simple way to extract text from a PDF file through a library called PDFBox.
    Found this article http://www.codeproject.com/KB/string/pdf2text.aspx and tried it out, works as advertised.

    I had to copy these to my bin folder:
    FontBox-0.1.0-dev.dll
    IKVM.GNU.Classpath.dll
    IKVM.Runtime.dll
    PDFBox-0.7.3.dll
    But I only had to reference IKVM.GNU.Classpath.dll and PDFBox-0.7.3.dll to be able to build the code.
    This is a nice solution without those ugly iFilters, so I hope it helps you for Lucene as well!

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 28, 2010 @ 15:34
    Ismail Mayat
    0

    Sebastiaan,

    where in examine did you have to make changes to implement pdf indexing?

    Regards

    Ismail

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Apr 29, 2010 @ 00:32
    Aaron Powell
    0

    It'll be available once I get a few more pressing projects out of the way.

    Examine will be running as Out-Of-Band releases to Umbraco (like ASP.NET MVC has done with Visual Studio), so there are no promises it'll make the 4.1 release.

    But if someone wants to write their own indexer, be my guest. It is a provider model and you're completely welcome to create your own; it's what it's designed for ;)

  • Murray Roke 503 posts 966 karma points c-trib
    Apr 29, 2010 @ 02:03
    Murray Roke
    0

    Here's what I've created so far. This indexes DOCX files because they're the simplest scenario. Document text is rolled into the node that it is attached to.

    I'm not really sure what I'm doing, but it seems to work so far, so critical feedback welcome.

    Code:(using this codeproject sample to extract text from docx files)

        public class AttachmentAndSecurityAwareIndexer : UmbracoExamine.LuceneExamineIndexer
        {
            protected override Dictionary<string, string> GetDataToIndex(System.Xml.Linq.XElement node, Examine.IndexType type)
            {
                StringBuilder fileText = new StringBuilder();

                // find all files picked in the 'related downloads' property (multiple media picker)
                string values = node.Elements("data").Single(e => e.Attribute("alias").Value == "relatedDownloads").Value;
                foreach (var value in values.Split(','))
                {
                    int mediaId;
                    if (int.TryParse(value, out mediaId))
                    {
                        Media media = new Media(mediaId);
                        // skip an unresolved id but keep processing the remaining ones
                        if (media.Id == 0)
                            continue;

                        string extension = (string) media.getProperty("umbracoExtension").Value;
                        string filename = HttpContext.Current.Server.MapPath((string)media.getProperty("umbracoFile").Value);

                        fileText.AppendLine();
                        // depending on the extension use various methods to extract the text that will go into the lucene index.
                        switch (extension.ToUpperInvariant())
                        {
                            case "DOCX":
                                fileText.Append((new DocxToText(filename)).ExtractText());
                                break;
                        }
                    }
                }

                // Get the Base Data to index
                var result = base.GetDataToIndex(node, type);
               
                // add the file text to the data to index.
                if (!result.ContainsKey("bodyText"))
                    result.Add("bodyText", fileText.ToString());
                else
                    result["bodyText"] += fileText;
                return result;
            }
        }

    Configuration changes to use the new class: (based on default configuration documentation)

    <Examine>
        <ExamineIndexProviders enableDefaultEventHandler="true">
            <providers>
                <add name="GlobalMembersIndex" type="Terabyte.UmbracoWebsite.Models.AttachmentAndSecurityAwareIndexer, Terabyte.UmbracoWebsite"
                    ...

     

  • Murray Roke 503 posts 966 karma points c-trib
    Apr 29, 2010 @ 04:53
    Murray Roke
    0

    Add this case to index PDF files (this uses the libraries mentioned in Sebastiaan's post):

    case "PDF":
      PDDocument doc = PDDocument.load(filename);
      try { fileText.Append(new PDFTextStripper().getText(doc)); }
      finally { doc.close(); } // release the file PDFBox holds open
      break;
  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 29, 2010 @ 10:31
    Ismail Mayat
    0

    Murray,

    Take a look at Niel's code for the old umbSearch (http://umbracoext.codeplex.com/sourcecontrol/network/Show?projectName=umbracoext&changeSetId=49680). Go to umbSearch: it makes use of the factory pattern, so you implement IUmbracoSearchFileFilter and that way you can easily plug in your own extensions for doc, pdf, rtf, whatever.

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Apr 30, 2010 @ 12:21
    Ismail Mayat
    0

    Slace,

    Looking at Murray's code I can see how he has supplied his own provider, via the method

    public class AttachmentAndSecurityAwareIndexer : UmbracoExamine.LuceneExamineIndexer
    {
        protected override Dictionary<string, string> GetDataToIndex(System.Xml.Linq.XElement node, Examine.IndexType type)
        {

    Will GetDataToIndex only be passed nodes of type content by Examine, or will it also receive nodes of type media? If it does not receive media nodes, what do I need to do so that I can index them? Could I do it with an action handler for media after save and somehow add it to the index using the Examine API?

    Regards

    Ismail

  • Aaron Powell 1708 posts 3046 karma points c-trib
    May 03, 2010 @ 01:05
    Aaron Powell
    0

    Ismail - it'll get all the nodes (content and media) with the IndexType defining which one it is.

    Shan and I just realised that there is no way to restrict an index to being just content or just media (unless you restrict the content types), so we may add that as a configuration property.

  • Murray Roke 503 posts 966 karma points c-trib
    May 05, 2010 @ 05:40
    Murray Roke
    0

    An update on using this method: I spent a while figuring out that I was adding fields for protected nodes. This guard statement at the top of your GetDataToIndex method should ensure your indexer plays nicely when supportProtected="false". I'm not sure if there is a more elegant way of doing this?

            protected override Dictionary<string, string> GetDataToIndex(System.Xml.Linq.XElement node, Examine.IndexType type)
            {
                // Get the Base Data to index
                var result = base.GetDataToIndex(node, type);

                // check we have a result, if we have no fields this is probably a protected node and we shouldn't add anything else.
                if (result.Count == 0)
                    return result;
  • Andrew Waegel 126 posts 126 karma points
    Sep 15, 2010 @ 20:09
    Andrew Waegel
    0

    Any progress on this? I need a PDF indexing solution for an upcoming site and would much prefer to use a community-supported solution. My C# skills are not stellar but I'm happy to put some time and effort into it.

  • Andrew Waegel 126 posts 126 karma points
    Sep 15, 2010 @ 20:19
    Andrew Waegel
    0

    Hang on, now I see that PDF indexing has been added to Examine RC3 on CodePlex. Anyone implement this successfully yet? I'll be trying soon.

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Sep 16, 2010 @ 01:35
    Aaron Powell
    0

    It works fine in our test suite :P

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Sep 16, 2010 @ 06:02
    Shannon Deminick
    1

    The latest code of Examine has PDF indexing support, and it also exists in RC3.

    I've published the DLLs of the latest checkin (57217) which surpasses RC3 and is simplified. If you'd like to try it, you can download it from:

    http://shazwazza.com/Content/Downloads/UmbracoExamine.57217.zip

    The PDF indexer provider looks like this:


    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" />

    The PDF searcher provider looks like this:


    <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />

    The PDF index set is simple and looks like this:


    All PDF data goes into its own index because the content could be quite large and is better kept separate. The PDF indexer will index media items only, and only files that are '*.pdf' and are stored in a property called 'umbracoFile' (both of these can be overridden in the index provider if necessary). If you need it to index PDFs that are attached to a content node, you'll have to use the API to do this.

    Hopefully we'll get the RTM out in the next week or two.

  • Neal Caselton 2 posts 22 karma points
    Sep 23, 2010 @ 13:08
    Neal Caselton
    0

    Great stuff ! I've managed to deploy the new DLLs and build up the indexes for web content and PDF's. However I'm having trouble searching against the newly created PDF Index.

    Is there any documentation or examples of how to query against the PDF Index as when viewing the Index in an analyser it's not clear as to how this can be achieved?

    Thanks in advance.

     

    Please ignore the above, I was being a <DIV> as I hadn't copied across the UmbracoExamine.PDF.dll that meant the index wasn't created correctly...

     

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Sep 24, 2010 @ 08:47
    Shannon Deminick
    0

    Also, just in case you come across this: some PDFs are just not indexable/readable if they have been saved in certain ways (with security, etc.). Your clients might complain, but the fact is that some PDFs just can't be read, at least with iTextSharp anyway.

  • Andrew Waegel 126 posts 126 karma points
    Oct 09, 2010 @ 03:26
    Andrew Waegel
    0

    Shannon, your example PDF index set  didn't come through, can you repost? I know it's probably really simple but it would help those of us trying to get this going. Meanwhile I'll try to sort it out myself and post an example if it works.

    - Andrew

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Oct 10, 2010 @ 09:41
    Shannon Deminick
    0

    PDF Indexer:

    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" />

    PDF Searcher:

    <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />

    PDF Index set is simple it is just:

    <IndexSet SetName="PDFIndexSet" IndexPath="App_Data\PDFIndexSet" />

    You don't need to define anything as it's automated. It will index all media items that have a property of umbracoFile (which is already the property name of the Image and File media types) where the umbracoFile is a PDF.

    Please download latest Examine version here, there's a few bugs fixed. This will be released as v1.0 RTM this week.

    http://shazwazza.com/Content/Downloads/UmbracoExamine57796.zip

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 10, 2010 @ 14:26
    Ismail Mayat
    0

    Shannon,

    When you want to search over both the content and PDF indexes, what is the Examine syntax? I know in Lucene you can do cross-index searching, but I couldn't quite see how to do it via Examine.

    Regards

    Ismail

  • Andrew Waegel 126 posts 126 karma points
    Oct 11, 2010 @ 06:46
    Andrew Waegel
    0

    +1 for the cross-index searching as well.

    It seems like we would want to specify multiple SearchProviderCollections in an ExamineManager instance, but it's not clear how to do that - we only see the SearchProviderCollection[] property.

    If I happen to make this work while monkeying with it I'll post some results while waiting for the devs to check in.

     

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Oct 11, 2010 @ 09:46
    Shannon Deminick
    2

    All you'd need to do is concatenate your searches between the providers:

    var combinedResults =
        ExamineManager.Instance.SearchProviderCollection["CWSSearcher"].Search("blah", true)
        .Concat(
            ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].Search("blah", true));

    You can use this same concept when searching with the Fluent API too.

    Please be aware, however, that the 'Score' values returned by two separate searches are not comparable. A 'Score' is only relevant within the results of a single search, regardless of the index, so you can't compare 'Score' values across the concatenated results.

    Another approach would be to store your Content + PDF data into one index. The reason why we didn't implement this is because your PDF index could get really huge and we didn't want that to affect your Content/Media index. If you wanted however, you could use the API + events to get your PDF data into your Content/Media index.
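
    Because scores from two searches can't be compared, one way to present a combined list is to ignore the scores and interleave the two result lists by rank. This is not something Examine does for you and not something anyone in this thread proposed; it is only a sketch of one possible merge strategy, written against plain lists so it has no Examine dependency. The RankMerge name is made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class RankMerge
    {
        // Interleave two lists that are each already sorted best-first,
        // alternating between them so neither index's score scale dominates.
        public static List<T> Interleave<T>(IList<T> first, IList<T> second)
        {
            var merged = new List<T>(first.Count + second.Count);
            int longest = Math.Max(first.Count, second.Count);
            for (int i = 0; i < longest; i++)
            {
                if (i < first.Count) merged.Add(first[i]);
                if (i < second.Count) merged.Add(second[i]);
            }
            return merged;
        }
    }
    ```

    For example, interleaving content hits 1052, 1060, 1071 with PDF hits 2010, 2044 gives 1052, 2010, 1060, 2044, 1071. You would call ToList() on each searcher's results before passing them in.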

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Oct 11, 2010 @ 10:44
    Aaron Powell
    0

    Examine doesn't use MultiSearcher; if you need that you'll have to implement a custom searcher.

    Otherwise Shannon's solution is what you'll need to do.

  • Andrew Waegel 126 posts 126 karma points
    Oct 11, 2010 @ 11:28
    Andrew Waegel
    0

    Thanks for the replies. I'd like to try combining the content & PDF data into one index; how would I do that?

    Would I just make one IndexSet for everything, with two IndexProviders (one for PDF, one for regular content) and one SearchProvider?

    And if so, what type would I use for the search provider?

     

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Oct 11, 2010 @ 11:35
    Aaron Powell
    0

    Create your own indexer to combine the data, or create your own searcher that uses MultiSearcher.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 11, 2010 @ 12:57
    Ismail Mayat
    0

    Slace,

    Or tap into media events and push into index there?

    Regards

    Ismail

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Oct 11, 2010 @ 14:35
    Aaron Powell
    0

    What events are you thinking of using?

    I think it'd be easier to create either a custom indexer or searcher

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 11, 2010 @ 14:58
    Ismail Mayat
    0

    Slace,

    I was thinking of tapping into the media new, delete and update events, but looking at it logically, creating your own indexer or searcher seems the better route.

    Regards

    Ismail

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Oct 11, 2010 @ 15:04
    Shannon Deminick
    1

    You could do it a bit 'dodgy' and just listen to the indexed event of the PDF indexer and add the results to your Content Indexer using ReIndexNode

    This means that you'll have PDF data in two indexes... but it would be very little code to write.

  • Andrew Waegel 126 posts 126 karma points
    Oct 11, 2010 @ 19:54
    Andrew Waegel
    0

    Thanks Shannon, I like the idea of a process that would put the indexed PDF text into the content indexer. The .NET part of this is a little advanced for me but I'm happy to give it a try.

    I think this means I'm fetching the Umbraco Examine source, making a new UmbracoExamine.PDF.PDFIndexer that does what you say (adds the extracted text to the content indexer using ReIndexNode), then rebuilding, copying over the DLL and using the new indexer for my IndexProvider that works on the PDF files?

    Sorry for the noob questions, hope to get this worked out, and happy to share the results if I do.

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Oct 12, 2010 @ 00:22
    Aaron Powell
    0

    No, Examine raises events that you can add handlers to, like you would if you were adding one to the Document object in Umbraco.

    Check out Shan's CG10 slides for the event list.

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    Oct 12, 2010 @ 06:57
    Shannon Deminick
    0

    Also, RC3 is still an RC!!!

    This is by no means final; there are bugs in the RC and a lot has changed since in the latest version, which will become v1.0. Be mindful that there will be some breaking changes, though for most people it should be painless to upgrade.

    Here are some release notes for v1.0:

    http://examine.codeplex.com/releases/view/50781

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 12, 2010 @ 18:03
    Ismail Mayat
    0

    Shannon or Slace,

    Is there an event for when an item is removed from index? I am looking at implementing shannons dirty hack of putting pdf stuff into content index so i am tapping into event 

     

    ExamineManager.Instance.IndexProviderCollection[PdfIndex].GatheringNodeData 
                    += new System.EventHandler<IndexingNodeDataEventArgs>(ExamineEvents_MediaGatheringNodeData);

     

    and at that point I will put the item into the content index. However, when I remove the PDF I also need to remove it from my content index, hence I need to hook that event. Worst case, I can tap into the Umbraco media delete event and do it from there.

    Regards

    Ismail

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Oct 13, 2010 @ 00:14
    Aaron Powell
    1

    IndexDeleted event is fired when an index of a node is removed.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 13, 2010 @ 10:58
    Ismail Mayat
    0

    Slace,

    The delete delegate has signature:

    void ExamineEvents_IndexDeleted(object sender, DeleteIndexEventArgs e)

    From e I cannot get the id of the node being deleted, so is it possible to get it from sender? If so, what can I cast sender to? Or am I missing a trick?

    Regards

    Ismail

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Oct 13, 2010 @ 15:34
    Ismail Mayat
    0

    Slace,

    Ignore last post i have figured it out:

     nodeId = e.DeletedTerm.Value;

    which is the nodeid of the item being deleted.
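
    Putting the pieces from the last few posts together, a handler along these lines could keep the content index in step when a PDF is removed. This is a sketch under assumptions: the provider names are hypothetical placeholders, DeleteFromIndex(string) is assumed to be available on the index provider, and Register() is a made-up helper that would need to be called once at application startup.

    ```csharp
    using Examine;

    // Sketch only: assumes the Examine assembly is referenced.
    public static class PdfIndexCleanup
    {
        // Hypothetical provider names; use the ones from your own settings file.
        private const string PdfIndexer = "PDFIndexer";
        private const string ContentIndexer = "ContentIndexer";

        public static void Register()
        {
            ExamineManager.Instance.IndexProviderCollection[PdfIndexer].IndexDeleted
                += PdfIndexer_IndexDeleted;
        }

        private static void PdfIndexer_IndexDeleted(object sender, DeleteIndexEventArgs e)
        {
            // DeletedTerm.Value carries the node id of the removed item.
            string nodeId = e.DeletedTerm.Value;

            // Assumption: the content indexer exposes DeleteFromIndex(string).
            ExamineManager.Instance.IndexProviderCollection[ContentIndexer]
                .DeleteFromIndex(nodeId);
        }
    }
    ```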

    Regards

    Ismail

  • Euan Rae 11 posts 31 karma points
    Nov 21, 2010 @ 23:35
    Euan Rae
    0

    I am using examine for site search + pdf searching; is it possible to set the PDFs so it only indexes (and searches on) the metadata for PDFs?

  • Aaron Powell 1708 posts 3046 karma points c-trib
    Nov 22, 2010 @ 03:24
    Aaron Powell
    0

    Nope, you'd have to write your own indexer to only insert the metadata from a PDF. I'm not really sure if iTextSharp (which we use) can extract metadata, I'd assume it does.

  • MikeD 92 posts 112 karma points
    Aug 23, 2012 @ 21:02
    MikeD
    0

    Hi folks,

    Way late to this conversation, but this thread has gotten me so close to implementing my client's request I can almost taste success.  The one thing I do not understand is the combining of either indexers or searchers.

    Here's what I have now (Umbraco 4.8.0)

    In ExamineSettings:

    <add name="RazorSiteIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
    supportUnpublished="false"
    supportProtected="false"
    analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
    supportUnpublished="false"
    supportProtected="false" />
    <add name="RazorSiteSearcher"
    type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"/>
    
    <add name="PDFSearcher" 
    type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />
    

    and in my cshtml file:

    var searcher = ExamineManager.Instance.SearchProviderCollection["RazorSiteSearcher"];

    Both searchers appear to be doing their jobs, but I need to combine them both into my results page.

    Any assistance would be very much appreciated.

    -Mike D

  • MikeD 92 posts 112 karma points
    Aug 23, 2012 @ 23:58
    MikeD
    0

    Hrm, it appears there is a problem with the PDF Searcher.  I really shoulda tested before I posted...

    When I set the searcher collection to PDFSearcher, I get an error on the results page:

    Error loading MacroEngine script (file: SearchResults.cshtml)

    2 questions... first, how do I get more info in the error?  That might help me figure out what is wrong, and 2... what could be wrong?  lol  

  • Nathan Woulfe 447 posts 1665 karma points MVP 5x hq c-trib
    Aug 24, 2012 @ 01:21
    Nathan Woulfe
    0

    Mike - append '?umbDebugShowTrace=true' to the url, and find the angry red text...

  • MikeD 92 posts 112 karma points
    Aug 24, 2012 @ 02:55
    MikeD
    0

    Nathan,

    Thanks for the reply... unfortunately still nothing... is there anything else I need to do to make that work?

  • Nathan Woulfe 447 posts 1665 karma points MVP 5x hq c-trib
    Aug 24, 2012 @ 03:50
    Nathan Woulfe
    0

    Is the below key present in your web config? That should be enough to enable the trace, which will show you where the problem is

    <appSettings>
    ...
    <add key="umbracoDebugMode" value="true" />
    ...
    </appSettings>

    I haven't used the PDF indexer, so won't be any real help on that front!

  • MikeD 92 posts 112 karma points
    Aug 24, 2012 @ 15:48
    MikeD
    0

    Still no more detail... grrr...

    <add key="umbracoDebugMode" value="true" />

    This is like trying to chase down a Windows error... lol

  • MikeD 92 posts 112 karma points
    Aug 24, 2012 @ 20:25
    MikeD
    0

    Ok, I got my error messages... I'm an idiot... lol

    I can now see what the problem is in my script... but figuring out a good way to fix it is going to require someone with much more knowledge of Examine than we possess.  What I need to do is to combine 2 indexes into 1.  The problem with my script is the pdf index does not have the any of the fields I need to sort out my results.  If I could add the pdf index to the site content index, I could then sort and filter my results like the client wants.  I can also exclude PDFs that have been "unpublished" via the content tree.

    Gawds I hope that made sense... I've been looking at this code for too long and I need a beer or 12....

    If there is anyone reading this thread that can help, please please contact me off list so I can try to explain what my client is looking for and how best to accomplish the task.

     

    Thanks everyone...

    -Mike D

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Aug 24, 2012 @ 20:35
    Ismail Mayat
    0

    Mike

    The latest version does multi index search.  With regards to sort field inject the field in using gatheringnode data event event.

    Regards

     

    Ismail

  • MikeD 92 posts 112 karma points
    Aug 24, 2012 @ 20:51
    MikeD
    0

    Thanks for the quick response Ismail...

    I am already searching 2 indexes, that's part of my issue.  There are no fields in the PDF index to work with other than nodeid... when I search both indexes, I get errors on the results page... my main output is sorted based on the NodeTypeAlias and that field does not exist in the PDF index, so it blows up.  If I can combine everything into 1 index, that index will include all the fields I am currently working with.

    Please note that I am using built in stuff here... no custom programming.  It's all UmbracoExamine and Razor.

    Also be advised... I am NOT a programmer.  You have to use small words when explaining stuff to me... lol  I know enough about programming to grasp concepts, but "inject the field in using gatheringnode data event event" is not something I understand.  If you can explain, or give examples, maybe I can grasp the idea, then I can run with it and figure out how to do it.  I really need several fields in the PDF index to do what my client wants.  Without getting into specifics, it would be hard to explain, and I don't want to burden everyone with all that detail.  I'm trying to give enough info to get me pointed in the right direction without writing a novel... lol


  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Aug 24, 2012 @ 23:11
    Ismail Mayat
    0

     

    Mike,

    Are you on Skype? I can talk you through it.  What you are trying to achieve is doable having done something similar.

     

    Examine has a rich eventing system. One of the events is GatheringNodeData; you can tap into that event and inject your own fields. So in your case, when PDF indexing happens we can use the event to shove in a nodeTypeAlias field, and we can also inject anything else that is needed. My Skype is ismail_mayat; if you add me I can talk you through it on Monday.
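
    As a rough illustration of what "injecting a field" looks like in code, here is a minimal sketch. It assumes Umbraco 4.x (where classes deriving from ApplicationBase are picked up at startup), an index provider named "PDFIndexer", and that the event args expose the outgoing fields as a dictionary called Fields. The class name and the "pdfDocument" value are placeholders, not anything from this thread.

    ```csharp
    using Examine;
    using umbraco.BusinessLogic;

    // Sketch only: assumes the Examine and Umbraco 4.x assemblies are referenced.
    public class PdfIndexEvents : ApplicationBase
    {
        public PdfIndexEvents()
        {
            ExamineManager.Instance.IndexProviderCollection["PDFIndexer"].GatheringNodeData
                += PdfIndexer_GatheringNodeData;
        }

        private static void PdfIndexer_GatheringNodeData(object sender, IndexingNodeDataEventArgs e)
        {
            // Assumption: e.Fields holds the name/value pairs about to be
            // written to the index for this node.
            if (!e.Fields.ContainsKey("nodeTypeAlias"))
            {
                // Placeholder value so PDF hits can be sorted and filtered
                // on the same field as content hits.
                e.Fields.Add("nodeTypeAlias", "pdfDocument");
            }
        }
    }
    ```

    After rebuilding the PDF index, the new field should be visible in Luke alongside the existing ones.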

     

    Ps can you download Luke it's a useful tool for looking at what is in an examine/ Lucene index just google Luke for Lucene it's a java app latest version is on google code site.

     

    Regards

     

    Ismail

     

  • MikeD 92 posts 112 karma points
    Aug 27, 2012 @ 16:08
    MikeD
    0

    Ismail,

    I am on Skype, I went to add you this morning and there are several Ismail Mayats...  Are you the one in Preston, UK?  That's the one I added... hope it's you.  lol

    I have Luke already downloaded, and have used it several times already in this learning process.  Don't know how I would have gotten as far as I have without it.  Anyone else cruising this thread should download it.  Great tool to have.

    Assuming I got the correct Skype account, send me a blip before you try to call.  I really appreciate the fact you are willing to do this, you have no idea.

     

    -Mike D

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Aug 27, 2012 @ 16:13
    Ismail Mayat
    0

    Mike

    Apologies, I just realised it's a bank holiday today in the UK.  I'll be online tomorrow; you got the right Skype user.  I was surprised how many Skype users there already are with my name, lol. Thankfully I got my Twitter name bagged!

    I love messing around with Examine, and your issue is very similar to what I did on fairbairnpb.co.uk

     

    Regards

     

    Ismail

  • MikeD 92 posts 112 karma points
    Aug 27, 2012 @ 16:25
    MikeD
    0

    Is there a place on the web with maybe a list of all the events and stuff available in Examine?  I have some really smart programmers on staff that I can bug if I just have a reference.  And if I cannot get it figured out today, I would most welcome your assistance tomorrow.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Aug 27, 2012 @ 16:32
    Ismail Mayat
    0

    There are links and docs on examine.codeplex.com, plus some Umbraco TV and CodeGarden videos; see stream.umbraco.org

  • MikeD 92 posts 112 karma points
    Aug 28, 2012 @ 17:40
    MikeD
    0

    Many many thanks to Ismail.  Not many people would go to the lengths he did to help a complete stranger.

    You are a rock star sir!

  • Matt Taylor 873 posts 2086 karma points
    Jan 15, 2013 @ 15:32
    Matt Taylor
    0

    In which version of Umbraco was the PDF indexer added to examine?

    I have a 4.7.1 site I'd like to add PDF search to but don't know if I need to upgrade Umbraco first.

    Thanks, Matt

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Jan 15, 2013 @ 15:57
    Ismail Mayat
    0

    Matt,

    4.7.1 has the pdfindexer out of the box.

  • Matt Taylor 873 posts 2086 karma points
    Jan 15, 2013 @ 16:00
    Matt Taylor
    0

    Thanks Ismail!

  • Matt Taylor 873 posts 2086 karma points
    Feb 15, 2013 @ 14:52
    Matt Taylor
    0

    Is the Examine PDF Indexer supposed to index the actual content of the PDF files or just the filenames?

    I've tried the CogUmbracoExamineMediaIndexer package which indexes the content but the Examine PDF Indexer seems to only be returning matches on the filename and not the file content.

    Cheers, Matt

     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Feb 15, 2013 @ 17:31
    Ismail Mayat
    0

    The Examine PDF indexer should index the content as well, but not any metadata. Is the data present when you look with Luke? There should be a field called FileTextContent.

  • Matt Taylor 873 posts 2086 karma points
    Feb 26, 2013 @ 12:40
    Matt Taylor
    0

    Sorry for the delay getting back to you Ismail,
    This is mainly an exercise in increasing my understanding, so it took a back seat to some work I had to do for a couple of days.

    I've looked in Luke at the index created by the CogUmbracoExamineMediaIndexer package which works great and I can see all the PDF content indexed:

    The examine PDF index however has just indexed a bunch of numbers:

    It's strange and I can assure you that both indexes are looking at the same PDF media files:

    This is how the index is configured:

    <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/PDF/" IndexParentId="-1">
        <IndexAttributeFields>
          <add Name="id" />
          <add Name="nodeName" />
          <add Name="updateDate" />
          <add Name="writerName" />
          <add Name="path" />
          <add Name="nodeTypeAlias" />
          <add Name="parentID" />
        </IndexAttributeFields>
        <IncludeNodeTypes>
          <add Name="File" />
        </IncludeNodeTypes>
      </IndexSet>

    The indexer:

    <add name="PDFIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"/>

    The searcher:

    <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcards="true" />

    Regards,

    Matt

  • d Thomas 13 posts 33 karma points
    Mar 20, 2013 @ 12:32
    d Thomas
    0

    @Ismail 

    Hi Ismail, 

    Could you please assist with examine.pdf configuration for search in the pdf content? 

    I am using Umbraco 4.9 and copied the latest version of Umbraco Examine PDF from CodePlex and placed the DLLs in the bin folder, but I got stuck on the later configuration for searching PDF content.

    Thanks,

    David

  • Matt Taylor 873 posts 2086 karma points
    Mar 20, 2013 @ 13:34
    Matt Taylor
    0

    I still haven't managed to get it working as expected either. :-(

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    Mar 20, 2013 @ 14:59
    Ismail Mayat
    0

    David,

    Are you having problems searching or indexing?  Can you paste your ExamineIndex and ExamineSettings config files? Also, can you take a look at your PDF index using Luke or http://our.umbraco.org/projects/backoffice-extensions/examine-inspector? Do you have any documents in the index?

    Regards

    Ismail

  • d Thomas 13 posts 33 karma points
    Mar 20, 2013 @ 17:46
    d Thomas
    0

    Hi Ismail, 

    I think I have a problem with the searching or maybe even only rendering the results in umbraco page.

    I also installed Luke (and the ExamineIndexAdmin and Examine modules for the developer section), and it shows my documents indexed.

     

    ExamineSettings: 

    <?xml version="1.0"?>
    <!-- Umbraco examine is an extensible indexer and search engine. This configuration file can be extended to add your own search/index providers. Index sets can be defined in the ExamineIndex.config if you're using the standard provider model. More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com -->
    <Examine>
      <ExamineIndexProviders>
        <providers>
          <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="InternalMemberIndexer" type="UmbracoExamine.UmbracoMemberIndexer, UmbracoExamine"
               supportUnpublished="true"
               supportProtected="true"
               interval="10"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net"/>
    
            <!-- default external indexer, which excludes protected and published pages-->
            <add name="ExternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine"
                supportUnpublished="false"
                supportProtected="false"
                interval="10"
                analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
    
    
            <add name="PDFIndexer" 
                 type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF"
                 extensions=".pdf"
                 umbracoFileProperty="umbracoFile" runAsync="true"/>
    
    
        </providers>
      </ExamineIndexProviders>
    
      <ExamineSearchProviders defaultProvider="ExternalSearcher">
        <providers>
               <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
                 analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
          <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
    
          <add name="ExternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
                 analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
    
          <add name="InternalMemberSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
               analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
    
         <add name="PDFSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine"
                 analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcards="true"/>
    
        </providers>
      </ExamineSearchProviders>
    
    
    
    
    </Examine>
    


    ExamineIndex: 

    <?xml version="1.0"?>
    <!-- Umbraco examine is an extensible indexer and search engine. This configuration file can be extended to create your own index sets. Index/Search providers can be defined in the UmbracoSettings.config More information and documentation can be found on CodePlex: http://umbracoexamine.codeplex.com -->
    <ExamineLuceneIndexSets>
      <!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
      <IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/">
        <IndexAttributeFields>
          <add Name="id" />
          <add Name="nodeName" />
          <add Name="updateDate" />
          <add Name="writerName" />
          <add Name="path" />
          <add Name="nodeTypeAlias" />
          <add Name="parentID" />
        </IndexAttributeFields>
        <IndexUserFields />
        <IncludeNodeTypes/>
        <ExcludeNodeTypes />
      </IndexSet>
    
      <!-- The internal index set used by Umbraco back-office for indexing members - DO NOT REMOVE -->
      <IndexSet SetName="InternalMemberIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/InternalMember/">
        <IndexAttributeFields>
          <add Name="id" />
          <add Name="nodeName"/>
          <add Name="updateDate" />
          <add Name="writerName" />
          <add Name="loginName" />
          <add Name="email" />
          <add Name="nodeTypeAlias" />
        </IndexAttributeFields>
        <IndexUserFields/>
        <IncludeNodeTypes/>
        <ExcludeNodeTypes />
      </IndexSet>
    
      <!-- Default Indexset for external searches, this indexes all fields on all types of nodes-->
      <IndexSet SetName="ExternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/External/" />
        <IndexSet SetName="PDFIndexSet" IndexPath="~/App_Data/ExamineIndexes/PDFIndexSet" IndexParentId="-1"/>
    </ExamineLuceneIndexSets>

     

    I wish to render the results with a Razor script:

    @using Examine
    @using Examine.SearchCriteria
    @using UmbracoExamine
    @using UmbracoExamine.PDF

    @{
        var searchString = Request["searchString"];
        var searchResults = ExamineManager.Instance.SearchProviderCollection["PDFSearcher"].Search(searchString.ToLower(), false).ToList();
    }

     

    Please let me know how you render the search/results in umbraco. 

    I based it on this post, adapting the part for the PDF searcher:

    http://joeriks.com/2011/03/15/ajax-enabled-search-in-umbraco-using-examine-and-razor/#comment-1269

     

    Thanks, 

    David

  • d Thomas 13 posts 33 karma points
    Mar 20, 2013 @ 17:50
    d Thomas
    0

    Yes I have pdf files, I added 2 pdf files in media. 

  • Damjan 12 posts 31 karma points
    May 01, 2013 @ 13:38
    Damjan
    0

    Hi All

    I seem to have the same problem as Matt, I looked into Luke only to see random numbers as values for File Text Content. Such as "1 1,1 1221/0 13 2 21 21/21 21/3 3 4 5 6 7" etc.
    I searched for "1" and printed out the actual content and saw it was

     !"# $% &'%'()*" +,-./ 0%"& %/ ,& %/"1221/0()*%"& %/ ,& %/"21/3%$& %/ ,& %/"#-4 -&21/&3()*%$& %/ ,& %/"'21/" +,-./ & %,& %/ ,& %/"'& ()*%,& %/ ,& %/"21/%5& %/ ,& %/"6&21/%5& %/ ,& %/"" +%,-./ 21/&%7&& %/ ,& %/"6%21/21/" +,-./ %2& %/ ,& %/"%21/&21/%74 ,& %/ ,& %/"21/21/%,& %/ ,& %/"21/" +%,-./ 21/&%21/,& %/ ,& %/"'8321/" +,-./ %,& %/ ,& %/"#9$%21/,& %/ ,& %/"'$& ! ! ! !$6::,& %/ ,& %/"$2;:,:"#$"#$"#$"#$%9 ;::::,:: <::::<'-"1<#%#%#%#%'-"1<#%#%#%#%'-1! %'-6%'-$0&"$&"$&"$&"$'()!'()!'()!'()!= &****9::# >?@@%:%:++++/&>+.+A.,+-,+-,+-,+-.).).).)++++;%B& ,& ,& ;%B& ,& ,& -7C,& ,& ,& -73 1,,-7,-%/-,& %/ ,& %/"1$1,1 !"!"D -&/6 !/ = &/ > --";/,-> /))-E??>&/):-<:E/";/,:-<:>+.F +A.??/# :-3:>;3?/% :-3:>+. +A.?>+.13+A.?0>*D+.;3+A.-<EGD+.";3+A.-<E/ D-<E?/&)(E+*(EHH>?

    So, I assume it is ignoring all the symbols and indexing the numbers only. But the whole point of the PDFIndexer is to read the actual values right, not the encoded version:/ So I would like to ask if Matt resolved this issue or if someone else managed to index PDF content and has time to look into what I'm trying to do, it would be great.

    Thanks

    The indexer:

    <add name="PdfIndexer" type="UmbracoExamine.PDF.PDFIndexer, UmbracoExamine.PDF" extensions=".pdf" umbracoFileProperty="umbracoFile" interval="10"/>

    The searcher:

    <add name="PdfSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" />

    The index set:

    <IndexSet SetName="PdfIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Pdf/" />

    The Razor script:

    @{
        var searcher = ExamineManager.Instance.SearchProviderCollection["PdfSearcher"];
        var searchCriteria = searcher.CreateSearchCriteria(BooleanOperation.Or);
        var query = searchCriteria.GroupedOr(new string[] { "FileTextContent", "nodeName" }, searchTerm).Compile();
        var searchResults = searcher.Search(query);
        var noResults = searchResults.Count();
    }
    <p>You searched for <em>@searchTerm</em>, and found @noResults results</p>
    <ul class="search-results">
        @foreach (var result in searchResults)
        {
            <li>
                <a href="@umbraco.library.GetMedia(result.Id, false)">@result.Fields["FileTextContent"]</a>
            </li>
        }
    </ul>

  • Matt Taylor 873 posts 2086 karma points
    May 01, 2013 @ 13:43
    Matt Taylor
    0

    No I didn't manage to resolve it.

    It's quite frustrating but I was just researching for future projects so the priority wasn't there and eventually had to move on.

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    May 01, 2013 @ 13:52
    Ismail Mayat
    0

    Damjan,

    Try using http://our.umbraco.org/projects/website-utilities/cogumbracoexaminemediaindexer package see if that indexes your pdf content.

    Regards

     

    Ismail

  • Matt Taylor 873 posts 2086 karma points
    May 01, 2013 @ 13:57
    Matt Taylor
    0

    The CogUmbracoExamineMediaIndexer worked great for me.

    It's just a shame the out of the box stuff doesn't.

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    May 02, 2013 @ 14:48
    Shannon Deminick
    0

    Hi guys, I'm gonna be totally honest and say that I haven't read nearly any of all 8 pages of these issues, but I will say that the Examine PDF indexer 'DOES' index PDF content. It definitely does not only index PDF file names, otherwise it would be useless. The other thing to mention is that some PDFs are protected or created with some weird protection encoding, which is why you might experience the strange chars. Examine's PDF indexer uses iTextSharp to read PDFs. The later the Examine version, the later the iTextSharp version, so that might help. TBH I don't know anything about the Cogworks PDF indexer, so I'm not sure what it does beyond the normal Examine PDF indexer. There is documentation on the Examine site noting that it is not possible to index 'ALL' PDF data, and that is because the PDF 'standard' is not standard and is pretty f%#$d in general. We rely on iTextSharp. If it can't do it, then neither can we. However, please let me know if there are issues with the PDF indexer otherwise. We have unit tests that pass, but if it is not working for 'any' of your PDFs then maybe it's a setting I've missed.
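
    One way to check whether a given PDF is readable by iTextSharp at all, independently of Examine, is to run it through the library directly. This is only a sketch: it assumes iTextSharp 5.x (which ships PdfTextExtractor), it is not necessarily how the Examine PDF indexer reads files internally, and the class name and the LetterRatio heuristic are made up for illustration.

```csharp
using System.Linq;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical helper for diagnosing a single PDF outside of Examine.
public static class PdfTextProbe
{
    // Extracts the text of every page (iTextSharp 5.x API assumed).
    public static string ExtractText(string pdfPath)
    {
        var reader = new PdfReader(pdfPath);
        try
        {
            var text = new StringBuilder();
            // iTextSharp page numbers start at 1.
            for (var page = 1; page <= reader.NumberOfPages; page++)
            {
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page));
            }
            return text.ToString();
        }
        finally
        {
            reader.Close();
        }
    }

    // Crude heuristic (an assumption, not an Examine feature):
    // the share of characters that are letters. Readable prose scores
    // high; output that is mostly digits and symbols scores low.
    public static double LetterRatio(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;
        return (double)text.Count(c => char.IsLetter(c)) / text.Length;
    }
}
```

    If ExtractText returns the same kind of symbols and numbers for a file, the problem is more likely the PDF or the iTextSharp version than the Examine configuration.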

  • Damjan 12 posts 31 karma points
    May 02, 2013 @ 14:59
    Damjan
    0

    Hello,

    Thank you Matt and Ismail, the CogUmbracoExamineMediaIndexer indexes the content properly for me too. I will now try to implement the actual search box etc... 


    I'd just complain a bit about the procedure for the installation of the package, I got the same error as here: 

    http://our.umbraco.org/projects/website-utilities/cogumbracoexaminemediaindexer/bugs-support/37947-Could-not-load-file-or-assembly-IKVMOpenJDKBeans

    and managed to fix it after adding and removing the libraries for a while, but I'd recommend that you put a bit more in the Readme, since it's not enough to just install the package and add the tika-app-1.2.dll; you also need a few more IKVM libraries in the bin folder.

    Anyway, thank you again

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    May 02, 2013 @ 15:00
    Ismail Mayat
    0

    Shannon,

    The Cogworks media indexer is just a wrapper around Apache Tika, so it will index everything Tika can handle, and that includes PDF. It will also rip out metadata and shove that in the index. Not sure why some of these PDFs are failing with the PDF indexer. Ideally the people having problems should send the failing PDFs to you, to see if you can recreate the issue?

    Regards

    Ismail

  • Shannon Deminick 1526 posts 5272 karma points MVP 3x
    May 02, 2013 @ 15:03
    Shannon Deminick
    0

    for sure, ideally just log an issue on the tracker at examine.codeplex.com with the faulting PDF(s) and I'll see if I can replicate. 

  • Damjan 12 posts 31 karma points
    May 02, 2013 @ 15:07
    Damjan
    0

    Hi,

    This is the PDF I tried and got the funny symbols and numbers with the built-in PDFIndexer, but got indexed OK with the Cogworks package. It's just the Razor cheat sheet I got from here:


    http://our.umbraco.org/projects/developer-tools/razor-dynamicnode-cheat-sheet

    So maybe check whether it is protected. If it is, then Shannon's explanation for the PdfIndexer not working makes sense to me too; if it's not, maybe there's a bug in the built-in PdfIndexer?

    Thanks,
    Damjan
     

  • Ismail Mayat 4511 posts 10092 karma points MVP 2x admin c-trib
    May 02, 2013 @ 15:17
    Ismail Mayat
    0

    Damjan,

    It's probably the charset range in that PDF. The PDF indexer does have some code that tests the range of characters, so it's possible that stuff is getting through it, whereas Apache Tika is picking the text up correctly.

    Regards

    Ismail

  • Matt Taylor 873 posts 2086 karma points
    May 02, 2013 @ 15:26
    Matt Taylor
    0

    Damjan, yes, I also had problems with missing IKVMOpenJDKBeans assemblies. There's another post somewhere where I list everything you need.

    Funnily enough, I also used the PDF cheat sheet to test indexing, but having considered it could be in a strange format I decided to create my own PDFs using OpenOffice to convert a doc I created. I figured that must be pretty standard but alas, no joy.
