  • Stephen Brewer 4 posts 24 karma points
    Dec 04, 2012 @ 11:12

    Lucene and creating faceted blog search terms problem

    We are building a Blog module and decided to use Lucene for the main search and for a faceted search. So far everything has gone well and runs quickly, but I've come across a problem when creating the faceted list of search terms.

    Basically, every time you republish an item in the CMS, the term count/frequency goes up by 1 even though nothing has changed. This makes our frequency counts inaccurate.

    Example :

    We have 3 categories with counts as follows:

    • Category 1 (2)
    • Category 2 (1)
    • Category 3 (3)

    If I go to the CMS and republish all the nodes, we get the following:

    • Category 1 (4)
    • Category 2 (2)
    • Category 3 (6)

    i.e. everything has doubled!

    I've looked at the index using Luke and it reports the same term counts as above (so our code matches what is in the index). The index details, though, show that there are deleted documents (presumably a publish deletes the document and creates a new one?). Whilst this is fine for a normal search, it seems to be messing up the term frequency (well, that's my guess anyway).

    We've never done a Lucene term search before, so we found the following example and tweaked it slightly for our needs. I'm not sure whether it is correct:

    public Dictionary<String, int> LuceneGetIndexTerms(string searchTerm)
    {
        // Open the Examine index for the Blog module (read-only).
        Directory directory = FSDirectory.Open(new System.IO.DirectoryInfo(HttpContext.Current.Server.MapPath("/App_Data/TEMP/ExamineIndexes/Blog/Index")));

        Dictionary<String, int> termlist = new Dictionary<String, int>();
        IndexReader reader = IndexReader.Open(directory, true);

        // Walk every term in the index and keep those belonging to the requested field
        // (searchTerm holds the field name, not a search phrase).
        TermEnum terms = reader.Terms();
        while (terms.Next())
        {
            Term term = terms.Term();
            if (term.Field() == searchTerm)
            {
                String termText = term.Text();
                // Document frequency for this term as reported by the index.
                int frequency = reader.DocFreq(term);
                termlist.Add(termText, frequency);
            }
        }
        terms.Close();
        reader.Close();

        return termlist;
    }
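
    One way to test the deleted-documents guess above is to count documents per term by walking the postings rather than calling DocFreq. As far as I understand it, DocFreq does not take deletions into account until segments are merged, whereas TermDocs skips documents marked as deleted. The following is only a minimal sketch, assuming the same Lucene.Net 2.9.x-style API as the code above; the class name FacetCounts and the method GetLiveTermCounts are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;
using Lucene.Net.Store;

public static class FacetCounts
{
    // Hypothetical helper: for each term of one field, counts only the documents
    // that are not marked as deleted. Assumes the Lucene.Net 2.9.x method-style API.
    public static Dictionary<string, int> GetLiveTermCounts(Directory directory, string fieldName)
    {
        var counts = new Dictionary<string, int>();
        IndexReader reader = IndexReader.Open(directory, true); // read-only reader
        try
        {
            // Seek straight to the first term of the field instead of scanning every term.
            TermEnum terms = reader.Terms(new Term(fieldName, ""));
            try
            {
                do
                {
                    Term term = terms.Term();
                    // Stop when the enumeration is exhausted or moves on to another field.
                    if (term == null || term.Field() != fieldName) break;

                    // TermDocs only returns documents that have not been deleted.
                    int live = 0;
                    TermDocs docs = reader.TermDocs(term);
                    try { while (docs.Next()) live++; }
                    finally { docs.Close(); }

                    // Skip terms whose documents have all been deleted.
                    if (live > 0) counts[term.Text()] = live;
                }
                while (terms.Next());
            }
            finally { terms.Close(); }
        }
        finally { reader.Close(); }
        return counts;
    }
}
```

    If these counts stay at 2, 1 and 3 after a republish while DocFreq reports 4, 2 and 6, that would point to the deleted documents as the cause. Optimising the index should also purge the deleted documents so that DocFreq catches up, although doing that on every publish may be expensive.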

    I've also created a fresh project with a new Umbraco install this morning and set up the above scenario to rule out any other causes, and it does the same.

    Any help greatly appreciated.

    Steve

     

     
