  • kleptbit 42 posts 98 karma points
    Nov 23, 2017 @ 18:03

    Examine searching with a regular expression

    I need to find all the telephone numbers on a medium-sized site, in every property of every node that contains text.

    I thought I could use Examine for this, but I'm having trouble figuring out how to search a bodyText field with a regex. I think I'm mixing up what Examine can do and what Lucene can do.

    using System.Linq;               // IOrderedEnumerable
    using Examine;
    using Examine.SearchCriteria;    // IBooleanOperation
    using Lucene.Net.Index;          // Term
    using Lucene.Net.Search.Regex;   // RegexQuery (Contrib.Regex)
    using UmbracoExamine;            // IndexTypes

    public class PhoneNumberIndexer
    {
        public string regex = @"^(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?$";
    
        public IOrderedEnumerable<Examine.SearchResult> GetPagesWithPhoneNumbers()
        {
            var searchField = "bodyText";
            var searchTerm = regex;
            var queryTerm = "tel";
            Term t = new Term(searchField, searchTerm);
            RegexQuery regQuery = new RegexQuery(t);
    
            var searcher = ExamineManager.Instance.SearchProviderCollection["External"];
    
            if (searcher == null)
                return null;
    
            var searchCriteria = searcher.CreateSearchCriteria(IndexTypes.Content);
            // This line does not compile: Field() takes a string or an IExamineValue, not a Lucene Query
            IBooleanOperation theQuery = searchCriteria.Field("bodyText", regQuery);
    

    I get the error 'The best overloaded method match for 'Examine.SearchCriteria.IQuery.Field(string, Examine.SearchCriteria.IExamineValue)' has some invalid arguments'

    Can anyone point me in the right direction please?
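    Separately from the compile error, the pattern above is anchored with ^ and $, so even a plain .NET Regex.IsMatch only succeeds when the entire input is a phone number, never when a number sits inside a longer bodyText value. A minimal sketch using only System.Text.RegularExpressions (no Examine or Lucene calls): it keeps the body of the question's pattern and, for searching inside text, swaps the anchors for (?<!\d) and (?!\d) lookarounds so a match cannot start or end partway through a longer run of digits. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: the UK phone pattern from the question, with the ^...$
// anchors split out so it can either validate a whole string or find numbers
// embedded in longer text such as a bodyText value.
public static class PhoneNumberScanner
{
    // The question's pattern without the leading ^ and trailing $.
    private const string Core =
        @"(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))" +
        @"(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|" +
        @"(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))" +
        @"(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?";

    // Anchored, as in the question: succeeds only if the WHOLE input is a number.
    private static readonly Regex WholeString = new Regex("^" + Core + "$");

    // Unanchored for scanning text; the lookarounds stop a match from starting
    // or ending in the middle of a longer run of digits.
    private static readonly Regex InText = new Regex(@"(?<!\d)" + Core + @"(?!\d)");

    public static bool IsWholeNumber(string input) => WholeString.IsMatch(input);

    // Every phone-number-like substring of the text, in order of appearance.
    public static List<string> FindInText(string text) =>
        InText.Matches(text).Cast<Match>().Select(m => m.Value).ToList();
}
```

    For example, FindInText("Call us on 020 7946 0018 or +44 161 496 0000.") should return both numbers, while IsWholeNumber on that same sentence returns false.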

  • Alex Brown 129 posts 620 karma points
    Nov 23, 2017 @ 21:38

    I'm not sure you can search with a regex (correct me if I'm wrong), but you could search for "+44" with a wildcard and then run the regex over the results to filter them down to the relevant ones.

    Since there's a + symbol, you'll need to escape it. I'd just do:

    searchCriteria.RawQuery(@"bodyText:\+44*"); // verbatim string, so the backslash reaches Lucene's query parser
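    The two-step approach (a broad query to narrow down the candidate nodes, then a .NET regex over each one) could be sketched like this. It assumes you have already pulled a node id and the field text for each search result; with Examine that would presumably be each result's Id and its stored bodyText value, which only works if that field is stored in the index. ResultFilter and FilterByPattern are hypothetical names, and no Examine types are used.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch of "search broadly, then filter with a regex".
// Input: (nodeId, text) pairs, e.g. built from search results (hypothetical shape).
// Output: nodeId -> the substrings the pattern found in that node's text.
public static class ResultFilter
{
    public static Dictionary<int, List<string>> FilterByPattern(
        IEnumerable<KeyValuePair<int, string>> candidates, Regex pattern)
    {
        var hits = new Dictionary<int, List<string>>();
        foreach (var candidate in candidates)
        {
            // Skip nodes whose field is missing or empty.
            if (string.IsNullOrEmpty(candidate.Value))
                continue;

            var found = pattern.Matches(candidate.Value)
                               .Cast<Match>()
                               .Select(m => m.Value)
                               .ToList();

            // Keep only nodes where the regex actually matched.
            if (found.Count > 0)
                hits[candidate.Key] = found;
        }
        return hits;
    }
}
```

    The pattern passed in should be unanchored (no ^ or $), otherwise nothing inside a longer body of text will match. Because the regex runs in .NET rather than in Lucene, the broad query only needs to be loose enough not to miss candidates.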
    
  • kleptbit 42 posts 98 karma points
    Nov 24, 2017 @ 12:38

    Thanks Alex. I think you can search Lucene with a regex. There is a namespace, Lucene.Net.Search.Regex, in Lucene 1.9 (an add-on via Contrib.Regex). It contains a RegexQuery class that derives from MultiTermQuery, but I'm not sure how to use it. The summary for MultiTermQuery says:

    Summary: 'An abstract Lucene.Net.Search.Query that matches documents containing a subset of terms provided by a Lucene.Net.Search.FilteredTermEnum enumeration. This query cannot be used directly; you must subclass it and define Lucene.Net.Search.MultiTermQuery.GetEnum(Lucene.Net.Index.IndexReader) to provide a Lucene.Net.Search.FilteredTermEnum that iterates through the terms to be matched. '
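    Going by that summary, RegexQuery (like any MultiTermQuery) matches documents through the individual terms stored in the index, not the original field text. The regex is therefore compared with single tokens after the analyzer has split bodyText up, so a number written with spaces or hyphens may never exist as one term for a full phone-number pattern to match. The sketch below only simulates that effect in plain C# and calls no Lucene API; splitting on anything that is not a letter or digit is a rough stand-in for an analyzer (StandardAnalyzer's actual tokenization rules differ), and TermLevelDemo is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Simulation only: approximates how a term-based regex query "sees" a field.
// The text is split into lowercase terms (a rough stand-in for an analyzer),
// and the regex is tested against each term on its own, never the full text.
public static class TermLevelDemo
{
    // Anything that is not a letter or digit separates terms.
    private static readonly Regex Splitter = new Regex(@"[^\p{L}\p{N}]+");

    public static List<string> ToTerms(string text) =>
        Splitter.Split(text.ToLowerInvariant())
                .Where(t => t.Length > 0)
                .ToList();

    // Terms that a term-level regex would hit for this text.
    public static List<string> MatchingTerms(string text, Regex termPattern) =>
        ToTerms(text).Where(t => termPattern.IsMatch(t)).ToList();
}
```

    With "Call 020 7946 0018 today", a whole-number pattern such as ^0\d{2}[\s-]?\d{4}[\s-]?\d{4}$ matches no term at all, while a term-sized pattern such as ^0\d{2}$ matches the "020" token, which suggests a RegexQuery pattern has to describe a single term rather than a whole phone number.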

  • kleptbit 42 posts 98 karma points
    Nov 28, 2017 @ 14:29

    Desperate bump.
