  • kleptbit 42 posts 98 karma points
    Nov 23, 2017 @ 18:03

    Examine searching with a regular expression

    I have a requirement to find all the telephone numbers on a medium sized site in all the properties of nodes containing text.

    I thought I could use Examine for this, but I am having trouble figuring out how to search a bodyText field with a RegEx. I think I am mixing up Examine and Lucene capabilities.

    public class PhoneNumberIndexer
    {
        public string regex = @"^(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?$";
    
        public IOrderedEnumerable<Examine.SearchResult> GetPagesWithPhoneNumbers()
        {
            var searchField = "bodyText";
            var searchTerm = regex;
            var queryTerm = "tel";
            Term t = new Term(searchField, searchTerm);
            RegexQuery regQuery = new RegexQuery(t);
    
            var searcher = ExamineManager.Instance.SearchProviderCollection["External"];
    
            if (searcher == null)
                return null;
    
            var searchCriteria = searcher.CreateSearchCriteria(IndexTypes.Content);
            IBooleanOperation theQuery = searchCriteria.Field("bodyText", regQuery);
    

    I get the error 'The best overloaded method match for 'Examine.SearchCriteria.IQuery.Field(string, Examine.SearchCriteria.IExamineValue)' has some invalid arguments'

    Can anyone point me in the right direction please?

  • Alex Brown 129 posts 620 karma points
    Nov 23, 2017 @ 21:38

    I'm not sure whether you can search with a regex (correct me if I'm wrong). However, you could search on "+44" with a wildcard and then run the regex over the results to filter them down to the relevant ones.

    Since there's a + symbol, you'll need to escape it for Lucene. Use a verbatim string so that C# does not treat \+ as an invalid escape sequence. I'd just do:

    searchCriteria.RawQuery(@"bodyText:\+44*");
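
    The "search broadly, then filter with a regex" idea can be sketched without any Examine calls. This is only a minimal sketch under stated assumptions: PhoneNumberPostFilter, Filter and the simplified pattern are hypothetical names and values, not part of Examine or Lucene, and the candidates are assumed to be node id / body text pairs built from whatever the broad query returns. The pattern deliberately has no ^ or $ anchors, because a number normally sits inside longer text.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text.RegularExpressions;

    public static class PhoneNumberPostFilter
    {
        // Simplified, hypothetical UK-style pattern (assumption, not the pattern from the question).
        // No ^/$ anchors, so it can match a number embedded in a longer body text.
        private static readonly Regex Phone = new Regex(
            @"(?:\+44[\s-]?|\b0)\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}\b",
            RegexOptions.Compiled);

        // candidates: node id -> body text, e.g. collected from a broad index query.
        // Returns only the nodes that contain at least one match, with the matched numbers.
        public static Dictionary<int, List<string>> Filter(
            IEnumerable<KeyValuePair<int, string>> candidates)
        {
            var hits = new Dictionary<int, List<string>>();
            foreach (var candidate in candidates)
            {
                var numbers = Phone.Matches(candidate.Value ?? string.Empty)
                                   .Cast<Match>()
                                   .Select(m => m.Value)
                                   .ToList();
                if (numbers.Count > 0)
                {
                    hits[candidate.Key] = numbers;
                }
            }
            return hits;
        }
    }
    ```

    A Dictionary<int, string> of id and text can be passed straight in. The broad query only needs to be loose enough not to miss pages, since the regex does the precise work afterwards.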
    
  • kleptbit 42 posts 98 karma points
    Nov 24, 2017 @ 12:38

    Thanks Alex. I think you can search Lucene with a regex. There is a class Lucene.Net.Search.Regex in Lucene 1.9 (an add-on via Contrib.Regex). It has a RegexQuery in it, which is based on MultiTermQuery, but I’m not sure how to use it. The MultiTermQuery documentation says:

    Summary: 'An abstract Lucene.Net.Search.Query that matches documents containing a subset of terms provided by a Lucene.Net.Search.FilteredTermEnum enumeration. This query cannot be used directly; you must subclass it and define Lucene.Net.Search.MultiTermQuery.GetEnum(Lucene.Net.Index.IndexReader) to provide a Lucene.Net.Search.FilteredTermEnum that iterates through the terms to be matched. '
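
    One consequence of matching on terms rather than on the whole field is worth keeping in mind: a pattern anchored with ^ and $ has to match a single indexed term in full. The sketch below illustrates this without Lucene. Tokenize is a hypothetical stand-in that only splits on whitespace (a real analyzer behaves differently in detail), and the pattern is a simplified assumption rather than the one from the question.

    ```csharp
    using System;
    using System.Linq;
    using System.Text.RegularExpressions;

    public static class TermLevelMatchDemo
    {
        // Hypothetical stand-in for an analyzer: split on whitespace only (assumption).
        public static string[] Tokenize(string text)
        {
            return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        }

        // Mimics term-level matching: the pattern is tested against each term separately,
        // never against the full text.
        public static bool AnyTermMatches(string text, string anchoredPattern)
        {
            return Tokenize(text).Any(term => Regex.IsMatch(term, anchoredPattern));
        }
    }
    ```

    With an anchored pattern such as ^0\d{2,4}[\s-]?\d{3,4}[\s-]?\d{3,4}$, the text "Call 02079460018 today" has one term that matches, while "Call 020 7946 0018 today" has none, because the number is spread over three terms. If the index behaves similarly, numbers written with spaces would likely be missed by a term-level regex query.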

  • kleptbit 42 posts 98 karma points
    Nov 28, 2017 @ 14:29

    Desperate bump.
