I have a requirement to find all the telephone numbers on a medium-sized site, in all the properties of nodes containing text.
I thought I could use Examine for this, but I am having trouble figuring out how to search a bodyText field with a regex. I think I am mixing up Examine and Lucene capabilities.
public class PhoneNumberIndexer
{
    public string regex = @"^(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?$";

    public IOrderedEnumerable<Examine.SearchResult> GetPagesWithPhoneNumbers()
    {
        var searchField = "bodyText";
        var searchTerm = regex;
        var queryTerm = "tel";

        // Lucene.Net objects: a Term holding the regex, wrapped in a RegexQuery
        Term t = new Term(searchField, searchTerm);
        RegexQuery regQuery = new RegexQuery(t);

        var searcher = ExamineManager.Instance.SearchProviderCollection["External"];
        if (searcher == null)
            return null;

        var searchCriteria = searcher.CreateSearchCriteria(IndexTypes.Content);

        // Compile error on this line: Field expects an IExamineValue, not a Lucene query
        IBooleanOperation theQuery = searchCriteria.Field("bodyText", regQuery);
I get the error 'The best overloaded method match for 'Examine.SearchCriteria.IQuery.Field(string, Examine.SearchCriteria.IExamineValue)' has some invalid arguments'
Can anyone point me in the right direction please?
I'm not sure whether you can search with a regex (correct me if I'm wrong). However, you could search on "+44" with a wildcard, then run the regex over the results afterwards to filter them down to the relevant ones.
Since there's a + symbol you'll need to escape it. I'd just do p
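A rough sketch of that search-then-filter idea, in case it helps. It only covers the filtering half: it assumes you already have the text of each candidate node (for example from the bodyText entry of each result's Fields, which is an assumption about how your index is set up) and uses a plain dictionary as a hypothetical stand-in for the search results. The pattern is the one from the question with the ^ and $ anchors removed, since anchored it can only match a string that is nothing but a phone number. PhoneNumberFilter and FindPhoneNumbers are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PhoneNumberFilter
{
    // Pattern from the question, minus the ^ and $ anchors, so that it can
    // match a number anywhere inside a longer piece of text.
    private static readonly Regex UkPhone = new Regex(
        @"(?:(?:\(?(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?(?:\(?0\)?[\s-]?)?)|(?:\(?0))(?:(?:\d{5}\)?[\s-]?\d{4,5})|(?:\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3}))|(?:\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4})|(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}))(?:[\s-]?(?:x|ext\.?|\#)\d{3,4})?",
        RegexOptions.Compiled);

    // pages: node id -> body text (hypothetical stand-in for search results).
    // Returns only the nodes that contain at least one match, with the matches.
    public static Dictionary<int, List<string>> FindPhoneNumbers(IDictionary<int, string> pages)
    {
        var found = new Dictionary<int, List<string>>();
        foreach (var page in pages)
        {
            var numbers = UkPhone.Matches(page.Value ?? string.Empty)
                                 .Cast<Match>()
                                 .Select(m => m.Value)
                                 .ToList();
            if (numbers.Count > 0)
                found[page.Key] = numbers;
        }
        return found;
    }
}
```

On a medium-sized site it may be simpler to skip the index query altogether and run this over every text property, since the regex does the real work either way.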
Thanks Alex.
I think you can search Lucene with a regex. There is a Lucene.Net.Search.Regex namespace in Lucene 1.9 (an add-on via Contrib.Regex). It contains a RegexQuery, which is built on MultiTermQuery, but I'm not sure how to use it. The summary for MultiTermQuery reads:
'An abstract Lucene.Net.Search.Query that matches documents containing a subset of terms provided by a Lucene.Net.Search.FilteredTermEnum enumeration. This query cannot be used directly; you must subclass it and define Lucene.Net.Search.MultiTermQuery.GetEnum(Lucene.Net.Index.IndexReader) to provide a Lucene.Net.Search.FilteredTermEnum that iterates through the terms to be matched.'
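One thing that summary hints at: a query of this kind works over the individual terms in the index, not over the original field text. If bodyText is analyzed, a number written as "020 7946 0958" ends up as several separate terms, so a regex that expects the whole number has nothing to match against. The sketch below illustrates the effect without touching Lucene at all. The whitespace split is only a crude stand-in for an analyzer, the pattern is a simplified hypothetical one (not the full pattern from the question), and TermLevelRegexDemo is a made-up name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TermLevelRegexDemo
{
    // Simplified, hypothetical pattern: a leading 0 followed by 9 or 10 more
    // digits, each optionally preceded by a single space or hyphen.
    private const string Pattern = @"0(?:[\s-]?\d){9,10}";

    // Mimics term-level matching: the regex is tried against one token at a time.
    // Splitting on whitespace is a crude stand-in for a Lucene analyzer.
    public static bool MatchesAnyTerm(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
                   .Any(token => Regex.IsMatch(token, Pattern));
    }

    // Matches against the whole, untokenized text.
    public static bool MatchesFullText(string text)
    {
        return Regex.IsMatch(text, Pattern);
    }
}
```

With "Call 020 7946 0958 today", MatchesFullText finds the number but MatchesAnyTerm does not, because no single token holds enough digits. A number typed without spaces, such as 02079460958, is found both ways. So even if RegexQuery can be wired in, it would probably only catch numbers that survive analysis as a single term.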
Examine searching with a regular expression
Desperate bump.