Crawler Needs to Access Private Pages
Hello all,
We are running Umbraco 8.6.2 as an Azure App Service. Our site search uses a third-party search tool called Swiftype. Swiftype accesses our pages via a crawler and indexes data about the pages, which we then search.
We have several pages that are gated with group-based protection. Currently, these pages are not being indexed in the search database because the crawler cannot reach them. According to the vendor, the custom user agent string that their crawler uses can be allowed to access private pages.
Does anyone have experience allowing a crawler to access gated pages? Is that even possible in Umbraco?
I am pursuing this with the vendor, but they don't have any experience with Umbraco. I am appreciative of any guidance.
Sincerely,
Angus Atkins-Trimnell
I have developed a solution via dependency injection.
I created a new implementation of IPublicAccessService that derives from the standard implementation, PublicAccessService. My new implementation just calls the base method for almost all of the methods, but intercepts the following method:
public new PublicAccessEntry GetEntryForContent(string contentPath)
The new implementation checks whether the request's user agent string matches the unique string from the search vendor and, if so, returns null, indicating no protection. Otherwise it calls the base method.
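For anyone who wants to try the same approach, here is a rough sketch of what such a derived service might look like. It is not my exact code. The class name CrawlerAwarePublicAccessService, the marker string, and the use of HttpContext.Current to read the user agent are assumptions made for illustration, and the constructor parameters are assumed to match the base class constructor in 8.6, so check them against your version.

```csharp
using System;
using System.Web;
using Umbraco.Core.Events;
using Umbraco.Core.Logging;
using Umbraco.Core.Models;
using Umbraco.Core.Persistence.Repositories;
using Umbraco.Core.Scoping;
using Umbraco.Core.Services;
using Umbraco.Core.Services.Implement;

// Hypothetical class name. Re-listing IPublicAccessService makes calls made
// through the interface resolve to the "new" method declared below.
public class CrawlerAwarePublicAccessService : PublicAccessService, IPublicAccessService
{
    // Placeholder value: replace with the string supplied by the search vendor.
    private const string CrawlerUserAgentMarker = "REPLACE-WITH-VENDOR-STRING";

    // Assumption: these parameters mirror the base constructor in Umbraco 8.6.
    public CrawlerAwarePublicAccessService(
        IScopeProvider provider,
        ILogger logger,
        IEventMessagesFactory eventMessagesFactory,
        IPublicAccessRepository publicAccessRepository)
        : base(provider, logger, eventMessagesFactory, publicAccessRepository)
    {
    }

    // Returns null (no protection) for the crawler, otherwise defers to the base service.
    public new PublicAccessEntry GetEntryForContent(string contentPath)
    {
        return IsTrustedCrawler() ? null : base.GetEntryForContent(contentPath);
    }

    // Assumption: the user agent is read from the current ASP.NET request.
    // HttpContext.Current is null outside a request, which is treated as "not the crawler".
    private static bool IsTrustedCrawler()
    {
        var userAgent = HttpContext.Current?.Request.UserAgent;
        return userAgent != null
            && userAgent.IndexOf(CrawlerUserAgentMarker, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

Keep in mind that any client sending the same user agent string gets the same access, so the string is effectively a shared secret.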
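My implementation also intercepts the methods that call that method to ensure they call the derived method. Because the method is hidden with new rather than overridden, the base class's own callers would otherwise keep calling the base version.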
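The reason the callers need intercepting as well comes down to how C# binds non-virtual methods. The following stand-alone sketch shows the effect with made-up types (IAccessService, BaseAccessService, and DerivedAccessService are hypothetical stand-ins, not Umbraco types):

```csharp
using System;

public interface IAccessService
{
    string GetEntry(string path);
    bool IsProtected(string path);
}

public class BaseAccessService : IAccessService
{
    // Non-virtual, so a derived class can only hide it with "new".
    public string GetEntry(string path) => "entry:" + path;

    // Bound at compile time to BaseAccessService.GetEntry.
    public bool IsProtected(string path) => GetEntry(path) != null;
}

// Re-listing the interface maps interface calls onto the "new" members below.
public class DerivedAccessService : BaseAccessService, IAccessService
{
    public bool Bypass { get; set; }

    public new string GetEntry(string path) => Bypass ? null : base.GetEntry(path);

    // Without this, IsProtected would run the base GetEntry and ignore Bypass.
    public new bool IsProtected(string path) => GetEntry(path) != null;
}

public static class Demo
{
    public static void Main()
    {
        IAccessService service = new DerivedAccessService { Bypass = true };
        Console.WriteLine(service.GetEntry("/members") == null); // True
        Console.WriteLine(service.IsProtected("/members"));      // False

        // Called through the base type, the base logic still runs.
        var asBase = (BaseAccessService)service;
        Console.WriteLine(asBase.IsProtected("/members"));       // True
    }
}
```

The last line is the case to watch for: any code path that reaches the base class's version of a calling method still sees the page as protected.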
Finally, I added a Composer that calls composition.RegisterUnique to associate the new implementation with the interface as a singleton.
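A composer along these lines should do it. This is a sketch rather than my exact code: CrawlerAccessComposer and CrawlerAwarePublicAccessService are hypothetical names, the latter standing in for the derived service described above.

```csharp
using Umbraco.Core;
using Umbraco.Core.Composing;
using Umbraco.Core.Services;

// Hypothetical composer name. User composers run after the core composers,
// so this registration replaces the default IPublicAccessService.
public class CrawlerAccessComposer : IUserComposer
{
    public void Compose(Composition composition)
    {
        // CrawlerAwarePublicAccessService is the derived service (hypothetical name),
        // which must be defined elsewhere in the project.
        composition.RegisterUnique<IPublicAccessService, CrawlerAwarePublicAccessService>();
    }
}
```

Umbraco 8 discovers composers by type scanning at startup, so no further wiring should be needed beyond having the class in the site's assembly.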
Cheers.
Angus