I am working on a website for a customer who has implemented Google Mini Search. They want Google Mini Search to index some pages protected with the out-of-the-box public access protection in Umbraco.
Google Search Appliance supports Forms Authentication, but Google Mini Search doesn't. That means that we will have to somehow authenticate the Google Mini Search crawler so it can see the protected pages.
So far I have tried this in the code-behind of a master page that is used for all templates:

    protected void Page_Init(object sender, EventArgs e)
    {
        // Google Mini auto login
        if (Request.UserAgent.Contains("gsa-crawler"))
        {
            if (Request.UserAgent.Contains("[email protected]"))
            {
                // Auto login; the second argument (true) uses the session
                var m = Member.GetMemberFromEmail("[email protected]");
                Member.AddMemberToCache(m, true, TimeSpan.FromHours(1));
            }
        }
    }
We check that the User-Agent contains "gsa-crawler" and a specific e-mail address entered in the Google Mini Search backend. If both match, we attempt to auto-login a member that we created for Google Mini Search and that has access to the protected pages.
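The check described above can be pulled into a small helper that is easy to test on its own. This is only a sketch: the class name `CrawlerDetector`, the method `IsTrustedCrawler`, and the e-mail `crawler@example.invalid` are hypothetical and do not come from the thread. Unlike the snippet above, it also handles a missing User-Agent header and compares case-insensitively, which is an assumption about what is wanted.

```csharp
using System;

public static class CrawlerDetector
{
    // Returns true only when the User-Agent names the crawler and also
    // carries the contact e-mail configured in the appliance backend.
    // Both parameter values are supplied by the caller (hypothetical helper).
    public static bool IsTrustedCrawler(string userAgent, string crawlerEmail)
    {
        // Request.UserAgent is null when the client sends no User-Agent header.
        if (string.IsNullOrEmpty(userAgent) || string.IsNullOrEmpty(crawlerEmail))
        {
            return false;
        }

        // Case-insensitive match (an assumption; the original used Contains).
        return userAgent.IndexOf("gsa-crawler", StringComparison.OrdinalIgnoreCase) >= 0
            && userAgent.IndexOf(crawlerEmail, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In the page code-behind this would be called as `CrawlerDetector.IsTrustedCrawler(Request.UserAgent, "crawler@example.invalid")`, with the placeholder replaced by the address entered in the Google Mini Search backend.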
I know that Google Mini will ignore cookies, which is why Member.AddMemberToCache is called with true as its second argument, so that the session is used.
With the above code being executed for every request that Google Mini Search makes, shouldn't Google Mini Search be allowed to see the protected pages, even though it doesn't store any cookies?
The result is that only the login page is indexed when we try to index a protected page.
Any other suggestions on how to give Google Mini Search access to the protected pages are most welcome.
It feels a bit strange to open up your closed content to Google, but to answer your question: while building the Memberswitcher package I recently found out that .NET supports hacks like this. Once you set the authentication cookie with the username, it's all OK. So a call to FormsAuthentication.SetAuthCookie with the member's username should do the work.
We are not opening up to Google, just to the in-house Google Mini Search solution. The search results for the protected pages will only be available to users who are logged in.
So what you are saying is that if I modify the code like this, it should work?
    // Auto login
    var m = Member.GetMemberFromEmail("[email protected]");
    Member.AddMemberToCache(m, true, TimeSpan.FromHours(1));
    FormsAuthentication.SetAuthCookie(m.LoginName, false);
Using SetAuthCookie didn't seem to work, as it still requires the client (in this case Google Mini Search) to be able to store cookies.
The following however did work:
    protected void Page_Init(object sender, EventArgs e)
    {
        // Google Mini auto login
        if (Request.UserAgent.Contains("gsa-crawler"))
        {
            if (Request.UserAgent.Contains("[email protected]"))
            {
                // Auto login, then reload the requested URL
                var m = Member.GetMemberFromEmail("[email protected]");
                Member.AddMemberToCache(m, true, TimeSpan.FromHours(1));
                Response.Redirect(Request.RawUrl, true);
            }
        }
    }
In Web.config we added cookieless="AutoDetect" for system.web/authentication/forms.
It allows Google Mini Search to log in without using cookies, without affecting regular users browsing the site in a browser.
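A minimal sketch of that Web.config change, assuming Forms authentication is already configured for the site. Any other attributes already present on the existing forms element would stay as they are; only the cookieless attribute is taken from the thread.

```xml
<configuration>
  <system.web>
    <authentication mode="Forms">
      <!-- AutoDetect: use cookies when the client supports them,
           otherwise carry the authentication ticket in the URL. -->
      <forms cookieless="AutoDetect" />
    </authentication>
  </system.web>
</configuration>
```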
Curious if you got this working correctly - if you ran into any other issues ... I need to do something similar, not with Google Search Mini, but to allow access to protected pages for a spider/web-crawler - again based on the UserAgent string.
From looking at your code snippet, wouldn't you get caught in a redirect loop?
Curious if anyone else has done this successfully? I'm starting to hit a brick wall at the moment.
Thanks Dennis. I couldn't get it to work from the MasterPage's Page_Init event ... I had to override the /default.aspx code-behind (inheriting from umbraco.UmbracoDefault) and do it in the Page_PreInit event. Still can't get the spider to login as a member ... I'll keep looking into it.
Auto-login as a member for Google Mini Search
The User-Agent should look something like this:
gsa-crawler (Enterprise; GID01065; [email protected])
Here, true is passed to Member.AddMemberToCache to use the session.
Kind regards
Dennis Milandt
Hi Dennis,
    FormsAuthentication.SetAuthCookie("your username here", false);
Hope this helps you,
Richard
Thank you very much for answering!
Kind regards
Dennis Milandt
Hi Dennis,
Yes, it should work. You can even remove the Member.AddMemberToCache line.
Cheers,
Richard
Thank you for your feedback.
Kind regards
Dennis Milandt
Hi Dennis,
Thanks, Lee.
Dennis, quick question... which mode are you using to store the session state?
I'm currently using InProc, but that relies on cookies, so I'm thinking I need to use SQL Server mode?
Thanks, Lee.
I believe that cookieless="AutoDetect" did the trick for us. Our session state is stored on the webserver InProc as well.
/Dennis
Cheers, Lee.