Tags: asp.net-mvc, asp.net-mvc-5, bots, robots.txt

Filter on Controller to check the User Agent and redirect if it matches a crawler


Note (Edit): I might be doing this completely wrong; any guidance would be appreciated if that is the case (I'm new to MVC).

The solution contains a robots.txt file that blocks all crawlers from the site. The only problem is that Facebook's crawler/scraper does not follow the rules and still crawls/scrapes the site, causing an error to be logged and emailed every couple of minutes. The error being sent is "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."

The solution for this is to create a filter on the controllers that checks the user agent name. If the agent name belongs to Facebook, redirect it to a "No Robots authentication" page. The filter has to be on the controller because the site caters for 3 different routes, each with a custom link, and customers have access to the direct links, which get shared on Facebook (so creating a route for this in the route config will not work).

The problem I'm facing is that the solution does not redirect immediately in the controller filter. It is accessing action methods (these action methods are partial pages) and then fails because it cannot redirect (the view has already started rendering by then, which is correct). Is there a way to redirect immediately the first time this filter is hit? Or is there perhaps a better solution to this?

To test and troubleshoot I am changing the user agent in code to match what is logged. The error when redirecting from the filter: "Child actions are not allowed to perform redirect actions."

The error that is currently logged due to Facebook's crawler: "A public action method 'Customer' was not found on controller 'SolutionName.Web.Controllers.QuoteController'."

User Agent from Stack Trace: [image not available]

This is what I've done:

Custom Filter:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Reflection;
    using System.Web;
    using System.Web.Mvc;

    namespace SolutionName.Web.Classes
    {
        public class UserAgentActionFilterAttribute : ActionFilterAttribute
        {
            public override void OnActionExecuting(ActionExecutingContext filterContext)
            {
                try
                {
                    List<string> Crawlers = new List<string>()
                    {
                        "facebookexternalhit/1.1","facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)","facebookexternalhit/1.1","Facebot"
                     };

                     string userAgent = HttpContext.Current.Request.UserAgent.ToLower();
                     bool iscrawler = Crawlers.Exists(x => userAgent.Contains(x));
                     if (userAgent != null && iscrawler)
                     {
                        filterContext.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
                        return;
                     }
            
                    base.OnActionExecuting(filterContext);

                 }
                 catch (Exception errException)
                 {
                    LogHelper.LogException(Severity.Error, errException);
                    SessionHelper.PolicyBase = null;
                    SessionHelper.ClearQuoteSession();
                    filterContext.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
                    return;
                }
            }
        }
    }

NoRobotsAuthentication.cshtml:

    @{
        ViewBag.PageTitle = "Robots not authorized";
        Layout = "~/Views/Shared/_LayoutClean2.cshtml";
    }

    <div class="container body-content">
        <div class="row">
            <div class="col-lg-12 col-md-12 col-sm-12 col-xs-12 container-solid">
                <div class="form-horizontal">
                    <h3>@ViewBag.NotAuthorized</h3>
                </div>
            </div>
        </div>
    </div>

Action Method for the No Robots:

    #region Bot Detection
    public ActionResult NoRobotsAuthentication()
    {
        ViewBag.NotAuthorized = "Robots / Scrapers not authorized!";
        return View();
    }

    #endregion

One of the Controllers that I am trying to check against:

    namespace SolutionName.Web.Controllers
    {
        [UserAgentActionFilter]
        public class QuoteController : Controller
        {

            public ActionResult Customer()
            {
                // Some logic
            }
        }
    }

Partial Page ActionResult where the error occurs when the filter is run:

    public ActionResult _Sidebar()
    {
        var model = SessionHelper.PolicyBase;
        return PartialView("_Sidebar", model);
    }

Solution

  • This is because you're using an ActionFilterAttribute. The documentation at https://learn.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-3.1 explains the filter lifecycle: basically, by the time you arrive at action filters, it's too late. You need an authorization filter or a resource filter so you can short-circuit the request. Note that this page describes ASP.NET Core; resource filters are not part of ASP.NET MVC 5 (System.Web.Mvc), where an authorization filter (IAuthorizationFilter) is the earlier stage available.

    Each filter type is executed at a different stage in the filter pipeline:

    Authorization Filters

    • Authorization filters run first and are used to determine whether the user is authorized for the request.
    • Authorization filters short-circuit the pipeline if the request is not authorized.

    Resource filters

    • Run after authorization.
    • OnResourceExecuting runs code before the rest of the filter pipeline. For example, OnResourceExecuting runs code before model binding.
    • OnResourceExecuted runs code after the rest of the pipeline has completed.

    The example below is taken from the documentation; it's an implementation of a resource filter. Presumably a similar implementation is possible with an authorization filter, but I believe returning a valid HTTP status code after failing an authorization filter may be a bit of an anti-pattern.

    // See that it's implementing IResourceFilter
    public class ShortCircuitingResourceFilterAttribute : Attribute, IResourceFilter
    {
        public void OnResourceExecuting(ResourceExecutingContext context)
        {
            context.Result = new ContentResult()
            {
                Content = "Resource unavailable - header not set."
            };
        }
    
        public void OnResourceExecuted(ResourceExecutedContext context)
        {
        }
    }
    

    I've attempted to merge it with what you've provided - beware that this may not work out of the box.

    // Targets ASP.NET Core, like the documentation example above
    // (IResourceFilter is not part of System.Web.Mvc).
    using System;
    using System.Collections.Generic;
    using Microsoft.AspNetCore.Mvc;
    using Microsoft.AspNetCore.Mvc.Filters;

    public class ShortCircuitingResourceFilterAttribute : Attribute, IResourceFilter
    {
        public void OnResourceExecuting(ResourceExecutingContext context)
        {
            try
            {
                // You had duplicates in your list; a HashSet suits .Contains lookups.
                // The comparer makes the exact match case-insensitive, so no .ToLower() is needed.
                var crawlerSet = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
                {
                    "facebookexternalhit/1.1",
                    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)",
                    "Facebot"
                };

                // ASP.NET Core has no HttpContext.Current; read the header from the filter context.
                string userAgent = context.HttpContext.Request.Headers["User-Agent"].ToString();

                // Your original null check ran after .ToLower(), which would already
                // have thrown on a null userAgent.
                if (!string.IsNullOrEmpty(userAgent) && crawlerSet.Contains(userAgent))
                {
                    // Some crawler
                    context.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
                }
            }
            catch (Exception errException)
            {
                // LogHelper and SessionHelper are the helpers from your own project.
                LogHelper.LogException(Severity.Error, errException);
                SessionHelper.PolicyBase = null;
                SessionHelper.ClearQuoteSession();
                context.Result = new RedirectResult("~/Home/NoRobotsAuthentication");
            }
        }

        public void OnResourceExecuted(ResourceExecutedContext context)
        {
        }
    }
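
As a rough sketch (not part of the original answer), the crawler check and the child-action concern from the question can be separated into a small framework-free helper, which makes the decision easy to test without a running site. The names `CrawlerGate`, `CrawlerDecision`, `IsCrawler` and `Decide` are hypothetical. The sketch uses case-insensitive substring matching, as the question's filter intended, rather than the exact-match set lookup above, and it assumes that skipping child actions is acceptable because the parent request has already been checked.

```csharp
using System;

// Hypothetical helper, not from the original post: keeps the crawler
// decision free of MVC types so it can be exercised on its own.
public enum CrawlerDecision
{
    Continue, // let the request proceed normally
    Redirect  // send the request to the "no robots" page
}

public static class CrawlerGate
{
    // Tokens derived from the question's list, de-duplicated.
    // "facebookexternalhit" also covers the longer variants with version and URL.
    private static readonly string[] CrawlerTokens =
    {
        "facebookexternalhit",
        "Facebot"
    };

    // True when the user agent contains any known token, ignoring case.
    public static bool IsCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
        {
            return false; // assumption: a missing header is not treated as a crawler
        }

        foreach (string token in CrawlerTokens)
        {
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }

    // Child actions are not allowed to redirect (see the error in the question),
    // so they are let through. Assumption: the parent request was already checked.
    public static CrawlerDecision Decide(string userAgent, bool isChildAction)
    {
        if (isChildAction)
        {
            return CrawlerDecision.Continue;
        }

        return IsCrawler(userAgent) ? CrawlerDecision.Redirect : CrawlerDecision.Continue;
    }
}

public static class Program
{
    public static void Main()
    {
        string facebookAgent =
            "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)";

        Console.WriteLine(CrawlerGate.Decide(facebookAgent, false)); // Redirect
        Console.WriteLine(CrawlerGate.Decide(facebookAgent, true));  // Continue
        Console.WriteLine(CrawlerGate.Decide(null, false));          // Continue
    }
}
```

In an MVC 5 filter, the two inputs would presumably come from `filterContext.HttpContext.Request.UserAgent` and `filterContext.IsChildAction`, and a `Redirect` decision would map to setting `filterContext.Result` to the `RedirectResult` used above. Whether skipping child actions is enough for your three routes is something you would need to verify in your own project.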