
  • Richard Barg 358 posts 532 karma points
    Oct 10, 2012 @ 09:34

    Whitelist for Links Flagged w/Warnings But Otherwise Valid

    Diplo Link Checker flags certain links as problems that turn out to be OK. Links to Wikipedia content always return a Forbidden error. A whitelist to skip these on future runs would be great. Here are some examples:

     

    Checked Page | Edit Link | Created | Last Edited | Parent Section | Links | Status
    Liang Ge, Ph.D. | Edit | 15/06/2012 by rbarg | 30/08/2012 by RobM | Faculty | 4 | 200 OK

    The following links in 'Liang Ge, Ph.D.' have been checked:

    Status | Message | Checked Link | Link Text | Type | Internal?
    Error | The underlying connection was closed: The connection was closed unexpectedly. | http://www.sciencedirect.com/science/art... | | hyperlink | No

    Checked Page | Edit Link | Created | Last Edited | Parent Section | Links | Status
    David M. Jablons, M.D. | Edit | 15/06/2012 by rbarg | 30/08/2012 by RobM | Faculty | 32 | 200 OK

    The following links in 'David M. Jablons, M.D.' have been checked:

    Status | Message | Checked Link | Link Text | Type | Internal?
    Error | 403 Forbidden | http://americansurgical.info/about.cgi | American Surgical Society | hyperlink | No
    Error | 400 Bad Request | http://www.rushu.rush.edu/servlet/Satell... | James L. Mulshine, M.D. | hyperlink | No

    Checked Page | Edit Link | Created | Last Edited | Parent Section | Links | Status
    Carlo C. Maley, Ph.D. | Edit | 15/06/2012 by rbarg | 30/08/2012 by RobM | Faculty | 10 | 200 OK

    The following links in 'Carlo C. Maley, Ph.D.' have been checked:

    Status | Message | Checked Link | Link Text | Type | Internal?
    Error | 404 Not Found | http://www.nature.com/news/2007/070917/f... | http://www.nature.com/news/2007/070917/full/news070917-11.html
  • Dan Diplo 1554 posts 6205 karma points MVP 6x c-trib
    Oct 10, 2012 @ 20:44

    Wikipedia is weird in that they forbid HEAD requests to their server. A HEAD request just asks for the headers rather than downloading the entire page contents, so it is much faster to process. If you check the config file, there's a comment to this effect:

    <!-- Can be HEAD, GET or POST (HEAD is quickest but some servers don't respect 
    it (eg. Wikipedia). Use GET for those.) --> 
    <checkLinksRequestMethod>HEAD</checkLinksRequestMethod> 

    You can change it to GET and it should work for a wider range of servers, but it will be slower.
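One possible middle ground between the two settings is to try HEAD first and retry with GET only when the server rejects HEAD. The Python sketch below illustrates that idea; it is not how Diplo Link Checker works. The names `check_link`, `http_status`, and `RETRY_WITH_GET`, and the choice of status codes that trigger a retry, are assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Status codes that may mean "this server dislikes HEAD" rather than
# "this link is dead". The exact set is an assumption for illustration.
RETRY_WITH_GET = {403, 405, 501}


def http_status(method, url, timeout=10):
    """Send one request with the given method and return the HTTP status code.

    Connection-level failures (urllib.error.URLError) are not handled here
    and will propagate to the caller.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as HTTPError; report their code.
        return error.code


def check_link(url, send=http_status):
    """Try the cheap HEAD request first; fall back to GET if it is rejected."""
    status = send("HEAD", url)
    if status in RETRY_WITH_GET:
        status = send("GET", url)
    return status
```

With this approach most links would still be checked with the faster HEAD request, and only those answering with one of the listed codes would cost a second, slower GET.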

     

  • Richard Barg 358 posts 532 karma points
    Oct 11, 2012 @ 00:14

    We're running 50 pages now w/excellent speed, so I would hate to sacrifice that. I take it the whitelist idea is not feasible for given pages.

  • Richard Barg 358 posts 532 karma points
    Oct 18, 2012 @ 18:28

    Dan,  

     

    Just wanted to check in again on the idea of a whitelist. More specifically, it would allow you to create a list of links that were found to be invalid but are actually OK. The program would skip them on subsequent checks. For sites like Wikipedia, there could be a setting for suppressing all error messages from the wikipedia domain. This would speed up link checking. We're now checking 75 pages at a time w/excellent speed and results.

    Richard
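To make the request concrete, here is a minimal Python sketch of the two kinds of whitelist entries described above: exact links that are known to be fine, and whole domains whose results should be suppressed. This is not an existing feature of the package; the function names `is_whitelisted` and `links_to_check` and the data layout are hypothetical.

```python
from urllib.parse import urlsplit


def is_whitelisted(url, allowed_urls, allowed_domains):
    """Return True if the link should be skipped by the checker.

    allowed_urls:    exact links previously flagged but confirmed OK.
    allowed_domains: domains to skip entirely, subdomains included.
    """
    if url in allowed_urls:
        return True
    host = (urlsplit(url).hostname or "").lower()
    # Match the domain itself or any subdomain, but not look-alike hosts.
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def links_to_check(links, allowed_urls, allowed_domains):
    """Filter out whitelisted links before any request is made."""
    return [
        link for link in links
        if not is_whitelisted(link, allowed_urls, allowed_domains)
    ]
```

Because whitelisted links are dropped before any request is sent, they cost no network time on later runs, which is where the speed-up would come from.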

     

  • Dan Diplo 1554 posts 6205 karma points MVP 6x c-trib
    Oct 18, 2012 @ 21:09

    Hi Richard,

    I do take account of all feature requests and will definitely consider them. But as I explained in another thread, my wife has just had a baby, so I have very little free time at the moment (and your request is not trivial). Once things get back to normal (if they ever do!), I'll see what I can do.

    Dan

  • Richard Barg 358 posts 532 karma points
    Oct 18, 2012 @ 21:24

    Congrats Dan on the new arrival.  Completely understand.  
