  • Richard Barg 358 posts 532 karma points
    Oct 10, 2012 @ 09:34

    Whitelist for Links Flagged w/Warnings But Otherwise Valid

    Diplo Link Checker flags certain links as problems that turn out to be OK. Links to Wikipedia content always return a forbidden error. A whitelist to avoid these on future runs would be great. Here are some examples:

     

    Checked Page: Liang Ge, Ph.D. (Created 15/06/2012 by rbarg, Last Edited 30/08/2012 by RobM, Parent Section: Faculty, Links: 4, Status: 200 OK)
    The following links in 'Liang Ge, Ph.D.' have been checked:
    Error | The underlying connection was closed: The connection was closed unexpectedly. | http://www.sciencedirect.com/science/art... | hyperlink | Internal: No

    Checked Page: David M. Jablons, M.D. (Created 15/06/2012 by rbarg, Last Edited 30/08/2012 by RobM, Parent Section: Faculty, Links: 32, Status: 200 OK)
    The following links in 'David M. Jablons, M.D.' have been checked:
    Error | 403 Forbidden | http://americansurgical.info/about.cgi | American Surgical Society | hyperlink | Internal: No
    Error | 400 Bad Request | http://www.rushu.rush.edu/servlet/Satell... | James L. Mulshine, M.D. | hyperlink | Internal: No

    Checked Page: Carlo C. Maley, Ph.D. (Created 15/06/2012 by rbarg, Last Edited 30/08/2012 by RobM, Parent Section: Faculty, Links: 10, Status: 200 OK)
    The following links in 'Carlo C. Maley, Ph.D.' have been checked:
    Error | 404 Not Found | http://www.nature.com/news/2007/070917/f... | http://www.nature.com/news/2007/070917/full/news070917-11.html
  • Dan Diplo 1554 posts 6205 karma points MVP 6x c-trib
    Oct 10, 2012 @ 20:44

    Wikipedia is weird in that they forbid HEAD requests to their server. A HEAD request just asks for the headers rather than downloading the entire page contents, so it is much faster to process. If you check the config file, there's a comment to this effect:

    <!-- Can be HEAD, GET or POST (HEAD is quickest but some servers don't respect it (eg. Wikipedia). Use GET for those.) -->
    <checkLinksRequestMethod>HEAD</checkLinksRequestMethod>

    You can change it to GET and it should work for a wider range of servers, but will be slower.
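
    To illustrate the difference, here's a rough sketch in C# (not the package's actual code) of a checker that tries HEAD first and falls back to a full GET when a server such as Wikipedia rejects the HEAD request:

    using System;
    using System.Net;

    public static class LinkCheckSketch
    {
        // Illustrative only: check a URL with HEAD and fall back to GET
        // when the server rejects the HEAD request (e.g. Wikipedia's 403).
        public static HttpStatusCode CheckUrl(string url, string method = "HEAD")
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = method;          // HEAD fetches headers only, so it's fast
            request.AllowAutoRedirect = true;
            request.Timeout = 10000;          // 10 seconds

            try
            {
                using (var response = (HttpWebResponse)request.GetResponse())
                {
                    return response.StatusCode;               // e.g. 200 OK
                }
            }
            catch (WebException ex)
            {
                var errorResponse = ex.Response as HttpWebResponse;

                // Some servers refuse HEAD outright; retry with a full GET.
                if (method == "HEAD" && errorResponse != null)
                {
                    return CheckUrl(url, "GET");
                }

                if (errorResponse != null)
                {
                    return errorResponse.StatusCode;          // e.g. 403, 404
                }

                throw; // connection-level failure ("connection was closed unexpectedly")
            }
        }
    }

    The config setting is a single value used for every request, which is why switching it to GET makes the whole run slower: each check downloads the full page body instead of just the headers.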

     

  • Richard Barg 358 posts 532 karma points
    Oct 11, 2012 @ 00:14

    We're running 50 pages now with excellent speed, so I would hate to sacrifice that. I take it a whitelist for given pages is not feasible?

  • Richard Barg 358 posts 532 karma points
    Oct 18, 2012 @ 18:28

    Dan,  

     

    Just wanted to check in again on the idea of a whitelist. More specifically, it would let you create a list of links that were found to be invalid but are actually OK, and the program would skip them on subsequent checks. For sites like Wikipedia, there could be a setting for suppressing all error messages from the wikipedia domain. This would speed up link checking. We're now checking 75 pages at a time with excellent speed and results.
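
    To make the idea concrete, here is a purely hypothetical sketch of the behaviour I have in mind (none of these class or setting names exist in the package; the URLs are just the examples from my first post):

    using System;
    using System.Collections.Generic;

    public class LinkWhitelist
    {
        // Hypothetical: in a real implementation these entries would presumably
        // come from the package's config file; they are hard-coded here only
        // to illustrate the idea.
        private readonly HashSet<string> _exactUrls =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase)
            {
                "http://americansurgical.info/about.cgi"   // flagged 403 but actually fine
            };

        private readonly List<string> _domains = new List<string>
        {
            "wikipedia.org"   // suppress all errors from this domain
        };

        // Called before any HTTP request is made: whitelisted links are skipped,
        // so known-good "false positives" never slow a run down or show as errors.
        public bool ShouldSkip(string url)
        {
            if (_exactUrls.Contains(url))
                return true;

            Uri uri;
            if (Uri.TryCreate(url, UriKind.Absolute, out uri))
            {
                foreach (var domain in _domains)
                {
                    if (uri.Host.EndsWith(domain, StringComparison.OrdinalIgnoreCase))
                        return true;
                }
            }

            return false;
        }
    }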

    Richard

     

  • Dan Diplo 1554 posts 6205 karma points MVP 6x c-trib
    Oct 18, 2012 @ 21:09

    Hi Richard,

    I do take note of all feature requests and will definitely consider them. But as I explained in another thread, my wife has just had a baby, so I have very little free time at the moment (and your request is not trivial). But once things get back to normal (if they ever do!) I'll see what I can do.

    Dan

  • Richard Barg 358 posts 532 karma points
    Oct 18, 2012 @ 21:24

    Congrats, Dan, on the new arrival. Completely understand.
