Whitelist for Links Flagged w/Warnings But Otherwise Valid
Diplo Link Checker flags certain links as broken that turn out to be OK. Links to Wikipedia content always return a forbidden error. A whitelist to avoid these on future runs would be great. Here are some examples:
Wikipedia is unusual in that it forbids HEAD requests to its servers. A HEAD request just asks for the headers rather than downloading the entire page contents, so it is much faster to process. If you check the config file there's a comment to this effect:
<!-- Can be HEAD, GET or POST (HEAD is quickest but some servers don't respect
it (eg. Wikipedia). Use GET for those.) -->
<checkLinksRequestMethod>HEAD</checkLinksRequestMethod>
You can change it to GET and it should work for a wider range of servers, but it will be slower.
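Just to illustrate the HEAD-then-GET idea, here's a rough Python sketch (it isn't the checker's actual code, which isn't written in Python; it just uses the standard requests library):

import requests

# Illustration only: try a HEAD request first (headers only, no page body),
# and fall back to GET for servers like Wikipedia that reject HEAD.
def check_link(url, timeout=10):
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code < 400:
            return resp.status_code
        # Some servers answer HEAD with 403/405 even though the page is fine,
        # so retry with a full GET before reporting the link as broken.
        resp = requests.get(url, allow_redirects=True, timeout=timeout)
        return resp.status_code
    except requests.RequestException:
        return None  # network error; treat as unreachable

print(check_link("https://en.wikipedia.org/wiki/HTTP"))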
We're running 50 pages now with excellent speed, so I would hate to sacrifice that. I take it the whitelist idea is not feasible for given pages.
Dan,
Just wanted to check in again on the idea of a whitelist. More specifically, it would allow you to create a list of links that were found to be invalid but are actually OK. The program would skip them on subsequent checks. For sites like Wikipedia, there could be a setting for suppressing all error messages from the wikipedia domain. This would speed up link checking. We're now checking 75 pages at a time with excellent speed and results.
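To make the idea concrete, here's a rough Python sketch of what I have in mind (the whitelist.txt file name and the "domain:" rule format are just my invention, not anything the checker actually supports):

from urllib.parse import urlparse

def load_whitelist(path="whitelist.txt"):
    # One exact URL per line, or "domain:example.org" to suppress a whole domain.
    urls, domains = set(), set()
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("domain:"):
                domains.add(line[len("domain:"):].strip().lower())
            else:
                urls.add(line)
    return urls, domains

def should_skip(url, urls, domains):
    # True if the link was previously confirmed OK, so the checker can skip it.
    if url in urls:
        return True
    host = urlparse(url).netloc.lower()
    # e.g. "domain:wikipedia.org" matches en.wikipedia.org, de.wikipedia.org, ...
    return any(host == d or host.endswith("." + d) for d in domains)

A single "domain:wikipedia.org" line would then keep every Wikipedia link out of the warnings.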
Richard
Hi Richard,
I do take note of all feature requests and will definitely consider them. But as I explained in another thread, my wife has just had a baby, so I have very little free time at the moment (and your request is not trivial). But once things get back to normal (if they ever do!) I'll see what I can do.
Dan
Congrats Dan on the new arrival. Completely understand.