suggested features
We have a large site and lots of users, so it would be nice if they could use a tool like this to check their own pages.
Sorry, I see now that I can check more than 5 pages by editing the config file.
My supervisor would like a way to export a report for the entire site such as a CSV file -- but since right now it has to be run manually and only a few pages at a time, that may be asking a lot at this point in time. After all, you just released this not long ago!
Thanks for building it!
Hi Jennifer,
Thanks for the feedback. As you discovered, you can check more than 5 pages by editing the config. I might make that something that is easily done via the interface, too. Your idea about selecting the start node is a good one - the underlying code supports that, it's just more of an interface question (as I'm not sure how to integrate the Umbraco content picker into my control). But it's something I'll look into.
As for reporting, I did build something in that exported the report to Word, but it wasn't fully working so I disabled it. Long term I think being able to run a check for the entire site and generate a report is something I'll be looking at.
Dan
It took me from about 9:00 AM to 3:30 PM to check 2000 pages - then I somehow got out of the link checker and will have to start all over again from the homepage start node. However, since our site is over 8000 pages and still growing, it may be impossible for me to check all the links this way.
If you ever get this to work so that I can choose a different start node I will be forever grateful! :)
I can appreciate that is a pain. I had a quick play around and have figured out how to integrate the Content Picker control from Umbraco into the Link Checker. I've also got it basically working, but need to figure out how to deal with potential problems (such as what happens if you pick a new start node half-way through checking). But hopefully I shall be able to release something soon.
OK, I've uploaded the new version (1.3) that adds the feature to select a start node. It seems to work OK in my tests, but it would be good if you could give me any feedback on this. Download the new package from the usual place: http://our.umbraco.org/projects/backoffice-extensions/diplo-linkchecker
Awesomeness! Thank-you so much! In my initial testing everything is working swimmingly.
Dan,
We have an Umbraco instance with 55 sites, ranging from 100-750 pages. In the current version, could we start the checking at any of the top level nodes? Also would we be able to configure checking 200 pages vs. 5? Any progress on outputting a report in Word or Excel?
Hi Richard,
That's a big site you've got there :) Yes, the latest version does allow you to select the start node so you can indeed just check any of the 55 sub-sites.
You can also change how many pages are checked at once in the config file. When you install the Link Checker it adds a DiploLinkChecker.config file into the /config/ directory of Umbraco which you can edit. One of the options is how many pages to check at once.
However, please bear in mind that setting it to 200 would mean it would take a long time to complete, with the risk that the request might time out altogether. The reason is that to check 200 pages it first has to make a request to each of those pages, parse out all the links from each page and then send HTTP requests to each and every link it finds in them. A typical page can have dozens of links. It will also wait around 20 seconds to determine whether a link exists if there is no instant response (it could just be that the server is slow in responding, so you have to be conservative). I have tried to make this process asynchronous, but even so it takes time and processing power.
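To make the steps above concrete, here is a rough sketch in Python of the same idea: parse the links out of a page, then send a request to each one in parallel with a 20 second timeout. This is not the Link Checker's actual code (which runs inside Umbraco); the names `extract_links`, `link_is_alive` and `check_page`, the use of HEAD requests, and the worker count are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.parse import urljoin
from urllib.request import Request, urlopen

LINK_TIMEOUT_SECONDS = 20  # the "around 20 seconds" wait described above


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html, base_url):
    """Return absolute http(s) URLs for all links found in the HTML."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = (urljoin(base_url, href) for href in parser.links)
    return [url for url in absolute if url.startswith(("http://", "https://"))]


def link_is_alive(url, timeout=LINK_TIMEOUT_SECONDS):
    """Send one HTTP request; treat any error or timeout as a broken link."""
    try:
        # Assumption: a HEAD request is enough to tell whether the link exists.
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status < 400
    except (OSError, HTTPException, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts.
        return False


def check_page(html, base_url, checker=link_is_alive, max_workers=8):
    """Check every link on one page in parallel; return the broken ones."""
    links = extract_links(html, base_url)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(checker, links))
    return [url for url, ok in zip(links, results) if not ok]
```

Even with the requests running in parallel, a handful of slow or dead links that each use up the full timeout will hold up the whole batch, which is why a large pages-per-batch setting risks the overall request timing out.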
I'm afraid as the father of a new baby (six weeks old) I don't currently have time to add the reports bit I was planning, as I'm sure you understand. I may release the code, though, when I get the chance :)
Thanks for the quick response and congrats on the new arrival!! A couple of other questions:
If we were to start at one of the nodes above, let's say "Cardiac Surgery", would your program, upon finishing that node, proceed automatically to checking the next top level node "Colorectal Surgery" and so on until all the nodes have been completed?
Is it possible to exclude any nodes/tree branches from being checked?
Although the answer is probably obvious, what is the advantage of using your Umbraco-based link checker over generic link checkers like (http://linkchecker.sourceforge.net/)?
"If we were to start at one of the nodes above, let's say "Cardiac Surgery", would your program, upon finishing that node, proceed automatically to checking the next top level node "Colorectal Surgery" and so on until all the nodes have been completed?"
No, it only works down the tree - so it would start at whatever start node you chose and then would check all the descendants of that node. Of course you can start at the very root, which will check all nodes - but this would take a long time in a site as large as yours.
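The "down the tree only" behaviour can be pictured with a small Python sketch. The tree below is made up for illustration: only "Cardiac Surgery" and "Colorectal Surgery" come from the question, and the child page names, the dictionary layout and the `descendants` helper are assumptions, not how Umbraco or the Link Checker stores content.

```python
def descendants(tree, start):
    """Yield the start node, then every node beneath it, depth first."""
    yield start
    for child in tree.get(start, []):
        yield from descendants(tree, child)


# Hypothetical content tree: parent name -> list of child names.
tree = {
    "Root": ["Cardiac Surgery", "Colorectal Surgery"],
    "Cardiac Surgery": ["Our Team", "Procedures"],
    "Procedures": ["Bypass"],
    "Colorectal Surgery": ["Contact"],
}

# Starting at "Cardiac Surgery" visits only that node and what is under it.
pages_to_check = list(descendants(tree, "Cardiac Surgery"))
```

Here `pages_to_check` contains "Cardiac Surgery", "Our Team", "Procedures" and "Bypass", but never the sibling "Colorectal Surgery". Starting at "Root" instead would walk every node in the tree.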
Is it possible to exclude any nodes/tree branches from being checked?
No (but bear in mind the answer above: it won't check "up" the tree, so picking a start node already limits what gets checked).
Although the answer is probably obvious, what is the advantage of using your Umbraco-based link checker over generic link checkers like (http://linkchecker.sourceforge.net/)?
I guess the big advantage is that it's integrated into the back-end of Umbraco, so it is web-based and so accessible to anyone who has access to the Content section in Umbraco. It's not a desktop app that needs installing.
The other thing is that the link-checker knows about every single Umbraco page in your site (because it uses the API) without the need to "spider" it.
I guess the thing for you to do is install it and try it - you can always uninstall the package if you don't like it.