When gathering node data, such as when rebuilding the external index, I get an entry in the log telling me:
System.Exception: Error indexing queue items,System.ArgumentException: Illegal characters in path.
at System.IO.Path.CheckInvalidPathChars(String path, Boolean checkAdditional)
at System.IO.Path.GetFileName(String path)
at Our.Umbraco.ezSearch.ezSearchBoostrapper.OnGatheringNodeData(Object sender, IndexingNodeDataEventArgs e)
at Examine.Providers.BaseIndexProvider.OnGatheringNodeData(IndexingNodeDataEventArgs e) in X:\Projects\Examine\Examine\src\Examine\Providers\BaseIndexProvider.cs:line 213
at UmbracoExamine.UmbracoContentIndexer.OnGatheringNodeData(IndexingNodeDataEventArgs e)
at Examine.LuceneEngine.Providers.LuceneIndexer.GetDataToIndex(XElement node, String type) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1115
at Examine.LuceneEngine.Providers.LuceneIndexer.ProcessIndexQueueItem(IndexOperation op, IndexWriter writer) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1965
at Examine.LuceneEngine.Providers.LuceneIndexer.ProcessQueueItem(IndexOperation item, ICollection`1 indexedNodes, IndexWriter writer) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1676
at Examine.LuceneEngine.Providers.LuceneIndexer.ForceProcessQueueItems(Boolean block) in X:\Projects\Examine\Examine\src\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1530, IndexSet: ExternalIndexSet
From what I've learned, "Illegal characters in path" is the key: something my client has uploaded has a character in its file name that Umbraco doesn't like, or rather that System.IO.Path doesn't like. Or maybe it isn't an upload at all; maybe it's a node with a name that got through the requestHandler/urlReplacing handler.
However, I can't figure out how to find the media item (or node) that contains the illegal character. How would I do that?
I tried copying all the media and the actual DB from the server to my local environment, without luck.
I adjusted the script above and tried to find the offending node with the following SQL statement:
SELECT * ,Len(urlName) as charlength FROM (SELECT nodeId , [xml], (cast([xml] as xml).value('(/*/@urlName)[1]', 'nvarchar(max)')) as urlName FROM dbo.[cmsContentXml]) as tt
WHERE urlName IS NOT NULL
AND (
urlName like '%<%'
OR urlName like '%>%'
OR urlName like '%|%'
OR urlName like '%"%'
OR urlName LIKE '%' + CHAR(32) + '%'
)
But I still don't find any nodes other than ones with German umlauts and long urlNames. Could these also affect RebuildIndex? I noticed that I can't run the index rebuild because of this error (illegal character in path?), and as a consequence some media files get lost after running the Examine Manager.
I tried running the try/catch block from above, but I don't receive an error message :(
Does Umbraco have difficulties with German umlauts in the urlName? I adjusted my SELECT statement like this, and the only results I found were for umlauts like "ü":
SELECT * ,Len(urlName) as charlength FROM (SELECT nodeId , [xml], (cast([xml] as xml).value('(/*/@urlName)[1]', 'nvarchar(max)')) as urlName FROM dbo.[cmsContentXml]) as tt
WHERE urlName IS NOT NULL
AND (
urlName like '%[<>''"äüöÖÜÄ!@#$% |"]%'
)
Illegal characters in path.
Hey
You could try searching the media file names in the database for any file name that meets the following criteria, since such names are invalid.
To search for the urlName of a node you could try the following script:
SELECT TOP 1 [nodeId] FROM [db_owner].[cmsContentXml] where (cast([xml] as xml).value('(/{DOCUMENTTYPE}/@urlName)[1]', 'nvarchar(max)')) like '%>%'
The [db_owner].[cmsPropertyData] table contains the names of the media images in dataNtext, so you could search that column for any invalid characters.
Not sure if any of this helps
Nick
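The property-data search described above could be sketched roughly like this. It is only a sketch under assumptions: it assumes the Umbraco 7 schema (cmsPropertyData joined to cmsPropertyType on propertytypeid), assumes the file property uses the default alias umbracoFile, and uses dbo rather than db_owner as the schema. It looks only for the printable characters that .NET Framework's Path.GetInvalidPathChars() rejects (", <, > and |); control characters are not covered.

```sql
-- Sketch only: list media file values containing ", <, > or |.
-- Assumptions: Umbraco 7 schema, default 'umbracoFile' alias, dbo schema.
SELECT pd.contentNodeId,
       pt.Alias,
       -- Upload fields normally store a plain path in dataNvarchar;
       -- other editors may store their value in dataNtext instead.
       COALESCE(pd.dataNvarchar, CAST(pd.dataNtext AS nvarchar(max))) AS storedValue
FROM dbo.cmsPropertyData AS pd
INNER JOIN dbo.cmsPropertyType AS pt
        ON pt.id = pd.propertytypeid
WHERE pt.Alias = 'umbracoFile'
  AND COALESCE(pd.dataNvarchar, CAST(pd.dataNtext AS nvarchar(max))) LIKE '%[<>|"]%';
```

If the stored value is something other than a plain path (JSON, for example), it will match on the double quotes alone, so any hits need to be checked by hand.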
Nice! However, I get an error in T-SQL: XQuery [value()]: ")" was expected.
I can't really wrap my head around why!
I'm on SQL 2016 Express.
Did you remove an extra bracket from the query?
No, I had to change db_owner to dbo, but apart from that nothing has changed:
SELECT TOP 1 [nodeId] FROM dbo.[cmsContentXml] where (cast([xml] as xml).value('(/{DOCUMENTTYPE}/@urlName)[1]', 'nvarchar(max)')) like '%>%'
Hmm.
Umbraco 7.5.11 if that changes anything.
Ah whoops, you need to replace {DOCUMENTTYPE} with your document type. I don't think there is a way to do all document types at once.
Aaaah. Of course : )
For future reference:
SELECT TOP 1 [nodeId] FROM dbo.[cmsContentXml] where (cast([xml] as xml).value('(/Image/@urlName)[1]', 'nvarchar(max)')) like '%>%'
Oh well. I cannot find anything weird apart from an ê. But renaming the node and file didn't do anything.
Any other suggestions?
I figured out how I can reproduce it in my local environment. Can I debug it somehow? VS won't break on anything that I can think of.
Umm, if you can find it locally, I would try renaming the broken node and deleting the "ExamineIndexes" folder inside App_Data -> TEMP. Then try rebuilding the index. Do it all locally first to see if that fixes your index, and then you can try it on live.
I can trigger the error when rebuilding the index from Developer/Examine Management.
It doesn't solve the issue, not even if I delete the indexes :/
Ah, difficult. I think you could try using the actual source code to debug it locally, since you've got the live database.
I believe this is the source code of what is called when you rebuild the index:
So if you run the above code somewhere and put a breakpoint on the exception, you should be able to find the broken node and rename it.
Unfortunately it seems that CheckInvalidPathChars throws silent errors, so RebuildIndex doesn't throw anything. Gaah.
I need to think about something else for now. I'll return and figure it out in a day or so.
: )
Does anyone have the same difficulties?
I know it's a long time ago, but did you find a solution?
I'm having the issue on 7.15.7.