I have the same issue. I saw this post that might be helpful, but I don't know if I want to turn off the Tidy code option, because I can't see the admin trying to copy and paste code from a Word document or something.
When I noticed the same behaviour in u4.0.x, I started a little investigation and some internet crawling... The same problem was reported in the old Umbraco forum in 2009, and it seems there is still no solution except turning Tidy off...
So, it was time for me to download the Tidy.NET sources from SF and start to debug ;-)
The results are:
1) The replacement of dashes (and some quotation marks) occurs inside Tidy's PerfectPrint part; the code snippet is:
if (_options.MakeClean)
{
    if (c >= 0x2013 && c <= 0x201E)
    {
        switch (c)
        {
            case 0x2013:
            case 0x2014:
                c = '-';
                break;
            case 0x2018:
            case 0x2019:
            case 0x201A:
                c = '\'';
                break;
            case 0x201C:
            case 0x201D:
            case 0x201E:
                c = '"';
                break;
        }
    }
}
2) There are two possible solutions as far as I can see:
a) Turn off the MakeClean option in the Umbraco XHTML helper code (in the Tidy initialization part). This requires recompiling the Umbraco cms dll.
b) Because MakeClean is used in other parts of Tidy and turning it off might have unexpected results, I made a small change in Tidy's code itself, changing:
if (_options.MakeClean) {
into:
if (_options.MakeClean && _options.CharEncoding != CharEncoding.Raw && _options.CharEncoding != CharEncoding.UTF8) {
3) So, with this small update Tidy will make no "dash changes" in your editor content if the TidyCharEncoding option in "config\umbracoSettings.config" is left at its default ("UTF8") or set to "Raw".
4) There is another sad point connected to this behaviour: the TinyMCE paste plugin and its default replacement behaviour. When you try to insert content containing UTF8 entities via Ctrl-V in the paste plugin, TinyMCE will replace a very strange ;-) set of entities with their ASCII analogs. Luckily, this behaviour can be changed very simply by adding the two following keys to the "config\tinyMceConfig.config" file:
I've also noticed that non-breaking spaces (&nbsp;) get stripped and replaced with regular spaces. This has unfortunate consequences in those rare (but important!) cases where they are truly needed... (One of the more common examples is in empty table cells when dealing with really old sites.)
I'm definitely going to try Doug's package because the em-dashes have also been a major pain for us: they are an important typographical/grammatical character which is definitely not the same as a hyphen! I just found this post now, so I thought I'd also mention the non-breaking space problem for posterity...
Out of curiosity, I downloaded the Tidy.NET source from SourceForge and found the same spot in the code that Alexander noted, along with some interesting commentary. (Still trying to see where our precious non-breaking spaces are being stripped.)
To respond to Allan's question,
Who made the decision to replace em-dash with '-'? What would possibly be the reasoning behind this?
Here's what the source says about this. (Keep in mind, the Tidy.NET project was last updated in June of 2005, and this code section in particular seems to have been last updated back in 2000! So it is perhaps a bit of antiquity that is doing more harm than good now...)
/*
Filters from Word and PowerPoint often use smart
quotes resulting in character codes between 128
and 159. Unfortunately, the corresponding HTML 4.0
entities for these are not widely supported. The
following converts dashes and quotation marks to
the nearest ASCII equivalent. My thanks to
Andrzej Novosiolov for his help with this code.
*/
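For anyone who wants to see the effect without building Tidy.NET, here is a minimal standalone sketch of the mapping that comment describes, combined with Alexander's encoding check. It does not use Tidy.NET at all: the CharEncoding enum below is a hypothetical stand-in (only the Raw and UTF8 names come from the posts above; ASCII and Latin1 are assumed for illustration), and DashDemo, ToAscii and Clean are made-up names, not anything from the library.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for Tidy.NET's CharEncoding enum. Only the Raw and
// UTF8 names come from the thread; ASCII and Latin1 are assumptions.
enum CharEncoding { Raw, ASCII, Latin1, UTF8 }

static class DashDemo
{
    // Maps typographic dashes and quotes to their nearest ASCII character,
    // mirroring the switch statement quoted earlier in the thread.
    static char ToAscii(char c)
    {
        switch (c)
        {
            case '\u2013': case '\u2014': return '-';
            case '\u2018': case '\u2019': case '\u201A': return '\'';
            case '\u201C': case '\u201D': case '\u201E': return '"';
            default: return c;
        }
    }

    // Applies the mapping only when the patched condition holds:
    // MakeClean is on and the encoding is neither Raw nor UTF8.
    public static string Clean(string text, bool makeClean, CharEncoding encoding)
    {
        bool replace = makeClean
            && encoding != CharEncoding.Raw
            && encoding != CharEncoding.UTF8;
        if (!replace) return text;

        var sb = new StringBuilder(text.Length);
        foreach (char ch in text) sb.Append(ToAscii(ch));
        return sb.ToString();
    }

    static void Main()
    {
        // "\u2014" is an em dash; "\u201C" and "\u201D" are curly quotes.
        string sample = "Tidy \u2014 \u201Cclean\u201D";
        Console.WriteLine(Clean(sample, true, CharEncoding.Latin1)); // ASCII-only result
        Console.WriteLine(Clean(sample, true, CharEncoding.UTF8));   // left untouched
    }
}
```

With the assumed Latin1 setting the em dash and curly quotes come out as plain '-' and '"', while with UTF8 (or Raw) the string is returned unchanged, which is the behaviour the patched condition is meant to give.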
It has been an interesting learning experience to look into the Tidy.NET project a bit and realize the magnitude of all the tidying going on behind the scenes!
Can't insert em dashes
I can't insert em dashes in the TinyMCE editor, it appears as an em dash until I publish, then it converts to a hyphen. Any ideas?
Thank you!
Hi all!
--> To all gurus: what is the best way to post the updated TidyNet.dll file somewhere on this community site?
Thanks.
Alexander,
You could make it a "package" and post it under "Projects", or you can just upload the dll to your own server and paste a download link here.
~Heather
If anyone else comes across this problem I've created a package with a new TidyNet.dll based on Alexander's post above.
Package is here
Thanks, Doug!
@Doug - just discovered that one today. Saved my day!
Thanks Doug.... this was exactly what I needed.
Thanks!