Problems with displaying text-string properties in the frontend when the content is Arabic
I’m having problems with displaying text-string properties in the frontend when the content is Arabic. I checked the backend, and the content there is correct, but the frontend displays something like “????”. This problem only exists for text-string properties. There are no problems with dictionary items or the rich text editor.
I’m using a MySQL database and Umbraco version 4.5.1. I followed the instructions in this thread http://our.umbraco.org/forum/using/ui-questions/13235-Cyrillic-letters-turn-into-''-when-entered-into-textstring-etc but it didn’t resolve my problem.
I’m stuck so any help will be appreciated.
Bilal
Hi Bilal
Which instructions have you followed from the post? Have you tried the altering stuff that Gal is posting about?
If setting your charset to utf-8 in the HTML source does not help, have you tried doing so in the database? I must admit I would find it a bit weird if it's the database you need to alter, though, since the content looks right in the back office.
Have you checked whether the content looks right in the umbraco.config file (the XML cache)? Is the encoding of this document also set to UTF-8?
How about the server: what MIME type is it set to serve the content with?
/Jan
Are you using any text-replacement on the front-end? Cufon, sifr, etc?
Hi Amir
No, not at all.
/Bilal
Hi Jan
I checked umbraco.config. The content of the text-string fields is also "????". The rich editor field contains html character codes. This can't be good for SEO. The DOCTYPE declaration in umbraco.config is
<!DOCTYPE umbraco[ <!ELEMENT nodes ANY> <!ELEMENT node ANY> <!ATTLIST node id ID #REQUIRED> ]>
My character set is set to utf-8 in my master template: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I followed Gal's instructions here: http://our.umbraco.org/forum/getting-started/installing-umbraco/13225-%5Binfo%5D-MySQL-mangle-all-non-English-character-set-like-Hebrew-Arabic-and-others
I changed the value of TidyCharEncoding in umbracoSettings.config from "UTF8" to "ASCII".
About the server MIME type: how can I get this information?
/Bilal
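One way to answer the MIME-type question without server access is to look at the Content-Type response header the site actually sends. The following is a minimal Python sketch and not anything Umbraco-specific; `fetch_content_type` and `charset_from_content_type` are hypothetical helper names introduced here for illustration.

```python
from email.message import Message
from urllib.request import urlopen


def charset_from_content_type(value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() lower-cases the result and returns None if absent.
    return msg.get_content_charset()


def fetch_content_type(url):
    """Request the page and return the raw Content-Type header (needs network access)."""
    with urlopen(url) as response:
        return response.headers.get("Content-Type", "")


# Parsing a header value works offline:
print(charset_from_content_type("text/html; charset=UTF-8"))
print(charset_from_content_type("text/html"))
```

To check a live page, pass the page's own address to `fetch_content_type` and feed the result to `charset_from_content_type`. If the header carries no charset, or one other than utf-8, the browser may not honour the meta tag in the template.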
The create statement for the table cmspropertydata defines dataNtext with charset latin1.
This could explain why data is not encoded as utf8 in the rich text editor field
CREATE TABLE `cmspropertydata` (
`id` int(11) NOT NULL auto_increment,
`contentNodeId` int(11) NOT NULL,
`versionId` char(36) character set latin1 default NULL,
`propertytypeid` int(11) NOT NULL,
`dataInt` int(11) default NULL,
`dataDate` datetime default NULL,
`dataNvarchar` varchar(500) character set utf8 default NULL,
`dataNtext` longtext character set latin1,
PRIMARY KEY (`id`),
KEY `IX_cmsPropertyData_1` (`contentNodeId`),
KEY `IX_cmsPropertyData_2` (`versionId`),
KEY `IX_cmsPropertyData_3` (`propertytypeid`)
) ENGINE=MyISAM AUTO_INCREMENT=24667 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
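Why a latin1 column matters can be illustrated outside the database. The Python sketch below is only an analogy for what a conversion to a charset without Arabic letters does; it is not the code path MySQL or Umbraco uses, and the sample word is made up. It shows two typical outcomes: each letter becomes "?", or each letter becomes a numeric character reference similar to the HTML character codes seen in the rich text field.

```python
import html

arabic = "مرحبا"  # sample text, five Arabic letters

# latin-1 has no Arabic letters; 'replace' substitutes '?' for each one.
lossy = arabic.encode("latin-1", errors="replace").decode("latin-1")

# 'xmlcharrefreplace' emits numeric character references instead of '?'.
refs = arabic.encode("latin-1", errors="xmlcharrefreplace").decode("latin-1")

# utf-8 can represent every character, so this round trip is lossless.
round_trip = arabic.encode("utf-8").decode("utf-8")

print(lossy)                         # one '?' per original letter
print(refs)                          # references of the form &#NNNN;
print(html.unescape(refs) == arabic)
print(round_trip == arabic)
```

The character references can be turned back into Arabic, but the question marks cannot: once a letter has been replaced by "?", the original is gone.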
Hi Bilal
Maybe it's a database issue after all, then.
Is the site in question running on shared hosting or do you have access to the server through a remote connection or something like that? If you're running IIS7.x you should be able to click on the site in IIS and see a button called ".NET globalization".
/Jan
Hi Jan
I altered the CHARACTER SET for dataNtext to utf8 and now I see the value is right in the backend (Arabic letters):
alter table `cmspropertydata` modify `dataNtext` longtext CHARACTER SET utf8;
Everything looks right in the backend now, but still "????" in the frontend for rich text editor and text-string properties. For Dictionary items everything is right both in the backend and frontend. For comparison the create statement for the Dictionary table looks like this
DROP TABLE IF EXISTS `cmslanguagetext`;
CREATE TABLE `bilalisa_database00780`.`cmslanguagetext` (
`pk` int(11) NOT NULL auto_increment,
`languageId` int(11) NOT NULL,
`UniqueId` char(36) NOT NULL,
`value` varchar(1000) character set utf8 NOT NULL,
PRIMARY KEY (`pk`)
) ENGINE=MyISAM AUTO_INCREMENT=234 DEFAULT CHARSET=latin1;
So it seems the default charset of the table has no influence on how a column is encoded when the column declares its own character set. So I'm guessing that Dictionary items and properties are handled differently when they are displayed in the frontend.
The site is running on a shared environment. I'm guessing it's running on IIS 6.0, but I don't have access to it.
/Bilal
Hi Bilal
That's an interesting discovery...
Have you tried to republish the entire site and refresh the XML cache after the changes you have made? If not, this could be the reason you can't see the effect on the frontend of your page yet.
/Jan
Bilal, out of curiosity, what are you using to display the properties from the fields you're having issues with? It seems odd that you're having to go through all of this trouble. I've created several sites in various languages and have never had to make any modifications to Umbraco at the level you're having to.
-Amir
Hi Jan
How is refreshing XML cache done?
/Bilal
Hi Bilal
Choose the "Content" node in your content section, right-click it and choose "Republish entire site". This refreshes the cache. To do a full republish, you need to right-click the root node of your site and choose "publish". (You probably already know the last part, but the "Republish entire site" option can be a bit misleading if not explained.)
/Jan
Hi Amir
I'm using XSLT, Umbraco macros and Umbraco page field macros.
/Bilal
Hi Jan
I tried to republish entire site without any luck :(
/Bilal
Hi Bilal
Is it the content from the Umbraco page fields that is shown as ??? characters?
And do these characters still appear in umbraco.config, or only on the frontend?
/Jan
The question marks "????" appear both in umbraco.config and the frontend. It does not matter whether I use Umbraco page fields or XSLT. Both text-string properties and rich text editor display "????" in the frontend and in umbraco.config.
/Bilal
Hmmm
I'm running out of ideas... Have you tried using a character encoding other than UTF-8 in all of the mentioned places? Perhaps you need to set the encoding to ISO/IEC 8859-6 (which should be Arabic as far as I know).
Otherwise it might be worth a try to contact the host to find out if it can be solved in IIS.
Hope this helps.
/Jan
Hi Jan
I tried the encoding ISO/IEC 8859-6 but without any luck. I'm pulling my hair out over this problem!
/Bilal
All suggestions are welcome..
Can you paste me some of the Arabic text you're having issues with? I want to see what happens in a basic Umbraco install.
Hi Amir
Try with this
I wrote it in Notepad2 using utf-8 encoding.
/Bilal
Of course this editor does not support utf-8 :/// crap
you can email me a text file at amir at blackswaninteractive dot com if you'd like
On the server running IIS, make sure the Regional Settings are set to Arabic (something like Arabic (Saudi Arabia) will always work).
Also, to troubleshoot, create a simple website using VS 2005 or similar with a simple ASP.NET web page (ASPX) that displays Arabic text both as literal text and from a database.
I faced similar problems before and solved them by making the needed config changes on the server and in web.config.
Tarek
Hi Tarek
The website is running on a shared environment. I don't think I can ask my web hosting provider to set the Regional Settings to Arabic.
I did a test where I added a static page to my hosting account and added Arabic text to it. Everything worked fine.
So I think the problem is in Umbraco.
/Bilal
Did you try saving the file with utf-8 encoding?
I have tested Arabic with Umbraco v 4.0.4.1 about 1 year ago on a local server, and everything is working fine:
http://bit.ly/hbRaAZ
Even the page names are showing in Arabic and URL is still working fine. I was surprised.
Here is what happens, to the best of my understanding:
1. Properties are stored in SQL Server,
2. The values are cached in some form in text files (I think) or in an XML file under, I think, "\data\umbraco.config",
3. The values are then merged with the templates and rendered to the Browser.
If only the properties are not showing, this means SQL Server must have proper support for Arabic characters, the above file must be saved in the same code page used in SQL Server, IIS/Windows Server must have compatible settings, and web.config must have a compatible setting like:
During the transformation of the values from SQL Server to the browser, if any code page is not compatible, the values will be garbled.
Try to open the above file, I think "\data\umbraco.config", directly in IE, and see what the property values look like.
In short, make sure the code page used to store Arabic is supported/compatible from the time the values are created until they are rendered to the browser.
Also, try to deploy the same website on your local PC or server to see if you can replicate and then solve the problem.
Tarek.
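The point about every stage needing a compatible code page can be sketched in a few lines of Python. This is only an illustration under assumptions: the stage names and the encodings assigned to them are made up, and the thread has not established which stage is at fault on the real site. `trace` is a hypothetical helper.

```python
def trace(text, stages):
    """Pass text through each (name, encoding) stage and record what survives."""
    results = []
    current = text
    for name, encoding in stages:
        # Characters the stage's encoding cannot represent are replaced by '?'.
        current = current.encode(encoding, errors="replace").decode(encoding)
        results.append((name, current))
    return results


sample = "مرحبا"  # made-up sample text, five Arabic letters

# Hypothetical pipeline in which only the first stage is not Unicode-capable.
broken = [("database", "latin-1"), ("XML cache", "utf-8"), ("HTTP response", "utf-8")]
healthy = [("database", "utf-8"), ("XML cache", "utf-8"), ("HTTP response", "utf-8")]

for name, value in trace(sample, broken):
    print(name, value)
print(trace(sample, healthy)[-1][1] == sample)
```

In the broken pipeline the text is already "?????" after the first stage, and the later UTF-8 stages carry the question marks through unchanged. Correct settings downstream cannot repair a loss that happened upstream.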
To check the encoding used for the text or XML file, simply open it in Notepad, choose "Save As", and confirm the encoding is UTF-8, which is the best option in my experience.
Tarek.
Hi Eran
I presume you mean the static test page that I added. Yes I saved it as utf-8.
/Bilal
Hi Tarek
I added the fileEncoding and culture attributes to the globalization element in web.config
I checked the XML cache umbraco.config. It is encoded as utf-8. The Arabic properties show up as "????" when I open it in both Notepad and IE. But dictionary elements are displayed correctly.
/Bilal
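A UTF-8 file that shows "????" in every viewer most likely contains literal question marks, which would mean the text was lost before the cache was written rather than mis-decoded on display. A small Python sketch can check that on a copy of the file. `diagnose` is a hypothetical helper, and the XML in the sample is made up and does not reproduce the real umbraco.config schema.

```python
import re

ARABIC = re.compile(r"[\u0600-\u06FF]")   # basic Arabic Unicode block
QUESTION_RUN = re.compile(r"\?{3,}")      # three or more '?' in a row


def diagnose(raw):
    """Summarise what a byte string contains when read as UTF-8."""
    try:
        text = raw.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        text = raw.decode("utf-8", errors="replace")
        valid = False
    return {
        "valid_utf8": valid,
        "arabic_chars": len(ARABIC.findall(text)),
        "question_runs": len(QUESTION_RUN.findall(text)),
    }


# Made-up sample: one value kept its Arabic text, one holds literal '?'.
sample = '<node nodeName="الرئيسية"><title>????</title></node>'.encode("utf-8")
print(diagnose(sample))
```

To inspect the real cache, read a downloaded copy in binary mode and pass the bytes to `diagnose`. Valid UTF-8 together with runs of question marks where property values should be points at a lossy step between the database and the cache, not at the browser or the template.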
This leaves you with the following possibilities:
1. Regional settings of the server,
2. SQL Server Encoding
3. Encoding of the cache, which is not the file I mentioned earlier but I really don't know where it is stored
All encodings must be compatible and support Arabic.
Did you try to reinstall the same Umbraco website on your local PC / Server ?
Tarek
Regarding 1: The website is hosted on a shared environment. It’s running on a production server with multiple websites and I don’t have control over Regional settings on the server.
Regarding 2: I’m running a MySQL server. The default character set for my database is utf8:
DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci
Regarding 3: I’m guessing you meant
<TidyCharEncoding>UTF8</TidyCharEncoding>
in umbracoSettings.config.
The operating system on the shared environment is Microsoft Windows 2003 -- Microsoft Windows NT 5.2.3790 Service Pack 2 with IIS version 6.0, but my local pc is Windows 7 Enterprise and IIS version 7. So I think even if I can get things to work locally I will still have a problem on the production environment.
/Bilal
Ok, I see your point.
More points:
1. You can report your problem to the hosting company and ask them to apply a possible resolution. You have nothing to lose by reporting the problem, and maybe they can apply the resolution to your website only if they can deploy it to a VM setup.
My recommendation is to report a summary with the complete details of the problem, because they may be able to figure out what is going on.
2. Did you open the SQL table where the Arabic properties are stored? How do they look? Try to locate where the Dictionary items are stored in SQL; this will help lead you to a resolution.
3. Based on my understanding, the Umbraco document property values are stored in a SQL table, and then they are moved to some "cached" store, but I don't know the exact location and details. As a test, I deleted the file "data\umbraco.config" and the website still worked. The file was recreated after clicking "Save and Publish" in the Umbraco back office. This means the cached data is stored in another place, and then it is rendered to the browser. Maybe some other Umbracian guru can help in this area.
4. The fact that Dictionary items look fine could mean the database encoding is correct, and/or that the dictionary items are not cached (maybe). Such details will help you find a resolution.
Based on your input, you confirmed the following:
1. You can store Arabic text in a SQL table (in the same DB as Umbraco) and display it correctly on a normal ASPX web page (also under the Umbraco website), right? You can test that by writing some data access code in the code-behind of an Umbraco web page template.
2. You can add literal Arabic text to an ASPX web page and display it in the browser properly (under the Umbraco website), right? You can also test that by writing some literal Arabic text in an Umbraco web page template.
3. You can retrieve Arabic Dictionary items and display them properly in ASPX under Umbraco, right?
So, this leaves you with only the method and encoding used to store, retrieve, and render the property values in Umbraco. If you can find out where they live in each cycle, you will solve the problem. And I can say I'm 87.56% sure it is a "Regional Settings" related matter.
Tarek.