Creating a csv for excel with xslt (Encoding Problem)
Hi Umbraco peeps,
I'm working on a xslt that outputs csv to the browser.
I change the content type to text/csv, so excel will recognize the file and open it I add the content-disposition header and changes the filename. That works great.
My only problem is that the file is outputted in utf-8 which it seems Excel hates. So all my special danish characters are destroyed when opened in excel.
What do I do about that? Anyone with experience with this? I don't want to teach my users to import the data into excel, which would be a solution.
Maybe I can change the output to ANSI somehow? That might help.
Setting the Response.Charset does not change the encoding of the XSLT output (which I'm sure you know, just stating this for future reference in this thread).
I can't however imagine that Excel has a problem with UTF-8, per se - I recall seeing the "File encoding" dropdown somewhere in one of the many screens of an import operation, which definitely allowed for UTF-8.
The real problem may lie in the .NET pipeline - there was (dunno if it's still there) a bug in using the .transform() method in older versions of MSXML where if you used .transform() and subsequently took the output as a String, it would always be UTF-16 encoded, regardless of setting e.g. <xsl:output encoding="utf-8" />. You had to jump some hoops to get the output in the correct encoding...
Are you wrapping the output as a macro on a simple template? Try to double check the *actual* encoding used for the template...
OK - did a little checking and it seems like Excel really has a problem... according to a couple of StackOverflow posts (this and this), Excel needs the Byte-Order-Mark (BOM) to be present if it's to handle UTF-8 CSVs correct... (the 2nd link has a simple C# workaround).
Thanks for the example. Looks good when using MVC. But it does not work for me when trying to convert it to not MVC. It might not be you (Chriztian) that can answer me what I'm doing wrong, but maybe someone else outthere can.
What I tried to do was to generate my string in xslt and then convert it to utf-8 with BOM using the example from Chriztians post. The return value is then <xsl:value-of'ed into my xslt.
publicstaticstring UTF8WithoutBOMify( string value ) {
byte[] data = Encoding.UTF8.GetBytes( value );
byte[] result = Encoding.UTF8.GetPreamble().Concat( data ).ToArray();
returnEncoding.Unicode.GetString( result, 0, result.Length );
}
The result is exactly the same as if I didn't do it. So it must be terribly wrong :)
Oh - the BOM must be the very first byte of the output.
If you're using value-of you're already inside the XSLT output tree, so that almost certainly can't be made to work...
Can you switch the Macro to use a UserControl instead, in which you perform the transformation yourself, using whatever tools Umbraco gives you, and then perform the necessary "byte-shiftery-magic" before sending the output to "Response" (or whatever it is one does in C# land) ?
Creating a csv for excel with xslt (Encoding Problem)
Hi Umbraco peeps,
I'm working on a xslt that outputs csv to the browser.
I change the content type to text/csv, so excel will recognize the file and open it
I add the content-disposition header and changes the filename. That works great.
My only problem is that the file is outputted in utf-8 which it seems Excel hates. So all my special danish characters are destroyed when opened in excel.
What do I do about that? Anyone with experience with this? I don't want to teach my users to import the data into excel, which would be a solution.
Maybe I can change the output to ANSI somehow? That might help.
/Rune
Hi Rune
Weird it won't accept UTF-8...but then again...it's excel! :)
Have you tried using ISO-8859-1 (Western european signs) to see if that makes a difference?
/Jan
I have now, but I don't think I'm succeeding in actually setting the encoding.
I have tried doing it with an xslt extension method and doing this:
But it's aparently not enough as nothing happens. notepad++ still says it's encoded in utf-8 and excel still hates me.
/Rune
Hi Rune,
Setting the Response.Charset does not change the encoding of the XSLT output (which I'm sure you know, just stating this for future reference in this thread).
I can't however imagine that Excel has a problem with UTF-8, per se - I recall seeing the "File encoding" dropdown somewhere in one of the many screens of an import operation, which definitely allowed for UTF-8.
The real problem may lie in the .NET pipeline - there was (dunno if it's still there) a bug in using the .transform() method in older versions of MSXML where if you used .transform() and subsequently took the output as a String, it would always be UTF-16 encoded, regardless of setting e.g. <xsl:output encoding="utf-8" />. You had to jump some hoops to get the output in the correct encoding...
Are you wrapping the output as a macro on a simple template? Try to double check the *actual* encoding used for the template...
/Chriztian
Hi Chriztian,
Yes I'm using a simple macro in a template. Not sure how to check the actual encoding of the template. It's as standard as it gets:
<%@ Master Language="C#" MasterPageFile="~/umbraco/masterpages/default.master" AutoEventWireup="true" %>
<asp:Content ContentPlaceHolderID="ContentPlaceHolderDefault" runat="server">
<umbraco:Macro Alias="CsvMembers" runat="server" />
asp:Content>
Notepadd++ though, says UTF-8 without BOM, as noted above.
Umbraco is version 4.7.1.1 by the way
/Rune
Hi Rune,
OK - did a little checking and it seems like Excel really has a problem... according to a couple of StackOverflow posts (this and this), Excel needs the Byte-Order-Mark (BOM) to be present if it's to handle UTF-8 CSVs correct... (the 2nd link has a simple C# workaround).
/Chriztian
Thanks for the example. Looks good when using MVC. But it does not work for me when trying to convert it to not MVC. It might not be you (Chriztian) that can answer me what I'm doing wrong, but maybe someone else outthere can.
What I tried to do was to generate my string in xslt and then convert it to utf-8 with BOM using the example from Chriztians post. The return value is then <xsl:value-of'ed into my xslt.
The result is exactly the same as if I didn't do it. So it must be terribly wrong :)
/Rune
Hi Rune,
Oh - the BOM must be the very first byte of the output.
If you're using value-of you're already inside the XSLT output tree, so that almost certainly can't be made to work...
Can you switch the Macro to use a UserControl instead, in which you perform the transformation yourself, using whatever tools Umbraco gives you, and then perform the necessary "byte-shiftery-magic" before sending the output to "Response" (or whatever it is one does in C# land) ?
/Chriztian
I'm facing the same issue, was wondering on how's your code like to add in the content-disposition header with xslt?
It will be great if anyone can share along?
Thanks
Hi Jeric,
I ended up doing this in the top of my <xsl:tempate>
<xsl:value-of select="lange:ClearResponse()"/>
<xsl:value-of select="umbraco.library:ChangeContentType('application/octet-stream')"/>
<xsl:value-of select="lange:AddHeader('content-disposition', 'attachment;filename=orders.csv')"/>
/Rune
Also, my bom xslt library method now looks like this. Call that along with the above to be able to open the csv directly in Excel
/Rune
is working on a reply...