Overriding default mappers
I've got some content in RTEs (both in grids and not) which has some span tags in it for formatting. These tags are causing the content to be fragmented when exported to XLIFF, which is causing problems for the translators, who would much prefer to have whole sentences to translate.
The formatting isn't necessary for any of the translations, so I'd like to ignore it on export. I'm trying to write a value mapper to do this, which I think should be used instead of the default Inline Tag Mapper. I've set the Editors like this:
// Property editor aliases this value mapper should handle.
public override string[] Editors
{
    get
    {
        return new string[] { "Umbraco.TinyMCEv3.Tag", "Umbraco.TinyMCEv3.Image" };
    }
}
But my GetSourceValue method never gets called when exporting. How can I set my value mapper to be used instead of the default one?
Hi Steve,
I've never done it, but the value mappers are collections, so I believe you can exclude the mapper you don't want (all IValueMappers are loaded, so yours should already be there).
However, I don't think this is what you want to do (though I'm not sure how you would do it).
The splitting of the text is done by the XmlSerializer. The value mappers get the text out of Umbraco, and if you look at the values in a job you will see they are not split in the UI. The mapping step does split tag title/alt out into their own values, but only because most translation tools prefer these to be separate.
The XmlSerializer does the HTML splitting (that is, on HTML tags), and it does so based on the Split Html setting in the connector options. With that setting off, nothing is split (so you do end up with lots of HTML in the text).
What to split on isn't really configurable at the moment, so it's split or not :( . But we could look at this. Do you have an example of the text you want to split and what you expect it to look like in the XLIFF?
You could use a value mapper to remove the bits of HTML before they get to the XLIFF serializer, but you would have to code some way of putting them back into the translated text. I'm not sure how that might work, given the text will have been translated by then.
Hi Kevin,
Thanks for replying, that all makes sense.
This is some of the HTML I've got:
<h2>Run your shipments <span class="underline">smoothly</span> from port to port.</h2>
It's exported as:
<group id="u1219-1-g" name="h2">
  <unit id="u1219-1-1" name="#text">
    <segment>
      <source>Run your shipments </source>
    </segment>
  </unit>
  <unit id="u1219-1-2" name="span">
    <mda:metadata>
      <mda:metaGroup id="span_attributes">
        <mda:meta type="class">underline</mda:meta>
      </mda:metaGroup>
    </mda:metadata>
    <segment>
      <source>smoothly</source>
    </segment>
  </unit>
  <unit id="u1219-1-3" name="#text">
    <segment>
      <source> from port to port.</source>
    </segment>
  </unit>
</group>
I'd like all spans with class=underline to be ignored, so it would be exported as:
<group id="u1219-1-g" name="h2">
  <unit id="u1219-1-1" name="#text">
    <segment>
      <source>Run your shipments smoothly from port to port.</source>
    </segment>
  </unit>
</group>
I've managed to exclude the default mapper, although I think that replacing the HtmlDocumentMapper and removing all the spans from the whole block might be easier than trying to do it with a new InlineTagMapper. Not sure if that's the "right" way, but it seems like it should work?
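For illustration, removing the spans from the whole block might look roughly like the sketch below. It only shows the string handling: `SpanUnwrapper` and `Unwrap` are hypothetical names, not part of Translation Manager or Umbraco, and how the result would be wired into a value mapper is not shown. The regex assumes the spans are not nested and that class attributes use double quotes; anything more complex would need a real HTML parser.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not a Translation Manager API): unwraps <span> elements
// that carry a given CSS class, keeping their inner content.
public static class SpanUnwrapper
{
    // Matches one non-nested <span ... class="..."> ... </span> pair.
    private static readonly Regex SpanWithClass = new Regex(
        @"<span\b[^>]*\bclass\s*=\s*""(?<classes>[^""]*)""[^>]*>(?<inner>.*?)</span>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Replaces each span that has cssClass with its inner content.
    // Spans without that class are returned unchanged.
    public static string Unwrap(string html, string cssClass)
    {
        return SpanWithClass.Replace(html, match =>
        {
            string[] classes = match.Groups["classes"].Value
                .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

            return classes.Contains(cssClass)
                ? match.Groups["inner"].Value
                : match.Value;
        });
    }
}
```

Calling `SpanUnwrapper.Unwrap(html, "underline")` on the h2 above gives `<h2>Run your shipments smoothly from port to port.</h2>`. Nothing here puts the span back afterwards, so the underline would not be present in the returned translation.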
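Hi Steve,
TL;DR: I think we have made this possible for you in the latest release of Translation Manager, with no custom code.
In more detail: XLIFF does have the functionality to do this for well-known tag types. For example, with the underline tag: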
<h2>Run your shipments <u>smoothly</u> from port to port.</h2>
the xliff will be:
<unit id="u3-1" name="h2">
  <originalData>
    <data id="d1">&lt;u&gt;</data>
    <data id="d2">&lt;/u&gt;</data>
  </originalData>
  <segment>
    <source>Run your shipments <pc dataRefEnd="d2" dataRefStart="d1" id="1" subType="xlf:u" type="fmt">smoothly</pc> from port to port.</source>
  </segment>
</unit>
In a translation tool (such as SDL Trados) the translator sees the whole sentence as a single segment, with the emphasis marked inline.
This lets the translator see the emphasis, and it maintains the formatting of the translation on the way back.
Spans are a little bit harder because they are generic and can be anything (from dividers to inline elements). Based on a lot of feedback we actually changed from not splitting on spans to splitting on spans, for this very reason. However, it's still not ideal for everyone.
If we just removed the spans then you would lose the formatting on the returned translation. There would be no underline coming back in, and it's not possible to put it back: what happens if the underlined word is in a different place in the translation, or it splits into two words, for example?
The possible solution is for us to treat spans as inline again (rather than splitting on them), which would result in the following XLIFF:
<unit id="u3-1" name="h2">
  <originalData>
    <data id="d1">&lt;span class="underline"&gt;</data>
    <data id="d2">&lt;/span&gt;</data>
  </originalData>
  <segment>
    <source>Run your shipments <pc dataRefEnd="d2" dataRefStart="d1" id="1" type="fmt">smoothly</pc> from port to port.</source>
  </segment>
</unit>
Again, SDL Trados shows this as a single segment with the span marked inline.
This way the translators can see what is in the span, and the formatting is preserved.
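To make the "formatting is preserved" part concrete, here is a minimal sketch of how the original markup can be rebuilt from a unit like the one above: each `<pc>` element is swapped for the start and end markup its `dataRefStart` and `dataRefEnd` point to in `<originalData>`. This is not Translation Manager's own import code; `InlineCodeExpander` and `ToHtml` are hypothetical names, and the sketch only handles text and `<pc>` elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper (not a Translation Manager API): expands <pc> inline
// codes in an XLIFF 2.0 unit back into the markup held in <originalData>.
public static class InlineCodeExpander
{
    public static string ToHtml(XElement unit)
    {
        // Map each <data id="..."> to the original markup it stores.
        // Matching on LocalName means the sketch works with or without the XLIFF namespace.
        Dictionary<string, string> data = unit.Descendants()
            .Where(e => e.Name.LocalName == "data")
            .ToDictionary(e => (string)e.Attribute("id"), e => e.Value);

        XElement source = unit.Descendants().First(e => e.Name.LocalName == "source");

        var html = new StringBuilder();
        Append(source, data, html);
        return html.ToString();
    }

    // Walks the mixed content: text is copied as-is (no HTML re-encoding),
    // and each <pc> is wrapped in its referenced start/end markup.
    private static void Append(XElement parent, Dictionary<string, string> data, StringBuilder html)
    {
        foreach (XNode node in parent.Nodes())
        {
            if (node is XText text)
            {
                html.Append(text.Value);
            }
            else if (node is XElement pc && pc.Name.LocalName == "pc")
            {
                html.Append(data[(string)pc.Attribute("dataRefStart")]);
                Append(pc, data, html);
                html.Append(data[(string)pc.Attribute("dataRefEnd")]);
            }
        }
    }
}
```

Parsing the unit above with `XElement.Parse` and passing it to `ToHtml` gives `Run your shipments <span class="underline">smoothly</span> from port to port.` Because the span travels with the word inside the `<pc>` element, it stays attached to whatever the translator puts there, wherever that ends up in the translated sentence.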
The only real downside is that if you turn this on, it applies to all spans in your HTML, and there might be places where you actually want to split on the spans and that will no longer happen (it will very much depend on the implementation/site code).
So what we've done for the latest Translation Manager release is add an option for the XLIFF connector to add additional 'inline' codes that you want Translation Manager to treat as inline on your site.
In this example you would add span to the XLIFF provider section of the translations.config file (comma-delimited, if you want to add more).
Hi Kevin,
Thanks for the new version. I'm just checking with our translators that it will work with their software, but I think it should. Splitting on all spans shouldn't be a problem for our content.