Email. Parsing cyrillic in attachment.
Sosed
Posted: Mon Dec 03, 2012 5:56 am

Apprentice

Joined: 24 Aug 2012
Posts: 43

Hello!

I'm using MB 7.0.0.2.
I'm trying to process emails with MB, but I can't get a correctly parsed attachment that contains Cyrillic characters.

I have created a simple message flow for testing:
EMailInput node -> FileTrace node -> EMailOutput node
EMailInput node has settings:
Basic.Email server = ...
Polling.Polling Interval = 5
Security.Security Identity = ...
Retry.Retry mechanism = Failure
Transactions.Transaction mode = Yes
EMailOutput node has settings:
Basic.SMTP Server and Port = ...
Email.To Addresses = ...
Email.From Address = ...
Email.Subject of email = тест
Body Content Type = text/plain
Security.Security Identity = ...

When I send an email (plain text, Latin and Cyrillic characters, UTF-8) with an attachment (an XML file with Latin and Cyrillic characters in both its name and its content, UTF-8), the message flow sends it back to my address.
When I then open the email in MS Outlook, the attachment body has strange characters instead of the Cyrillic ones (the Latin characters are unchanged). The Cyrillic characters in the attachment name, the email body, and the subject are all unchanged.

Why does this happen?
What should I do to fix it?


I have found this article about code pages.
So, if MB converted the bytes into UCS-2 and changed the Cyrillic characters in the attachment body, why didn't the Cyrillic in the email body change? Or in the attachment name?

The name of the file I sent is тест_utf8.xml.
The body of the file I sent is:
Code:
<tns:BuildReport xmlns:tns="http://www.test.ru/sov">
   <tns:MortgageNo>привет</tns:MortgageNo>   
   <tns:Date>2001-10-26T21:32:52</tns:Date>
</tns:BuildReport>


The name of the file I received is тест_utf8.xml.
The body of the file I received is:
Code:
<tns:BuildReport xmlns:tns="http://www.test.ru/sov">
   <tns:MortgageNo>привет</tns:MortgageNo>   
   <tns:Date>2001-10-26T21:32:52</tns:Date>
</tns:BuildReport>


Trace at message flow:
Code:
( ['GENERICROOT' : 0x2aaac4078090]
  (0x01000000:Name):Properties       = ( ['GENERICPROPERTYPARSER' : 0x2aaac43d59e0]
    (0x03000000:NameValue):MessageSet             = '' (CHARACTER)
    (0x03000000:NameValue):MessageType            = '' (CHARACTER)
    (0x03000000:NameValue):MessageFormat          = '' (CHARACTER)
    (0x03000000:NameValue):Encoding               = 546 (INTEGER)
    (0x03000000:NameValue):CodedCharSetId         = 1208 (INTEGER)
    (0x03000000:NameValue):Transactional          = TRUE (BOOLEAN)
    (0x03000000:NameValue):Persistence            = FALSE (BOOLEAN)
    (0x03000000:NameValue):CreationTime           = GMTTIMESTAMP '2012-12-03 11:06:12.139252' (GMTTIMESTAMP)
    (0x03000000:NameValue):ExpirationTime         = -1 (INTEGER)
    (0x03000000:NameValue):Priority               = 0 (INTEGER)
    (0x03000000:NameValue):ReplyIdentifier        = X'' (BLOB)
    (0x03000000:NameValue):ReplyProtocol          = 'UNKNOWN' (CHARACTER)
    (0x03000000:NameValue):Topic                  = NULL
    (0x03000000:NameValue):ContentType            = 'multipart/mixed; boundary=_002_53B441659592BE4A951950120482A560014751mb01ahml1ru_' (CHARACTER)
    (0x03000000:NameValue):IdentitySourceType     = '' (CHARACTER)
    (0x03000000:NameValue):IdentitySourceToken    = '' (CHARACTER)
    (0x03000000:NameValue):IdentitySourcePassword = '' (CHARACTER)
    (0x03000000:NameValue):IdentitySourceIssuedBy = '' (CHARACTER)
    (0x03000000:NameValue):IdentityMappedType     = '' (CHARACTER)
    (0x03000000:NameValue):IdentityMappedToken    = '' (CHARACTER)
    (0x03000000:NameValue):IdentityMappedPassword = '' (CHARACTER)
    (0x03000000:NameValue):IdentityMappedIssuedBy = '' (CHARACTER)
  )
  (0x01000000:Name):EmailInputHeader = ( ['EMAILIHD' : 0x2aaac402d020]
    (0x03000000:NameValue):To       = '...' (CHARACTER)
    (0x03000000:NameValue):From     = '...' (CHARACTER)
    (0x03000000:NameValue):Subject  = 'тест проверка тест ' (CHARACTER)
    (0x03000000:NameValue):Size     = 27290 (INTEGER)
    (0x03000000:NameValue):SentDate = TIMESTAMP '2012-12-03 14:06:42' (TIMESTAMP)
  )
  (0x01000000:Name):MIME             = ( ['MIME' : 0x2aaac4054620]
    (0x03000000:NameValue):Content-Type = 'multipart/mixed; boundary=_002_53B441659592BE4A951950120482A560014751mb01ahml1ru_' (CHARACTER)
    (0x01000000:Name     ):Parts        = (
      (0x01000000:Name):Part = (
        (0x03000000:NameValue):Content-Type = 'text/html; charset=utf-8' (CHARACTER)
        (0x01000000:Name     ):Data         = (
          (0x01000000:Name):BLOB = ( ['none' : 0x2aaac40d8bb0]
            (0x03000000:NameValue):BLOB = X'3c6874...' (BLOB)
          )
        )
      )
      (0x01000000:Name):Part = (
        (0x03000000:NameValue):Content-Type = 'text/xml; name="=?utf-8?B?0YLQtdGB0YJfdXRmOC54bWw=?="' (CHARACTER)
        (0x01000000:Name     ):Data         = (
          (0x01000000:Name):BLOB = ( ['none' : 0x2aaac40b7ac0]
            (0x03000000:NameValue):BLOB = X'c3af...' (BLOB)
          )
        )
      )
    )
  )
)

________________________________________________________________________________________________________________________________________________________________________


Guys, any suggestions?

lancelotlinc
Posted: Mon Dec 03, 2012 5:58 am

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

You need to set your CCSID properly. You also need to update your toolkit and runtime to modern versions.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER

Sosed
Posted: Mon Dec 03, 2012 6:12 am

Apprentice

Joined: 24 Aug 2012
Posts: 43

lancelotlinc wrote:
You also need to update your toolkit and runtime to modern versions.


Thanks for your advice, lancelotlinc!

lancelotlinc wrote:
You need to set your CCSID properly.


It already has the value 1208. That is the right CCSID value for UTF-8, or am I wrong?
Sosed wrote:
(0x03000000:NameValue):CodedCharSetId = 1208 (INTEGER)

lancelotlinc
Posted: Mon Dec 03, 2012 6:18 am

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

kimbert or mqjeff may know better than I, but if you want to process Cyrillic characters, you may want the 897 code page or 896.

mqjeff
Posted: Mon Dec 03, 2012 6:41 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you are putting in Cyrillic characters, the CCSID needs to be one that allows Cyrillic characters.

If you are putting in UTF-8 characters, the CCSID needs to be UTF-8, which I think is 1200 (not 1208)...

You will need to set this on the EmailInputHeader, and on each Mime part.

Note that your article about code-page converters only applies to cases where you have allowed the Broker to convert things for you, and not to cases where you have specifically done things that request or require conversion (ASBITSTREAM, CAST AS CHAR, etc)

kimbert
Posted: Mon Dec 03, 2012 7:25 am

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

The CCSID for UTF-8 is 1208.

This article about character encoding is worth reading, even though it says nothing at all about WMB : http://www.joelonsoftware.com/articles/Unicode.html

smdavies99
Posted: Mon Dec 03, 2012 11:50 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

kimbert wrote:
The CCSID for UTF-8 is 1208.

This article about character encoding is worth reading, even though it
says nothing at all about WMB : http://www.joelonsoftware.com/articles/Unicode.html


Do you get royalties each time you quote this excellent article?

_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

kimbert
Posted: Mon Dec 03, 2012 12:21 pm

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Nope - maybe I should ask Joel to consider it, though!

I quote it mainly to save my own time - encoding issues seem to make up a fair proportion of the questions on this and other forums, and I reckon one careful reading of that article should set anybody straight.

rekarm01
Posted: Tue Dec 04, 2012 2:12 am

Grand Master

Joined: 25 Jun 2008
Posts: 1415

Sosed wrote:
Trace at message flow:
Code:
      (0x01000000:Name):Part = (
        (0x03000000:NameValue):Content-Type = 'text/xml; name="=?utf-8?B?0YLQtdGB0YJfdXRmOC54bWw=?="' (CHARACTER)
        (0x01000000:Name     ):Data         = (
          (0x01000000:Name):BLOB = ( ['none' : 0x2aaac40b7ac0]
            (0x03000000:NameValue):BLOB = X'c3af...' (BLOB)
          )
        )

The "Content-Type" element needs to specify "charset=utf-8".
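A minimal Python sketch (not broker code) of how a generic MIME library sees that header. The header value is copied from the trace; the use of Python's standard `email` package, the placeholder body, and the `x.xml` name in the second header are assumptions for illustration only. It suggests that the attachment part carries an RFC 2047 encoded file name, which names its own charset, but no `charset` parameter for the body.

```python
from email import message_from_string
from email.header import decode_header, make_header

# Content-Type of the attachment part, copied from the trace.
# The body is a placeholder, not the real attachment.
raw_part = (
    'Content-Type: text/xml; name="=?utf-8?B?0YLQtdGB0YJfdXRmOC54bWw=?="\n'
    "\n"
    "<placeholder/>\n"
)
part = message_from_string(raw_part)

# No charset parameter is present, so the library reports None.
charset = part.get_content_charset()

# The file name is an RFC 2047 encoded-word that states its own charset.
encoded_name = part.get_param("name")
file_name = str(make_header(decode_header(encoded_name)))

# The same kind of header with an explicit charset (hypothetical file name).
fixed = message_from_string(
    'Content-Type: text/xml; charset=utf-8; name="x.xml"\n\n<placeholder/>\n'
)
fixed_charset = fixed.get_content_charset()

print(charset, file_name, fixed_charset)
```

That the file name states its charset while the body does not may be why the attachment name came back intact and the attachment body did not.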

lancelotlinc wrote:
kimbert or mqjeff may know better than I, but if you want to process Cyrillic characters, you may want the 897 code page or 896.

No, those are Japanese.

Sosed
Posted: Wed Dec 05, 2012 2:42 am

Apprentice

Joined: 24 Aug 2012
Posts: 43

First, I added a Compute node and tried to convert the attachment using different code pages (CCSID descriptions taken from IBM's "Coded character set identifiers" list). My file is UTF-8. I tried this:
Quote:
DECLARE clob1 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1252); --MS-WIN LATIN-1
DECLARE clob2 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1251); --MS-WIN CYRILLIC
DECLARE clob3 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1250); --MS-WIN LATIN-2
DECLARE clob4 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1237); --UTF-32
DECLARE clob5 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1236); --UTF-32 with IBM PUA
DECLARE clob6 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1235); --UTF-32 LE
DECLARE clob7 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1234); --UTF-32 LE with IBM PUA
DECLARE clob8 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1233); --UTF-32 BE
DECLARE clob9 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1232); --UTF-32 BE with IBM PUA
DECLARE clob10 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1211); --UTF-EBCDIC
DECLARE clob11 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1210); --UTF-EBCDIC with IBM PUA
DECLARE clob12 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1209); --UTF-8
DECLARE clob13 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1208); --UTF-8 with IBM PUA
DECLARE clob14 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1205); --UTF-16
DECLARE clob15 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1204); --UTF-16 with IBM PUA
DECLARE clob16 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1203); --UTF-16 LE
DECLARE clob17 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1202); --UTF-16 LE with IBM PUA
DECLARE clob18 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1201); --UTF-16 BE
DECLARE clob19 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1200); --UTF-16 BE with IBM PUA
DECLARE clob20 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1174); --Windows Cyrillic (Kazakhstan)
DECLARE clob21 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1168); --KOI8-U
DECLARE clob23 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1167); --KOI8-RU
DECLARE clob24 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1158); --UKRAINE EBCDIC
DECLARE clob25 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1154); --CYRILLIC EBCDIC
DECLARE clob26 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1025); --CYRILLIC EBCDIC
DECLARE clob27 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 921); --ISO 8859-13
DECLARE clob28 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 920); --ISO 8859-9 ASCII
DECLARE clob29 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 916); --ISO 8859-8 ASCII
DECLARE clob30 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 915); --ISO 8859-5 ASCII
DECLARE clob31 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 914); --ISO 8859-4 ASCII
DECLARE clob32 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 913); --ISO 8859-3 ASCII
DECLARE clob33 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 912); --ISO 8859-2 ASCII
DECLARE clob34 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 880); --CYRILLIC EBCDIC
DECLARE clob35 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 878); --KOI8-R CYRILLIC
DECLARE clob36 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 872); --CYRILLIC PC-DATA
DECLARE clob37 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 870); --LATIN-2 EBCDIC
DECLARE clob38 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 866); --CYRILLIC PC-DATA
DECLARE clob39 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 855); --CYRILLIC PC-DATA
DECLARE clob40 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 819); --ISO 8859-1 ASCII. Error creating converter
DECLARE clob41 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 808); --CYRILLIC PC-DATA
DECLARE clob42 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 437); --USA PC-DATA
DECLARE clob43 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 437); --USA PC-DATA


None of the CASTs could parse my file. This means that the problem is not in the header of the email I am sending.
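One possible reading of this result, sketched in Python rather than ESQL: if the attachment bytes were already altered before the Compute node saw them, no single code page can decode them back to the original text. The byte string below is the Cyrillic run copied from the received-file hex dump later in this post; the codec names are Python's, picked as rough counterparts of some of the CCSIDs tried, and that mapping is an assumption.

```python
# Bytes of the Cyrillic run as they appear in the received file's hex dump.
received = bytes.fromhex("C390C2BFC391C280C390C2B8C390C2B2C390C2B5C391C282")
expected = "привет"

# Python codecs standing in for some of the CCSIDs tried (assumed mapping).
codecs_to_try = [
    "utf-8", "utf-16-be", "utf-16-le", "cp1251", "cp1252",
    "koi8-r", "iso8859-5", "cp866", "cp855", "latin-1",
]

results = {}
for name in codecs_to_try:
    # errors="replace" keeps the loop going where a byte has no mapping.
    results[name] = received.decode(name, errors="replace")

# Collect every codec whose single-pass decoding yields the original word.
matches = [name for name, text in results.items() if text == expected]
print(matches)
```

The received run is 24 bytes long for a six-letter word, so a single decoding pass with any of these codecs cannot give back the six original characters.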
Then I saw kimbert's reply and read the article about Unicode. Thanks, kimbert.

After reading it, I checked the encoding of my files. My old file was "UTF-8", my new file was "ANSI as UTF-8" (according to Notepad++). I looked at the hex representation of my files.
Sent file:
Code:
00000000 EFBBBF3C 746E733A 4275696C 64526570
00000016 6F727420 786D6C6E 733A746E 733D2268
00000032 7474703A 2F2F7777 772E6168 6D6C2E72
00000048 752F736F 76223E0D 0A093C74 6E733A4D
00000064 6F727467 6167654E 6F3ED0BF D180D0B8
00000080 D0B2D0B5 D1823C2F 746E733A 4D6F7274
00000096 67616765 4E6F3E09 0D0A093C 746E733A
00000112 44617465 3E323030 312D3130 2D323654
00000128 32313A33 323A3532 3C2F746E 733A4461
00000144 74653E0D 0A3C2F74 6E733A42 75696C64
00000160 5265706F 72743E

Received file:
Code:
00000000 C3AFC2BB C2BF3C74 6E733A42 75696C64
00000016 5265706F 72742078 6D6C6E73 3A746E73
00000032 3D226874 74703A2F 2F777777 2E61686D
00000048 6C2E7275 2F736F76 223E0D0A 093C746E
00000064 733A4D6F 72746761 67654E6F 3EC390C2
00000080 BFC391C2 80C390C2 B8C390C2 B2C390C2
00000096 B5C391C2 823C2F74 6E733A4D 6F727467
00000112 6167654E 6F3E090D 0A093C74 6E733A44
00000128 6174653E 32303031 2D31302D 32365432
00000144 313A3332 3A35323C 2F746E73 3A446174
00000160 653E0D0A 3C2F746E 733A4275 696C6452
00000176 65706F72 743E


The BOM and the Cyrillic characters have changed: each byte of those characters in the sent file became two bytes in the received file. The UTF-8 BOM (EFBBBF) became C3AFC2BB C2BF, and I couldn't find out which encoding has such a BOM.
On this web site I found just a short description:
Quote:
The UTF-8 representation of the BOM is the byte sequence 0xEF,0xBB,0xBF. A text editor or web browser interpreting the text as ISO-8859-1 or CP1252 will display the characters ï»¿ for this.

If, instead, you see this hex: C3AFC2BBC2BF, then you are also suffering from double encoding.

If you encounter this (presumably at the start of a file), it implies that your editor is adding this, but the reader of the file (eg, mysqldump) does not know what to do with it. Consider using a different editor.

This article says that a file is reported as "ANSI as UTF-8" if it was saved in UTF-8 but has no BOM.
So I decided to use UTF-16 (which uses 2 bytes per character), and it works fine.

But I still can't understand why I can't parse the attachment (even when I use the right CCSID, 1208, for UTF-8). Is it because the byte order mark has changed into C3AFC2BB C2BF and WMB has interpreted the file as having no BOM? But why has it changed? Guys, could you please give me your suggestions?
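The two hex dumps are consistent with the double encoding that the quoted description mentions: each byte of the UTF-8 file appears to have been read as one ISO-8859-1 character and then written out again as UTF-8. The Python sketch below only reproduces that byte pattern and reverses it. It does not show where in the flow the extra conversion happens, and the variable names are illustrative.

```python
# UTF-8 BOM followed by the Cyrillic word from the test file.
sent = bytes.fromhex("EFBBBF") + "привет".encode("utf-8")

# Misread each UTF-8 byte as one ISO-8859-1 character, then encode as UTF-8.
double_encoded = sent.decode("iso-8859-1").encode("utf-8")
print(double_encoded.hex().upper())

# Reversing the two steps recovers the original bytes.
repaired = double_encoded.decode("utf-8").encode("iso-8859-1")
print(repaired == sent)

# "utf-8-sig" strips the BOM while decoding.
print(repaired.decode("utf-8-sig"))
```

The first printed line starts with C3AFC2BBC2BF, the same six bytes that open the received file, and the rest matches the Cyrillic run in the received dump.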

mqjeff
Posted: Wed Dec 05, 2012 4:47 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Kimbert is going to tell me I'm wrong, but try 1209 instead of 1208.
http://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html

Sosed
Posted: Wed Dec 05, 2012 5:09 am

Apprentice

Joined: 24 Aug 2012
Posts: 43

mqjeff wrote:
Kimbert is going to tell me I'm wrong, but try 1209 instead of 1208.
http://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html


Yes, I have already tried this.

Sosed wrote:
DECLARE clob12 CHARACTER CAST(clobRef.Data.BLOB.BLOB AS CHAR CCSID 1209); --UTF-8


It doesn't work.

mqjeff
Posted: Wed Dec 05, 2012 5:25 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Well, then you don't need Kimbert to tell me I'm wrong. You've done so very adequately yourself.

fjb_saper
Posted: Wed Dec 05, 2012 11:07 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

How do you create the BLOBs you're passing on to the email node?
Are you making sure first that the BLOBs in question are in CCSID 1208?
What is the incoming CCSID of the data?
_________________
MQ & Broker admin

Sosed
Posted: Wed Dec 05, 2012 9:51 pm

Apprentice

Joined: 24 Aug 2012
Posts: 43

fjb_saper wrote:
How do you create the BLOBs you're passing on to the email node?
Are you making sure first that the BLOBs in question are in CCSID 1208?
What is the incoming CCSID of the data?


I created a file (UTF-8) in Notepad++. I created an email message using UTF-8 encoding and attached this file. Then I sent the email.

WMB caught my email (I can see it in debug mode). You can see an example of the parsed message in my first post in this discussion (FileTrace node right after the EMailInput node).
As you can see, its header has a CCSID, the email body has a "Content-Type" with a charset, and the email attachment has a "Content-Type" without a charset.
Then, in a Compute node, I tried to parse the attachment (BLOB) using CAST(... CCSID ...). I created 42 variables for 42 different CCSIDs, using this list of CCSIDs: http://www-01.ibm.com/software/globalization/ccsid/ccsid_registered.html.
None of the CASTs could parse the Cyrillic characters in my attachment (some couldn't even parse the Latin characters, and some raised exceptions such as "Error creating converter" or "unconvertable character").

It only works if I use UTF-16 (UCS-2 BE) when I create the file. Then I can parse the attachment using CCSID 1201.

But it doesn't work with UTF-8.

Page 1 of 2