
MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » IIB EmailInput node help needed

IIB EmailInput node help needed
marko.pitkanen
Posted: Mon Sep 19, 2016 12:50 am    Post subject: IIB EmailInput node help needed

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

How should I process emails with attachments?
On the sending side, the attachment file is produced with UTF-16LE encoding:
Code:
0000000 ff fe 65 00 6d 00 61 00 69 00 6c 00 09 00 52 00

IIB reads it into the Root tree with CCSID 1208:
Code:

  (0x01000000:Name):Properties       = ( ['GENERICPROPERTYPARSER' : 0x7fd91b6c12e0]
    (0x03000000:NameValue):MessageSet             = '' (CHARACTER)
    (0x03000000:NameValue):MessageType            = '' (CHARACTER)
    (0x03000000:NameValue):MessageFormat          = '' (CHARACTER)
    (0x03000000:NameValue):Encoding               = 546 (INTEGER)
    (0x03000000:NameValue):CodedCharSetId         = 1208 (INTEGER)
.
.
.
  (0x01000000:Name):Part = (
   (0x03000000:NameValue):Content-Type                     = 'text/plain; name="Juhanatesti11.txt"' (CHARACTER)
   (0x03000000:NameValue):Content-Description              = 'testi11.txt' (CHARACTER)
   (0x03000000:NameValue):Content-Disposition              = 'attachment; filename="testi11.txt"; size=7690;creation-date="Mon, 19 Sep 2016 08:20:31 GMT";modification-date="Mon, 19 Sep 2016 08:20:31 GMT"' (CHARACTER)
   (0x03000000:NameValue):Content-ID                       = '<EA01B9D3465BB94384C26A9FA845DA4E@eurprd04.prod.outlook.com>' (CHARACTER)
   (0x03000000:NameValue):Content-Transfer-Encoding        = 'base64' (CHARACTER)
   (0x03000000:NameValue):X-Microsoft-Exchange-Diagnostics = '...' (CHARACTER)
   (0x01000000:Name     ):Data                             = (
     (0x01000000:Name):BLOB = ( ['none' : 0x7fd91b705860]
      (0x03000000:NameValue):BLOB = X'c3bfc3be65006d00610069006c00....


At least the byte order mark (now c3bfc3be) seems to have been changed somehow?
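One plausible reading of the changed bytes, which nothing in this thread confirms: the attachment was decoded one byte per character as ISO-8859-1 (or a similar single-byte code page), and the resulting text was then re-encoded as UTF-8. FF and FE become the characters ÿ and þ, which UTF-8 writes as C3 BF and C3 BE, while bytes below 0x80 (including the 00s) pass through unchanged. The Java sketch below tests that hypothesis against the BLOB prefix in the trace above; `MangleDemo`, `hex` and `latin1ThenUtf8` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class MangleDemo {
    // Formats bytes as lowercase hex, e.g. {(byte) 0xff, (byte) 0xfe} -> "fffe".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hypothesis under test (an assumption, not confirmed IIB behaviour):
    // every byte is read as one ISO-8859-1 character, then the text is written as UTF-8.
    static byte[] latin1ThenUtf8(byte[] raw) {
        String misread = new String(raw, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First 12 bytes of the UTF-16LE attachment: BOM ff fe, then "email"
        byte[] original = {
            (byte) 0xff, (byte) 0xfe, 0x65, 0x00, 0x6d, 0x00,
            0x61, 0x00, 0x69, 0x00, 0x6c, 0x00
        };
        System.out.println(hex(latin1ThenUtf8(original))); // c3bfc3be65006d00610069006c00
    }
}
```

The output matches the start of the BLOB that IIB shows, which makes this hypothesis worth checking first.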

What do I need to do to be able to write the file to the file system with a FileOutput node?

Just setting
Code:
SET OutputRoot.BLOB.BLOB = rPart.*:Data.*:BLOB.*:BLOB;
doesn't work.
smdavies99
Posted: Mon Sep 19, 2016 1:18 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Have you tried casting the BLOB (UTF-16LE) to a BLOB in UTF-8 or a single-byte character set before writing it to the file?

You don't say what does not work though. What goes wrong? Perhaps the above is wrong. I really don't know.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
marko.pitkanen
Posted: Mon Sep 19, 2016 1:31 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks smdavies99!

What doesn't work is that the load on the EDW side fails, and they say that from their point of view the file is corrupted.

If I open the file, for example with Firefox, it looks like this: ÿþe�m�a�i�l�

Next I'll try casting it to UTF-8 (CCSID 1208) before writing it to the file.
timber
Posted: Mon Sep 19, 2016 1:56 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

Your problem statement is not making much sense:
Quote:
the sending side attachment file is produced with UTF-16LE encoding
...
IIB reads it into Root -tree with CCSID 1208
The message tree shows that IIB has parsed the attachment using the BLOB parser. So the BLOB in the message tree is exactly the same as the attachment, and the CCSID (UTF-16LE) is not relevant.
Quote:
SET OutputRoot.BLOB.BLOB = rPart.*:Data.*:BLOB.*:BLOB;
This will simply copy the bytes from input to output. If you need to change the encoding of the text then I'm not surprised that you're not getting the required result.

So I agree with smdavies99. You probably need to CAST InputRoot.BLOB.BLOB to CHARACTER and then CAST the resulting text into the required output encoding.
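The two-step CAST suggested here is the same idea as decoding and re-encoding in Java: first interpret the bytes as text using the encoding they are really in, then write that text out in the target encoding. A minimal sketch, assuming the bytes really are UTF-16LE (CCSID 1202) and the target is UTF-8 (CCSID 1208); `Transcode` and `transcode` are hypothetical names, and the Java charsets only stand in for the ESQL CCSIDs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Transcode {
    // Java analogue of CAST(CAST(blob AS CHARACTER CCSID from) AS BLOB CCSID to):
    // step 1 decodes bytes into text with the *source* encoding,
    // step 2 encodes that text with the *target* encoding.
    static byte[] transcode(byte[] blob, Charset from, Charset to) {
        String text = new String(blob, from);
        return text.getBytes(to);
    }

    public static void main(String[] args) {
        // "email" in UTF-16LE, preceded by the little-endian BOM ff fe
        byte[] utf16le = {
            (byte) 0xff, (byte) 0xfe, 0x65, 0x00, 0x6d, 0x00,
            0x61, 0x00, 0x69, 0x00, 0x6c, 0x00
        };
        byte[] utf8 = transcode(utf16le, StandardCharsets.UTF_16LE, StandardCharsets.UTF_8);
        // 3 bytes for the BOM (ef bb bf) + 5 ASCII letters
        System.out.println(utf8.length); // 8
    }
}
```

Java's UTF_16LE decoder keeps a leading BOM as the character U+FEFF, so it reappears as EF BB BF in the UTF-8 output; whether the EDW load wants that BOM is a separate decision.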
marko.pitkanen
Posted: Mon Sep 19, 2016 2:23 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks timber!

Sorry for my lack of understanding of the subject. What does CodedCharSetId define in this case, then, if it doesn't say anything about the payload?
marko.pitkanen
Posted: Mon Sep 19, 2016 4:15 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

The goal is to read emails and write attachments without modification to the file system.

What I observe is that somewhere between the sender's email client and IIB, the attachment is somehow re-encoded. For example, the BOM is not the same as it is on the sender's side. Another observation is that if I download the attachment from the mailbox, for example with the web GUI, the file content is exactly the same as it was on the sender's file system.
mqjeff
Posted: Mon Sep 19, 2016 4:22 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

The CCSID describes the meaning of the bytes if they are treated as characters.

Broker tends to transform messages into CCSID 1200 (1202?) during processing of the message flow. You can set this CCSID to another value before sending the data to an Output node. Then the character message data will be re-transformed into that CCSID.

When dealing with email and attachments, the best idea is to use the MIME parser. This will treat all of the parts of the email message as blobs, and not do any transformation on them. You can then take the relevant blobs and do what you need with them: cast them as characters and manipulate them, forward the blob to an output node, or simply leave them alone.
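Because the attachment part arrives with `Content-Transfer-Encoding: base64`, recovering its bytes is a pure byte-level step: base64 decoding involves no character set, so a parser that only undoes the transfer encoding should hand over the sender's bytes unchanged. The Java sketch below illustrates that property with `java.util.Base64`; it says nothing about how IIB's MIME parser is implemented internally, and `Base64Part` / `decodePartBody` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Part {
    // Decodes a base64 part body back to raw bytes; no charset is involved.
    // The MIME decoder ignores line breaks (and any other non-base64 characters),
    // which mail bodies typically contain.
    static byte[] decodePartBody(String base64Body) {
        return Base64.getMimeDecoder().decode(base64Body);
    }

    public static void main(String[] args) {
        // "//5lAG0A" is base64 for ff fe 65 00 6d 00 (UTF-16LE BOM + "em"),
        // split across two lines as it might be in a mail body
        byte[] decoded = decodePartBody("//5l\r\nAG0A");
        byte[] expected = {(byte) 0xff, (byte) 0xfe, 0x65, 0x00, 0x6d, 0x00};
        System.out.println(Arrays.equals(decoded, expected)); // true
    }
}
```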

What parser are you setting on the EmailInput node?
_________________
chmod -R ugo-wx /
marko.pitkanen
Posted: Mon Sep 19, 2016 4:36 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks mqjeff!

If I have understood it right, you can't set the parser for the EmailInput node. It uses MIME by default.
mqjeff
Posted: Mon Sep 19, 2016 4:46 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

marko.pitkanen wrote:
Thanks mqjeff!

If I have understood it right one can't set parser for EmailInput node. It uses MIME as default.


Okay. If it won't let you set it, it won't let you set it. I haven't worked with it in a while...
timber
Posted: Mon Sep 19, 2016 5:21 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

I see what you mean about the BOM. It has changed from FFFE to C3BFC3BE.
But C3BFC3BE is not a valid BOM: https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding

I think you need to look carefully at the settings on the EmailInput node and check whether you have set any properties relating to encoding.
marko.pitkanen
Posted: Mon Sep 19, 2016 11:50 pm

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

This seems to be related to the MIME type of the attachment.

If I trick the Outlook client into treating the attachment as a Word document rather than text/plain, the bytes are untouched and correct:

Code:
      (0x01000000:Name):Part = (
        (0x03000000:NameValue):Content-Type                     = 'application/octet-stream; name="testi11.txt.docs"' (CHARACTER)
        (0x03000000:NameValue):Content-Description              = 'testi11.txt.docs' (CHARACTER)
        (0x03000000:NameValue):Content-Disposition              = 'attachment; filename="testi11.txt.docs"; size=7690;creation-date="Tue, 20 Sep 2016 07:43:13 GMT";modification-date="Tue, 20 Sep 2016 07:43:13 GMT"' (CHARACTER)
        (0x03000000:NameValue):Content-ID                       = '<...>' (CHARACTER)
        (0x03000000:NameValue):Content-Transfer-Encoding        = 'base64' (CHARACTER)
        (0x03000000:NameValue):X-Microsoft-Exchange-Diagnostics = '...' (CHARACTER)
        (0x01000000:Name     ):Data                             = (
          (0x01000000:Name):BLOB = ( ['none' : 0x7fd944042a40]
            (0x03000000:NameValue):BLOB = X'fffe65006d00610069006c00090052006500730070006f006e00640065006e007400730020006e0061006d0065000900500072006500660065007200720065006400200063006f006e00740061006300740020006d00650074
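Given the two byte patterns seen in this thread (FF FE when the part is sent as application/octet-stream, C3 BF C3 BE when it is sent as text/plain), a flow could at least detect the damaged case and reject or route it instead of passing a corrupt file on to the EDW. A hedged Java sketch; `BomCheck`, `classify` and the label strings are hypothetical, and only a few common BOMs are checked.

```java
class BomCheck {
    // Returns a short label for the first bytes of an attachment.
    // The "mangled" case is the pattern observed for text/plain parts here.
    // UTF-32 BOMs are not distinguished (ff fe 00 00 is reported as UTF-16LE).
    static String classify(byte[] b) {
        if (startsWith(b, 0xff, 0xfe)) return "UTF-16LE BOM";
        if (startsWith(b, 0xfe, 0xff)) return "UTF-16BE BOM";
        if (startsWith(b, 0xef, 0xbb, 0xbf)) return "UTF-8 BOM";
        // ff fe as it looks after a single-byte -> UTF-8 conversion
        if (startsWith(b, 0xc3, 0xbf, 0xc3, 0xbe)) return "mangled UTF-16LE BOM";
        return "no recognised BOM";
    }

    // True if b begins with the given unsigned byte values.
    private static boolean startsWith(byte[] b, int... prefix) {
        if (b.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((b[i] & 0xff) != prefix[i]) return false;
        }
        return true;
    }
}
```

For the application/octet-stream trace just above, `classify` would return "UTF-16LE BOM"; for the text/plain trace at the start of the thread it would return "mangled UTF-16LE BOM".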
marko.pitkanen
Posted: Tue Sep 20, 2016 12:42 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

If I cast the bytes (when Content-Type is text/plain) to characters using the associated CCSID, and then cast the characters to a BLOB with the UTF-16LE CCSID, I get almost the correct format:
Code:
SET OutputRoot.BLOB.BLOB = CAST(CAST(rPart.*:Data.*:BLOB.*:BLOB AS CHAR CCSID InputRoot.Properties.CodedCharSetId) AS BLOB CCSID 1202);                       
SET OutputRoot.Properties.CodedCharSetId = 1202;

ff00fe00650000006d0000006100000069000000

Removing those extra '00' bytes with a WHILE loop is, of course, too expensive and time-consuming.
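The CAST result is consistent with the BLOB holding UTF-8 that was produced from single-byte characters: decoding C3 BF C3 BE 65 00 ... as UTF-8 yields ÿ, þ, e, NUL, ..., and UTF-16LE then writes each of those as two bytes, giving ff00fe00650000006d00.... If that assumption holds, encoding the decoded text as ISO-8859-1 instead of UTF-16LE would reverse the conversion directly and give the same bytes as removing the extra 00s, without a second pass. A Java sketch under that assumption; `UndoMangling` and its methods are hypothetical names. In ESQL the target would presumably be CCSID 819 (ISO-8859-1), which is untested here. If the intermediate code page was something else, such as windows-1252, bytes in the 0x80 to 0x9F range would not round-trip this way.

```java
import java.nio.charset.StandardCharsets;

class UndoMangling {
    // Mirrors CAST(CAST(blob AS CHAR CCSID 1208) AS BLOB CCSID 1202):
    // decode as UTF-8, then encode as UTF-16LE.
    static byte[] utf8ToUtf16le(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_16LE);
    }

    // ASSUMPTION: blob is the result of an ISO-8859-1 -> UTF-8 conversion.
    // Decoding it as UTF-8 then yields only U+0000..U+00FF, and ISO-8859-1
    // maps each of those back to the single original byte.
    // Any character above U+00FF would be replaced with '?' instead.
    static byte[] utf8ToLatin1(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1);
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Start of the BLOB that IIB produced for the text/plain attachment
        byte[] blob = {
            (byte) 0xc3, (byte) 0xbf, (byte) 0xc3, (byte) 0xbe,
            0x65, 0x00, 0x6d, 0x00, 0x61, 0x00, 0x69, 0x00
        };
        System.out.println(hex(utf8ToUtf16le(blob))); // ff00fe00650000006d0000006100000069000000
        System.out.println(hex(utf8ToLatin1(blob)));  // fffe65006d0061006900
    }
}
```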
marko.pitkanen
Posted: Tue Sep 20, 2016 2:01 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Removing the "extra" '00' bytes with ESQL
Code:
--            SET bDest = SUBSTRING(bTemp FROM iI FOR 1);
--            SET iI = iI + 2;
--            WHILE iI < iL DO
--               SET bDest = bDest || SUBSTRING(bTemp FROM iI FOR 1);
--               SET iI = iI + 2;
--            END WHILE;   

was time-consuming.

But with a simple Java method it takes only a few hundredths of a second.

Code:
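    // Public class in the default package, so that it matches
    // EXTERNAL NAME "removeExtraBytes.Remove" in the ESQL below.
    public class removeExtraBytes
    {
        // Keeps the low byte of every UTF-16LE code unit (bytes 0, 2, 4, ...).
        // Only correct if every character is in the range U+0000..U+00FF.
        public static byte[] Remove(byte[] inBlob)
        {
            int len = inBlob.length;
            byte[] data = new byte[len / 2];
            for (int i = 0; i < len / 2; i++) {
                data[i] = inBlob[i * 2];
            }
            return data;
        }
    }
.
.
.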
CREATE FUNCTION  removeExtraBytes( IN iBLOB BLOB )
 RETURNS BLOB
 LANGUAGE JAVA
 EXTERNAL NAME "removeExtraBytes.Remove";
.
.
.
SET bDest = removeExtraBytes(bTemp);
SET OutputRoot.BLOB.BLOB = bDest;



So I have a sufficiently good workaround.
timber
Posted: Tue Sep 20, 2016 2:17 am

Grand Master

Joined: 25 Aug 2015
Posts: 1280

You should not try to decode InputRoot.BLOB.BLOB using InputRoot.Properties.CodedCharSetId. As you noted at the start of this thread:
- InputRoot.Properties.CodedCharSetId is 1208 ( UTF-8 ).
- InputRoot.BLOB.BLOB contains text that is encoded in UTF-16.

The solution is not a loop that removes the zero bytes. That's dangerous as well as inefficient. What happens if the input contains a character that requires the top byte to be non-zero? You are effectively assuming that you will only ever receive characters that are in the first 256 code points of UTF-16.
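The concern can be checked directly. The sketch below applies the same keep-every-other-byte technique as the Java `Remove` method in this thread to UTF-16LE text containing one character that fits in a single byte (U+00E4, ä) and one that does not (U+20AC, €). The first survives; the euro sign silently becomes 0xAC, which ISO-8859-1 reads as ¬. `StripHighBytes` and `keepLowBytes` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class StripHighBytes {
    // Same technique as Remove(): keep bytes 0, 2, 4, ... of a UTF-16LE array,
    // i.e. the low byte of each 16-bit code unit, and discard the high byte.
    static byte[] keepLowBytes(byte[] utf16le) {
        byte[] out = new byte[utf16le.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = utf16le[i * 2];
        }
        return out;
    }

    public static void main(String[] args) {
        // U+00E4 is e4 00 in UTF-16LE; U+20AC is ac 20
        byte[] utf16le = "\u00e4\u20ac".getBytes(StandardCharsets.UTF_16LE);
        byte[] stripped = keepLowBytes(utf16le);
        // Prints "e4 ac": the 0x20 high byte of U+20AC is lost without any error
        System.out.printf("%02x %02x%n", stripped[0] & 0xff, stripped[1] & 0xff);
    }
}
```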

The correct solution is either:
a) Find out why InputRoot.Properties.CodedCharSetId is being set incorrectly. Fix that, and then continue with your current strategy.
or
b) Decide that there's no way to make the sender play nicely, and use a hard-coded CCSID in your ESQL instead of using InputRoot.Properties.CodedCharSetId (and document the assumption).

b) may require you to lop off the corrupted BOM in order to get a valid character stream for the first CAST. Obviously, a) is much the best solution. If you can crack that then you may even find that the corruption of the BOM stops happening.
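Option b) might look like this in Java terms: ignore the reported CodedCharSetId, decode with a fixed, documented charset, and drop a leading BOM. This is only a sketch. It assumes the bytes reach the flow intact (as in the application/octet-stream test earlier in the thread) and that the sender always produces UTF-16LE; `FixedCharset`, `SENDER_CHARSET` and `decodeAttachment` are hypothetical names. For the text/plain case the four-byte C3 BF C3 BE prefix would have to be removed first, and if every byte from 0x80 upward was expanded the same way, removing the BOM alone may not be enough.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FixedCharset {
    // Documented assumption: the sender always writes UTF-16LE, optionally with a BOM.
    // Same spirit as hard-coding CCSID 1202 instead of trusting
    // InputRoot.Properties.CodedCharSetId.
    static final Charset SENDER_CHARSET = StandardCharsets.UTF_16LE;

    static String decodeAttachment(byte[] raw) {
        String text = new String(raw, SENDER_CHARSET);
        // UTF_16LE does not strip a BOM; it decodes it as U+FEFF, so remove it here.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xff, (byte) 0xfe, 0x65, 0x00, 0x6d, 0x00};
        System.out.println(decodeAttachment(raw)); // em
    }
}
```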
marko.pitkanen
Posted: Tue Sep 20, 2016 2:34 am

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks timber!

I have raised a PMR/ESR for this phenomenon.

To me it seems that if Content-Type is text/plain, IIB assumes it can do a UTF-8 conversion of the attachment, and the bytes in the BLOB are no longer UTF-16LE encoded.

Let's see what I can get out of the PMR.

Page 1 of 2