MQSeries.net :: View topic - IIB EmailInput node help needed


IIB EmailInput node help needed


marko.pitkanen

Posted: Mon Sep 19, 2016 12:50 am Post subject: IIB EmailInput node help needed

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

How should I process emails with attachments?
If at the sending side attachment file is produced with UTF-16LE encoding

Code:

0000000 ff fe 65 00 6d 00 61 00 69 00 6c 00 09 00 52 00

IIB reads it into Root -tree with CCSID 1208

Code:

(0x01000000:Name):Properties = ( ['GENERICPROPERTYPARSER' : 0x7fd91b6c12e0]
(0x03000000:NameValue):MessageSet = '' (CHARACTER)
(0x03000000:NameValue):MessageType = '' (CHARACTER)
(0x03000000:NameValue):MessageFormat = '' (CHARACTER)
(0x03000000:NameValue):Encoding = 546 (INTEGER)
(0x03000000:NameValue):CodedCharSetId = 1208 (INTEGER)
.
.
.
(0x01000000:Name):Part = (
(0x03000000:NameValue):Content-Type = 'text/plain; name="Juhanatesti11.txt"' (CHARACTER)
(0x03000000:NameValue):Content-Description = 'testi11.txt' (CHARACTER)
(0x03000000:NameValue):Content-Disposition = 'attachment; filename="testi11.txt"; size=7690;creation-date="Mon, 19 Sep 2016 08:20:31 GMT";modification-date="Mon, 19 Sep 2016 08:20:31 GMT"' (CHARACTER)
(0x03000000:NameValue):Content-ID = '<EA01B9D3465BB94384C26A9FA845DA4E@eurprd04.prod.outlook.com>' (CHARACTER)
(0x03000000:NameValue):Content-Transfer-Encoding = 'base64' (CHARACTER)
(0x03000000:NameValue):X-Microsoft-Exchange-Diagnostics = '...' (CHARACTER)
(0x01000000:Name ):Data = (
(0x01000000:Name):BLOB = ( ['none' : 0x7fd91b705860]
(0x03000000:NameValue):BLOB = X'c3bfc3be65006d00610069006c00....

At least the byte order mark (c3bfc3be) seems to have been changed somehow.

What do I need to do to be able to write the file to the file system with a FileOutput node?

Just setting

Code:

SET OutputRoot.BLOB.BLOB = rPart.*:Data.*:BLOB.*:BLOB;

does not work.

smdavies99

Posted: Mon Sep 19, 2016 1:18 am Post subject:

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Have you tried casting the BLOB (UTF16LE) to a BLOB (UTF-8) single byte character set before writing it to the file?

You don't say what does not work though. What goes wrong? Perhaps the above is wrong. I really don't know.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.

marko.pitkanen

Posted: Mon Sep 19, 2016 1:31 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks smdavies99!

What doesn't work is that the load on the EDW side fails, and they say that from their point of view the file is corrupted.

If I open the file with, for example, Firefox, it looks like this: Ã¿Ã¾e�m�a�i�l�

Next I'll try casting it to UTF-8 (CCSID 1208) before writing it to the file.

timber

Posted: Mon Sep 19, 2016 1:56 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Your problem statement is not making much sense:

Quote:

the sending side attachment file is produced with UTF-16LE encoding
...
IIB reads it into Root -tree with CCSID 1208

The message tree shows that IIB has parsed the attachment using the BLOB parser. So the BLOB in the message tree is exactly the same as the attachment, and CCSID (UTF16-LE) is not relevant.

Quote:

SET OutputRoot.BLOB.BLOB = rPart.*:Data.*:BLOB.*:BLOB;

This will simply copy the bytes from input to output. If you need to change the encoding of the text then I'm not surprised that you're not getting the required result.

So I agree with smdavies99. You probably need to CAST InputRoot.BLOB.BLOB to CHARACTER and then CAST the resulting text into the required output encoding.

marko.pitkanen

Posted: Mon Sep 19, 2016 2:23 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks timber!

I'm sorry for my lack of understanding of the subject. What does CodedCharSetId define in this case, then, if it doesn't say anything about the payload?

marko.pitkanen

Posted: Mon Sep 19, 2016 4:15 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

The goal is to read emails and write attachments without modification to the file system.

The observed phenomenon is that somewhere between the sender's email client and IIB the attachment is re-encoded. For example, the BOM is not the same as it is on the sender's side. Another observation is that if I download the attachment from the mailbox, for example with the web GUI, the file content is exactly the same as it was on the sender's file system.

mqjeff

Posted: Mon Sep 19, 2016 4:22 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

The CCSID describes the meaning of the bytes if they are treated as characters.

Broker tends to transform messages into CCSID 1200 (1202?) during processing of the message flow. You can set this CCSID to another value before sending the data to an Output node. Then the character message data will be re-transformed into that CCSID.
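To make the point about CCSIDs concrete, here is a minimal Java sketch (not IIB code). It decodes the same four bytes under two different charsets. The class and method names are made up for illustration; the byte values are taken from the start of the attachment dump earlier in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CcsidMeaningSketch {
    // Decode the given bytes as text in the given charset.
    // The bytes never change; only their interpretation does.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // The bytes 65 00 6d 00: "em" when read as UTF-16LE (CCSID 1202).
    static byte[] sampleBytes() {
        return new byte[] {0x65, 0x00, 0x6d, 0x00};
    }
}
```

Read as UTF-16LE, the sample decodes to the two characters "em". Read as UTF-8 (CCSID 1208), the same bytes decode to four characters: 'e', NUL, 'm', NUL.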

When dealing with Email and attachments, the best idea is to use the MIME parser. This will treat all of the parts of the email message as blobs, and not do any transformation on them. You can then take the relevant blobs, and do what you need to them, including cast them as characters and manipulating them, forward the blob to an output node, or simply leave them alone.

What parser are you setting on the EmailInput node?
_________________
chmod -R ugo-wx /

marko.pitkanen

Posted: Mon Sep 19, 2016 4:36 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks mqjeff!

If I have understood correctly, one can't set the parser for the EmailInput node. It uses MIME by default.

mqjeff

Posted: Mon Sep 19, 2016 4:46 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

marko.pitkanen wrote:

Thanks mqjeff!

If I have understood correctly, one can't set the parser for the EmailInput node. It uses MIME by default.

Okay. If it won't let you set it, it won't let you set it. I haven't worked with it in a while...
_________________
chmod -R ugo-wx /

timber

Posted: Mon Sep 19, 2016 5:21 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

I see what you mean about the BOM. It has changed from FFFE to C3BFC3BE.
But C3BFC3BE is not a valid BOM : https://en.wikipedia.org/wiki/Byte_order_mark#Representations_of_byte_order_marks_by_encoding
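One pattern that would produce exactly this change is reading the bytes as ISO-8859-1 text and re-encoding that text as UTF-8. The Java sketch below only shows that the byte values are consistent with that pattern. It is an assumption about what happened, not a statement about which component did it, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class BomCorruptionSketch {
    // Misread raw bytes as ISO-8859-1 characters, then encode those characters as UTF-8.
    // Bytes below 0x80 pass through unchanged; bytes 0x80..0xFF become two bytes each.
    static byte[] latin1ToUtf8(byte[] original) {
        String misread = new String(original, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Lower-case hex dump without separators, like the X'...' values in the trace.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Feeding in the first six bytes of the original file (ff fe 65 00 6d 00) yields c3bfc3be65006d00, which matches the start of the BLOB in the first trace. The BOM bytes are doubled in size while the ASCII-range bytes are untouched.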

I think you need to look carefully at the settings on the EmailInput node and check whether you have set any properties relating to encoding.

marko.pitkanen

Posted: Mon Sep 19, 2016 11:50 pm Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Hi,

This seems to be related to the MIME type of the attachment.

If I trick the Outlook client into treating the attachment as a Word document rather than text/plain, the bytes are untouched and correct:

Code:

(0x01000000:Name):Part = (
(0x03000000:NameValue):Content-Type = 'application/octet-stream; name="testi11.txt.docs"' (CHARACTER)
(0x03000000:NameValue):Content-Description = 'testi11.txt.docs' (CHARACTER)
(0x03000000:NameValue):Content-Disposition = 'attachment; filename="testi11.txt.docs"; size=7690;creation-date="Tue, 20 Sep 2016 07:43:13 GMT";modification-date="Tue, 20 Sep 2016 07:43:13 GMT"' (CHARACTER)
(0x03000000:NameValue):Content-ID = '<...>' (CHARACTER)
(0x03000000:NameValue):Content-Transfer-Encoding = 'base64' (CHARACTER)
(0x03000000:NameValue):X-Microsoft-Exchange-Diagnostics = '...' (CHARACTER)
(0x01000000:Name ):Data = (
(0x01000000:Name):BLOB = ( ['none' : 0x7fd944042a40]
(0x03000000:NameValue):BLOB = X'fffe65006d00610069006c00090052006500730070006f006e00640065006e007400730020006e0061006d0065000900500072006500660065007200720065006400200063006f006e00740061006300740020006d00650074

marko.pitkanen

Posted: Tue Sep 20, 2016 12:42 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

If I cast the bytes (when Content-Type is text/plain) to characters using the associated CCSID, and then cast the characters to a BLOB with the UTF-16LE CCSID, I get almost the correct format:

Code:

SET OutputRoot.BLOB.BLOB = CAST(CAST(rPart.*:Data.*:BLOB.*:BLOB AS CHAR CCSID InputRoot.Properties.CodedCharSetId) AS BLOB CCSID 1202);
SET OutputRoot.Properties.CodedCharSetId = 1202;

ff00fe00650000006d0000006100000069000000

Removing those extra '00' bytes with a WHILE loop is of course too expensive and time consuming.
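The doubled zero bytes can be reproduced outside IIB. The Java sketch below mirrors the two CASTs (decode as UTF-8, encode as UTF-16LE). For comparison it also shows a decode-as-UTF-8, encode-as-ISO-8859-1 step. That second step would undo the change only if the bytes were altered by an ISO-8859-1 to UTF-8 conversion, which is an assumption. The class and method names are hypothetical, and the ESQL equivalent (a CAST to BLOB with CCSID 819) has not been tested here.

```java
import java.nio.charset.StandardCharsets;

final class DoubleCastSketch {
    // Mirrors the ESQL above: bytes -> CHARACTER via UTF-8 (CCSID 1208),
    // then CHARACTER -> bytes via UTF-16LE (CCSID 1202).
    static byte[] utf8ToUtf16le(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_16LE);
    }

    // Candidate inverse of an ISO-8859-1 -> UTF-8 conversion:
    // each decoded character U+0000..U+00FF becomes exactly one byte again.
    static byte[] utf8ToLatin1(byte[] blob) {
        return new String(blob, StandardCharsets.UTF_8).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Lower-case hex dump without separators.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

With the received bytes c3 bf c3 be 65 00 6d 00, the first method gives ff00fe00650000006d000000, the same pattern as the output above. The second gives fffe65006d00, the bytes the sender wrote.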

marko.pitkanen

Posted: Tue Sep 20, 2016 2:01 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Removing the "extra" '00' bytes with ESQL

Code:

--          SET bDest = SUBSTRING(bTemp FROM iI FOR 1);
--          SET iI = iI + 2;
--          WHILE iI < iL DO
--             SET bDest = bDest || SUBSTRING(bTemp FROM iI FOR 1);
--             SET iI = iI + 2;
--          END WHILE;

was time consuming.

But with a simple Java method it takes only a few hundredths of a second.

Code:

public static byte[] Remove(byte[] inBlob)
{
    try
    {
        // Keep only the bytes at even offsets, dropping every second byte.
        int len = inBlob.length;
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len / 2; i++) {
            data[i] = inBlob[i * 2];
        }
        return data;
.
.
.
CREATE FUNCTION removeExtraBytes( IN iBLOB BLOB )
RETURNS BLOB
LANGUAGE JAVA
EXTERNAL NAME "removeExtraBytes.Remove";
.
.
.
SET bDest = removeExtraBytes(bTemp);
SET OutputRoot.BLOB.BLOB = bDest;

So I have a sufficiently good workaround.

timber

Posted: Tue Sep 20, 2016 2:17 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

You should not try to decode InputRoot.BLOB.BLOB using InputRoot.Properties.CodedCharSetId. As you noted at the start of this thread:
- InputRoot.Properties.CodedCharSetId is 1208 ( UTF-8 ).
- InputRoot.BLOB.BLOB contains text that is encoded in UTF-16.

The solution is not a loop that removes the zero bytes. That's dangerous as well as inefficient. What happens if the input contains a character that requires the top byte to be non-zero? You are effectively assuming that you will only ever receive characters that are in the first 256 code points of UTF-16.
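The risk can be shown with a small Java sketch. It assumes the input to the byte-dropping step is genuine UTF-16LE text, which may not be exactly the situation in this thread. The names are hypothetical; keepEvenBytes follows the same logic as the Remove method posted above.

```java
import java.nio.charset.StandardCharsets;

final class DropHighByteSketch {
    // Same idea as the posted Remove method: keep bytes at even offsets only.
    static byte[] keepEvenBytes(byte[] in) {
        byte[] out = new byte[in.length / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = in[i * 2];
        }
        return out;
    }

    // Encode text as UTF-16LE, drop every second byte, and check whether
    // the remaining bytes still identify the original characters.
    static boolean survives(String text) {
        byte[] utf16le = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] stripped = keepEvenBytes(utf16le);
        // One byte per character can only represent U+0000..U+00FF.
        String back = new String(stripped, StandardCharsets.ISO_8859_1);
        return back.equals(text);
    }
}
```

Text such as "email" survives because every high byte is zero. A euro sign (U+20AC, bytes ac 20) does not: the 0x20 high byte is discarded and the character comes back as U+00AC.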

The correct solution is either:
a) Find out why InputRoot.Properties.CodedCharSetId is being set incorrectly. Fix that, and then continue with your current strategy.
or
b) Decide that there's no way to make the sender play nicely, and use a hard-coded CCSID in your ESQL instead of using InputRoot.Properties.CodedCharSetId. ( and document the assumption )

b) may require you to lop off the corrupted BOM in order to get a valid character stream for the first CAST. Obviously, a) is much the best solution. If you can crack that then you may even find that the corruption of the BOM stops happening.

marko.pitkanen

Posted: Tue Sep 20, 2016 2:34 am Post subject:

Chevalier

Joined: 23 Jul 2008
Posts: 440
Location: Jamsa, Finland

Thanks timber!

I have raised a PMR/ESR for this phenomenon.

To me it seems that if Content-Type is text/plain, IIB assumes it can do a UTF-8 conversion on the attachment, so the bytes in the BLOB are no longer UTF-16LE encoded.

Let's see what I can get out of the PMR.
