Author |
Message
|
goffinf |
Posted: Tue Jan 31, 2012 6:45 am Post subject: Base64 Decode - Efficiency |
|
|
Chevalier
Joined: 05 Nov 2005 Posts: 401
|
version: 6.1.0.9
I have a flow which retrieves emails from a server. Some of these have attachments, and the attachments are Base64 encoded.
That's not a big problem EXCEPT when the attachments are large(ish), let's say 5MB.
I am using the code below to do the Base64 decode, but it can take a couple of seconds (admittedly on a laptop, but even so). Taking a user trace, the decode line below is easily the most compute-intensive part.
Is there a more efficient method that I could try?
(I have noted that the XMLNSC parser supports Base64 encoding/decoding but I couldn't find an example of Decoding. Not sure if that would help me, but ...)
Code: |
-- Replace the existing Part data with the Base64-decoded version
-- WOW, this can take a bunch of time on a 5MB attachment!
SET Part.Data.BLOB.BLOB = base64Decode(CAST(Part.Data.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId));
...
CREATE PROCEDURE base64Decode(IN msgChar CHARACTER) RETURNS BLOB
LANGUAGE JAVA
EXTERNAL NAME "com.ibm.broker.javacompute.Base64.decode";
|
|
kimbert |
Posted: Tue Jan 31, 2012 6:56 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I have noted that the XMLNSC parser supports Base64 encoding/decoding but I couldn't find an example of Decoding |
No example required, because it is all done by the parser. All you need to do is tell the parser that the element in question is of type xsd:base64Binary. You do that using an xsd. When you parse the XML document the XMLNSC parser recognises the element, automatically decodes the base64 string and puts a BLOB into the message tree. Of course, in order to exploit this feature, you must
a) be using data that is in the XML format
b) have an xsd that describes the XML
c) switch on validation in your message flow
d) tell the XMLNSC parser to build the message tree using XML Schema types ( which is not the default setting, unlike in the SOAP domain ).
Not sure whether your email message is an XML message, but at least you have the facts at your disposal now. You may be able to use CREATE..PARSE provided that at least the base64 part of the email is XML. |
|
mqjeff |
Posted: Tue Jan 31, 2012 6:57 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Separate the call to the Java code from the CAST as character, just for performance analysis testing.
Take note of the fact that this operation does have to touch every single byte of the blob value, and thus scales linearly with the size of that blob. |
|
goffinf |
Posted: Tue Jan 31, 2012 7:00 am Post subject: |
|
|
Chevalier
Joined: 05 Nov 2005 Posts: 401
|
kimbert wrote: |
Not sure whether your email message is an XML message, but at least you have the facts at your disposal now. You may be able to use CREATE..PARSE provided that at least the base64 part of the email is XML. |
Thanks. Unfortunately the attachments are mostly jpeg and such like, so no dice there I guess. |
|
kimbert |
Posted: Tue Jan 31, 2012 7:49 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Unfortunately the attachments are mostly jpeg and such like, so no dice there I guess. |
Well...of course. That's why the sender is encoding them as base64! What I meant was "You may be able to use CREATE..PARSE provided that at least the base64 part of the email is wrapped in XML". |
|
goffinf |
Posted: Tue Jan 31, 2012 8:54 am Post subject: |
|
|
Chevalier
Joined: 05 Nov 2005 Posts: 401
|
kimbert wrote: |
Quote: |
Unfortunately the attachments are mostly jpeg and such like, so no dice there I guess. |
Well...of course. That's why the sender is encoding them as base64! What I meant was "You may be able to use CREATE..PARSE provided that at least the base64 part of the email is wrapped in XML". |
Maybe I'm just being a bit thick ... the email is a multipart related MIME message, with each attachment in its own boundary, and each attachment is just a bunch of Base64-encoded binary representing a PDF, JPEG, etc.
No XML, AFAIK? |
|
goffinf |
Posted: Tue Jan 31, 2012 9:03 am Post subject: |
|
|
Chevalier
Joined: 05 Nov 2005 Posts: 401
|
mqjeff wrote: |
Take note of the fact that this operation does have to touch every single byte of the blob value, and thus scales linearly with the size of that blob. |
Good idea. For the following code :-
Code: |
SET attachmentAsChar = CAST(Part.Data.BLOB.BLOB AS CHAR CCSID InputProperties.CodedCharSetId);
SET attachmentB64Decoded = base64Decode(attachmentAsChar);
...
SET Part.Data.BLOB.BLOB = attachmentB64Decoded;
|
The CAST is about 30% of the total time
The DECODE about 40%
The rest is other stuff before and after.
mqjeff wrote: |
Take note of the fact that this operation does have to touch every single byte of the blob value, and thus scales linearly with the size of that blob. |
Sure, I see this when I send 3 attachments of different sizes:
att1 (500K): total time around 1 sec (CAST: 0.3, DECODE: 0.4)
att2 (2500K): total time around 4.5 sec (CAST: 1.8, DECODE: 2.3)
att3 (<1K): total time 0.003 sec |
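One way to read the timings above: the character conversion is a separate pass over the data before the decode even starts. The following is a minimal Java sketch of skipping that pass by decoding the raw bytes directly. It rests on several assumptions: a Java 8 or later runtime (`java.util.Base64` did not exist on 2012-era JVMs), attachment bytes in an ASCII-compatible code page, and a BLOB parameter arriving in Java as `byte[]`. The class name `AttachmentDecoder` is hypothetical; it is not the `com.ibm.broker.javacompute.Base64` class used in the original post, and nobody in this thread has measured this approach.

```java
import java.util.Base64;

// Hypothetical helper for illustration only.
// To be callable from ESQL the class would also need to be public
// and deployed where the broker can load it.
final class AttachmentDecoder {

    private AttachmentDecoder() {
        // static helper, no instances
    }

    /**
     * Decodes base64 text held as raw bytes, without first building
     * a character string. The MIME decoder skips CR/LF line breaks and
     * other bytes outside the base64 alphabet, which suits the 76-column
     * wrapped bodies found in email attachments.
     */
    public static byte[] decode(byte[] encoded) {
        return Base64.getMimeDecoder().decode(encoded);
    }
}
```

If something like this were wired up as an external procedure taking a BLOB instead of a CHARACTER, the `CAST ... AS CHAR CCSID ...` step would no longer be needed. The decode itself still has to touch every byte, so the cost would remain linear in the attachment size.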
|
mqjeff |
Posted: Tue Jan 31, 2012 9:11 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
goffinf wrote: |
kimbert wrote: |
Quote: |
Unfortunately the attachments are mostly jpeg and such like, so no dice there I guess. |
Well...of course. That's why the sender is encoding them as base64! What I meant was "You may be able to use CREATE..PARSE provided that at least the base64 part of the email is wrapped in XML". |
Maybe I'm just being a bit thick ... the email is a multipart related MIME message, with each attachment in its own boundary, and each attachment is just a bunch of Base64-encoded binary representing a PDF, JPEG, etc.
No XML, AFAIK? |
He's suggesting you do something like
Code: |
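-- Sketch only: wrap the base64 bytes in XML so that XMLNSC can decode them.
-- Assumes the attachment bytes are ASCII-compatible, hence CCSID 1208 (UTF-8) for the wrapper tags.
CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC') PARSE(CAST('<xml><body>' AS BLOB CCSID 1208) || Part.Data.BLOB.BLOB || CAST('</body></xml>' AS BLOB CCSID 1208));
|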
but of course aligned to a message set that declares the body element as a base64-encoded field (xsd:base64Binary, as kimbert described).
I can't say whether this is any faster than a raw base64 decode. Kimbert might know, but he might not have tested it either. |
|
mgk |
Posted: Tue Jan 31, 2012 9:38 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
This may not help you now, but in V7 ESQL has a native BASE64DECODE function built in...
Kind regards, MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
|
kimbert |
Posted: Tue Jan 31, 2012 11:29 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Maybe I'm just being a bit thick |
No - more likely I'm not explaining myself very well. However, mqjeff has successfully worked out what I was trying to say - we'll put that down to long and bitter experience!
Re: the performance of that option... you would have to try it. I'm not making any predictions at all about the relative performance of the Java and C implementations of base64 decode. What I do know is that it's not a good idea to make bold assertions about the relative performance of Java and C++ unless you have the evidence in your hand. |
|
xvigo |
Posted: Thu Nov 08, 2012 6:37 pm Post subject: |
|
|
Newbie
Joined: 08 Nov 2012 Posts: 4
|
kimbert wrote: |
Quote: |
I have noted that the XMLNSC parser supports Base64 encoding/decoding but I couldn't find an example of Decoding |
No example required, because it is all done by the parser. All you need to do is tell the parser that the element in question is of type xsd:base64Binary. You do that using an xsd. When you parse the XML document the XMLNSC parser recognises the element, automatically decodes the base64 string and puts a BLOB into the message tree. Of course, in order to exploit this feature, you must
a) be using data that is in the XML format
b) have an xsd that describes the XML
c) switch on validation in your message flow
d) tell the XMLNSC parser to build the message tree using XML Schema types ( which is not the default setting, unlike in the SOAP domain ).
Not sure whether your email message is an XML message, but at least you have the facts at your disposal now. You may be able to use CREATE..PARSE provided that at least the base64 part of the email is XML. |
Hi. Thanks to your post I was able to get the XML attribute from my SOAP request decoded; unfortunately the result is a hexadecimal string.
(I put a trace just after the SOAP Input node.)
Do you have any suggestion to get the ascii representation of my request message?
(I tested the hex code from the trace in a hex-to-ASCII converter and I can see my ASCII string, which is actually another XML message.)
I think it's related to the type of the WMB message (BLOB instead of character), but I don't know how to set that in the SOAP Input node, where the parser operates.
(0x03000060:PCDataField+base64Binary)http://www.curamsoftware.com/WorkspaceServices/IntakeApplication:applicationData = X'3c3f786d6c2076657... Truncated by me' (BLOB)
Thanks, Giovanni |
|
mgk |
Posted: Fri Nov 09, 2012 12:04 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Quote: |
Do you have any suggestion to get the ascii representation of my request message? |
Yes, CAST it with a CCSID to a CHARACTER string.
Kind regards, MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
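The hex string in the trace is just how a BLOB is displayed; the bytes only become text once a character set is applied, which is what the CAST with a CCSID does. A small Java sketch of the same idea follows, assuming the payload is UTF-8 (CCSID 1208). The class and method names are hypothetical, and the sample hex is the leading part of the value shown in the trace above.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of any broker API.
final class BlobToText {

    private BlobToText() {
        // static helper, no instances
    }

    /** Turns a hex dump such as "3c3f786d" into the bytes it represents. */
    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Each pair of hex digits is one byte.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** Interprets the bytes as text in the given character set. */
    public static String toText(byte[] blob, Charset charset) {
        return new String(blob, charset);
    }
}
```

For example, `BlobToText.toText(BlobToText.fromHex("3c3f786d6c207665"), StandardCharsets.UTF_8)` yields `<?xml ve`, the start of an XML declaration. Choosing the wrong character set would give garbage for anything outside plain ASCII, which is why the CCSID matters in the CAST.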
|
xvigo |
Posted: Fri Nov 09, 2012 12:58 am Post subject: |
|
|
Newbie
Joined: 08 Nov 2012 Posts: 4
|
mgk wrote: |
Quote: |
Do you have any suggestion to get the ascii representation of my request message? |
Yes, CAST it with a CCSID to a CHARACTER string.
Kind regards, |
Hey thanks for the quick answer.
I'm definitely new to Broker development. Looking at the InfoCenter and other resources, I wrote this piece of ESQL code, but I didn't get any result (at least in the debugger).
I guess that's the wrong way to add a new element to the output message.
Any advice?
Thanks, Giovanni
CREATE COMPUTE MODULE C2B_Compute
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        CALL CopyMessageHeaders();
        CALL CopyEntireMessage();
        -- Convert the decoded BLOB to a character string using the input message's CCSID
        DECLARE requestIntakeOut CHARACTER '';
        SET requestIntakeOut = CAST(InputRoot.XMLNSC.ns:receiveApplication.ns:applicationData AS CHARACTER CCSID InputRoot.Properties.CodedCharSetId);
        -- Add the converted string to the output message as a new element
        SET OutputRoot.XMLNSC.Message.Type.requestIntakeOut = requestIntakeOut;
        RETURN TRUE;
    END; |
|
kimbert |
Posted: Fri Nov 09, 2012 1:10 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
It looks as if you are base64-decoding the attribute and then putting the resulting content directly into OutputRoot.XMLNSC. Are you sure that this is a safe thing to do? Maybe there was a good reason why the sender base64-encoded the attribute contents.
If the debugger does not show you what you need, you can use a Trace node and/or take a user trace (command-line utilities mqsichangetrace, mqsireadlog, mqsiformatlog; search this forum for example command lines). |
|
xvigo |
Posted: Fri Nov 09, 2012 1:21 am Post subject: |
|
|
Newbie
Joined: 08 Nov 2012 Posts: 4
|
The field InputRoot.XMLNSC.ns:receiveApplication.ns:applicationData is already decoded (by XMLNSC); what I need is to have the result as XML, in order to map the content to another schema. Unfortunately the XMLNSC parser shows the decoded message as hex, so I need it as ASCII (as ASCII it's readable XML) and then I need to do something to turn that into an XML message that is "mappable" by a Broker mapper (graphical is better for me).
Hope this gives a bit of context.
Giovanni |
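As a rough illustration of the step being asked for here: an XML parser can usually take the decoded bytes directly, since the encoding is named in the XML declaration, so an intermediate character string is not strictly required. The sketch below shows this in plain Java with the standard JAXP DOM parser. It is not broker code; inside a flow the nearest equivalent would be the CREATE ... PARSE approach mentioned earlier in this thread. The class name `DecodedXmlParser` is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not part of any broker API.
final class DecodedXmlParser {

    private DecodedXmlParser() {
        // static helper, no instances
    }

    /**
     * Parses bytes that are expected to hold a complete XML document.
     * The parser works out the character encoding from the XML declaration.
     */
    public static Document parse(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // The content came from outside, so ask for conservative parser limits.
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("bytes are not a well-formed XML document", e);
        }
    }
}
```

Whether it is safe to treat the embedded content as XML at all is a separate question, as kimbert points out above; the sketch only shows that the parse can start from bytes.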
|
Back to top |
|
 |
|