MQSeries.net :: View topic - Memory usage when parsing Large message

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Memory usage when parsing Large message

Memory usage when parsing Large message

happyj

Posted: Thu Apr 26, 2007 6:11 am Post subject: Memory usage when parsing Large message

Voyager

Joined: 07 Feb 2005
Posts: 87

WBIMB V5 CSD 5
on Solaris

Hi

I am attempting to parse a large 12 MB message (with a repeating structure) and
propagate many smaller messages from it.

The problem is that while the message is being processed, the execution group process
grows from about 200 MB to about 1.1 GB and stays at that size.
I want to keep this process as small as possible.

I would also like to be able to process larger messages of this type.

The message is parsed by an RCD (ResetContentDescriptor) node against an MRM TDS message set and then
reformatted in a single Compute node. Is this on-demand parsing? The transaction mode on the output node is set to 'no'.

The data for each output record is contained in the REPEATING_RECORD structure of the message set.

Any comments on the code, or on how I can reduce or reuse memory, would be much appreciated.

Thank you.

Code:

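-- Copy the parsed body into the Environment tree so that records which
-- have already been processed can be deleted (approach from the Storey article)
SET Environment.Variables.InputRoot = InputRoot.MRM;

DECLARE inMsg REFERENCE TO Environment.Variables.InputRoot.REPEATING_RECORD;
DECLARE outMsg REFERENCE TO OutputRoot.MRM;

WHILE LASTMOVE(inMsg) DO
    -- PROPAGATE clears OutputRoot, so headers and body are rebuilt each time
    CALL CopyMessageHeaders();
    CREATE FIELD OutputRoot.MRM;
    MOVE outMsg TO OutputRoot.MRM;

    -- Lots of processing

    PROPAGATE;

    -- Step to the next record, then delete the one just processed
    -- so that its part of the tree can be freed
    MOVE inMsg NEXTSIBLING;
    DELETE PREVIOUSSIBLING OF inMsg;
END WHILE;

-- Every record has already been propagated; returning FALSE stops the
-- Compute node propagating OutputRoot once more when Main() ends
RETURN FALSE;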
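Outside the broker, the same pattern as the ESQL loop above (handle one record, emit it, and drop it before reading the next) can be sketched in Python with a generator. This is only an analogy, not broker code. `split_records` is a hypothetical name, and the sketch assumes LF-separated records and that any line containing a pipe is a DATA line.

```python
import io
from typing import Iterator, TextIO


def split_records(stream: TextIO) -> Iterator[str]:
    """Yield one DATA line at a time from an LF-separated stream.

    Only the current line is held in memory, so memory use stays flat
    however large the input is (hypothetical helper, for illustration).
    """
    for line in stream:
        record = line.rstrip("\n")
        # Assumption: DATA lines contain pipes; HEADER/FOOTER lines do not
        if "|" in record:
            yield record
        # Nothing is kept once the loop moves on, much like deleting the
        # previous sibling in the ESQL loop


if __name__ == "__main__":
    sample = io.StringIO("HEADER1 text\nA|1|x\nB|2|y\nFOOTER1 text\n")
    for out in split_records(sample):
        print(out)  # one "output message" per DATA line
```

The caller pulls one record at a time, which plays the role of PROPAGATE: each output is produced and finished before the next input record is read.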

kimbert

Posted: Thu Apr 26, 2007 6:52 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

I expect you've seen this already, but just in case...
http://www-128.ibm.com/developerworks/websphere/library/techarticles/0505_storey/0505_storey.html

Quote:

The message is parsed using a RCD node to a MRM TDS message set - is this on demand parsing

It depends on the 'Parse Timing' property on the RCD node (under 'Parser Options').

happyj

Posted: Thu Apr 26, 2007 7:00 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

Yes, I have seen the article. I have tried to follow its pointers, but I am surprised at how much memory is used.

Is there anything obviously wrong in the code?

I don't have the Parse Timing option on V5.

kimbert

Posted: Thu Apr 26, 2007 1:43 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Nothing wrong that I can see. Maybe someone else can help...

You could try adding a line to your ESQL that accesses the very last field of your message, something like
SET myDummyVar = InputRoot.MRM.*[<];
(with myDummyVar declared beforehand). If memory usage is unchanged afterwards, your current strategy is not reducing memory usage at all. If memory usage goes up significantly, your current memory usage is as good as it gets.

happyj

Posted: Tue May 01, 2007 4:09 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

Many thanks for your help.

Adding this line sent the memory usage up,
so I'm going to look again at the message set that I'm using.

happyj

Posted: Tue May 01, 2007 11:46 pm Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

OK, here are the message structure and the message set settings.

I don't usually have to deal with anything beyond fixed-width or comma-separated
data, so this is all quite new to me.

Message structure is

HEADER
HEADER
...
DATA
DATA
...
FOOTER
FOOTER
...

where DATA is AAA|BBBBB|CCCC|DDDD|...

There is a variable number of HEADER, DATA and FOOTER lines,
and the DATA lines have a variable number of fields delimited by a pipe (|) character.

I need to process the HEADER/FOOTER lines differently from the DATA lines, and to process the fields
at specific positions in the DATA lines.

The message set uses TDS.

TOP_LEVEL has Data Element Separation = All Elements Delimited, Delimiter = <LF>.

Below it is REPEATING_RECORD, with Min Occurs 1, Max Occurs -1 and Data Element Separation = Use Data Pattern.

Below REPEATING_RECORD are:

DATA_RECORD, with Data Pattern (.*\|.*\|)

HEADER_RECORD, with Data Pattern .*

DATA_RECORD is further defined with Data Element Separation = All Elements Delimited, Delimiter = |,
and below it DATA_FIELD has Min Occurs 1, Max Occurs -1.

This works, but it uses too much memory.
Any ideas how it could be improved? Maybe in the regular expressions?
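To see which lines the two data patterns assign to which element, they can be modelled with Python regexes. This is a rough model under stated assumptions, not a description of the TDS parser. It assumes DATA_RECORD is tried before HEADER_RECORD and that each pattern is matched from the start of a line, like Python's `re.match`. The broker's actual matching rules may differ.

```python
import re

# The two TDS data patterns from the message set, compiled as Python regexes.
# Assumption: tried in this order and anchored at the start of each line.
DATA_PATTERN = re.compile(r"(.*\|.*\|)")
HEADER_PATTERN = re.compile(r".*")


def classify(line: str) -> str:
    """Return the element a line would be assigned to under this model."""
    if DATA_PATTERN.match(line):
        return "DATA_RECORD"
    # '.*' matches any line, even an empty one, so everything else lands here
    assert HEADER_PATTERN.match(line)
    return "HEADER_RECORD"


if __name__ == "__main__":
    for line in ["HEADER1 just some text but no pipes",
                 "DATA1|a date|a number|a name",
                 "X|only one pipe"]:
        print(f"{classify(line):14} {line}")
```

One consequence of this model is that `(.*\|.*\|)` needs at least two pipes. A DATA line with only one pipe would therefore fall through to HEADER_RECORD.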

elvis_gn

Posted: Wed May 02, 2007 12:53 am Post subject:

Padawan

Joined: 08 Oct 2004
Posts: 1905
Location: Dubai

Hi happyj,

I'm not sure what your message structure looks like, but I think avoiding data patterns would help performance. Kimbert should be able to help you there; perhaps you should post the message details.

As for the code, you are copying the entire InputRoot to the Environment. Why? Can't you work from the input message itself, using a WHILE loop and without deleting the previous siblings?

Regards.

happyj

Posted: Wed May 02, 2007 1:25 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

Hi elvis_gn

The approach was taken from the article by D Storey referenced above.

I believe it allows the message to be parsed as it is needed, and
parts of the message (in the Environment) that are no longer
required to be deleted.

The message structure I have consists of a number of header lines
containing text, then data lines delimited by pipes, and
finally footer lines which again contain text. Neither the number of lines nor
the number of pipes in the data lines is known in advance.

I need to determine whether a line is a DATA line or a HEADER/FOOTER line,
and I used the data pattern to do this.

I am propagating an output message per data line.

Again any advice is much appreciated.

kimbert

Posted: Wed May 02, 2007 2:08 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

- Take a look at the CSV samples in the samples gallery. There's a sample which does almost exactly what you need.
- I think your message set can be simplified a lot, but I cannot be sure unless you post a complete example of an input message (not the entire thing, but enough to show all of the important details, including the internal structure of HEADER, BODY and TRAILER, and how you tell them apart).

happyj

Posted: Thu May 03, 2007 5:03 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

Is the CSV sample part of V6? I don't currently have access to V6, but I will request it. Is the sample available online? The links in the V6 Information Center don't go anywhere.

The real message contains sensitive information, but this is a representation.

It has a variable number of lines (separated by <LF>),
each of which can be a header, a footer or a data line. I need to
distinguish the data lines from the other two kinds and
process the data in some of the fields on the data lines.

Although the total number of pipe-delimited fields can vary, the processing is always on a fixed field position; for example, the date field to process is always the second field.

HEADER1 just some text but no pipes
HEADER2 just some more text but no pipe chars
...
DATA1|a date|a number|a name|variable number of pipe delimited fields
DATA2|a date|another number|another name|variable number of pipe delimited fields
...
FOOTER1 just some text but no pipes
FOOTER2 just some more text again no pipes
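Because the date is always the second field, a DATA line can be handled by position alone, whatever the total field count. Here is a minimal Python sketch. `second_field` is a hypothetical helper. It assumes a pipe never appears escaped inside a field, and that any line containing a pipe is a DATA line, as in the sample above.

```python
from typing import Optional


def second_field(line: str) -> Optional[str]:
    """Return the second pipe-delimited field of a DATA line, else None."""
    if "|" not in line:
        return None  # HEADER/FOOTER lines in the sample contain no pipes
    fields = line.split("|")
    return fields[1]  # index 1 is the second field, however many follow


if __name__ == "__main__":
    sample = [
        "HEADER1 just some text but no pipes",
        "DATA1|a date|a number|a name|variable number of pipe delimited fields",
        "DATA2|a date|another number|another name|variable number of pipe delimited fields",
        "FOOTER1 just some text but no pipes",
    ]
    for line in sample:
        print(second_field(line))
```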

I would be very interested in a better way to model this as a message set.

kimbert

Posted: Thu May 03, 2007 7:00 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Do the lines actually start with 'HEADER1', 'DATA1', 'FOOTER'?
If so, you can treat these as tags and set Data Element Separation to Tagged Delimited.
If not, you need to continue using 'Use Data Pattern'.

I'm away on holiday for a week starting tomorrow, so don't expect any more posts from me on this thread.

happyj

Posted: Fri May 04, 2007 2:02 am Post subject:

Voyager

Joined: 07 Feb 2005
Posts: 87

No, the lines just contain text or pipe-delimited data.
I'm going to read all the V6 message set samples to see if anything
applies. Have a nice holiday!

elvis_gn

Posted: Fri May 04, 2007 2:15 am Post subject:

Padawan

Joined: 08 Oct 2004
Posts: 1905
Location: Dubai

Hi happyj,

Please post a sample of your input message containing all the sections (HEADER, FOOTER, etc.).

Regards.

jefflowrey

Posted: Fri May 04, 2007 3:31 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Does the HEADER section occur more than once in the input message?

If not, you don't have to worry about finding HEADER lines again; all you have to be concerned with is telling where the DATA lines end and the FOOTER lines begin.

Also, if the HEADER and FOOTER lines are fixed length, you could preprocess the data as a BLOB and insert tags on those four lines.
_________________
I am *not* the model of the modern major general.
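The BLOB preprocessing idea above can be sketched outside the broker in Python. The sketch makes three assumptions: each HEADER and FOOTER line has a known fixed length, the number of HEADER and FOOTER lines is known, and every line ends with LF. The function `tag_blob` and the tag values `H:`, `D:` and `F:` are hypothetical. After tagging, a message set using Tagged Delimited separation could select the element from the tag instead of from a data pattern.

```python
# Hypothetical tag strings; pick values that cannot occur in the real data.
HEADER_TAG = b"H:"
DATA_TAG = b"D:"
FOOTER_TAG = b"F:"


def tag_blob(blob: bytes, header_len: int, n_headers: int,
             footer_len: int, n_footers: int) -> bytes:
    """Prefix every line of an LF-terminated message with a type tag.

    header_len and footer_len are the fixed line lengths in bytes,
    including the trailing LF, so the HEADER and FOOTER regions are
    located by offset without inspecting their content.
    """
    head_end = header_len * n_headers
    foot_start = len(blob) - footer_len * n_footers
    if foot_start < head_end:
        raise ValueError("header and footer regions overlap")

    out = bytearray()
    for i in range(0, head_end, header_len):
        out += HEADER_TAG + blob[i:i + header_len]
    # keepends=True preserves each LF (note: splitlines also breaks on CR)
    for line in blob[head_end:foot_start].splitlines(keepends=True):
        out += DATA_TAG + line
    for i in range(foot_start, len(blob), footer_len):
        out += FOOTER_TAG + blob[i:i + footer_len]
    return bytes(out)
```

With fixed lengths, the header and footer regions are found by arithmetic alone. Only the DATA region in between has to be scanned for line breaks.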
