MQSeries.net :: View topic - Large Message processing performance FileRead/DFDL

Large Message processing performance FileRead/DFDL

longhorn

Posted: Tue May 03, 2016 11:52 am Post subject: Large Message processing performance FileRead/DFDL

Novice

Joined: 22 May 2014
Posts: 14

OS: Windows 7 64 bit
CPU: i5
RAM: 16GB
Toolkit version 8.0.0.6
WMB capability level: 8.0.0.5

I have been running some tests with a new message flow and the performance is not as good as expected. I am explicitly setting only 3 fields in the output message and simply copying the other 17 values from the input. When processing a production-size file, I am rather alarmed to see the memory usage ramp up by about 1GB.

My flow is MQInput -> FileRead -> Compute -> MQHeader -> MQOutput

Some relevant properties of these nodes:

MQInput: BLOB domain,
FileRead: DFDL domain, On Demand parsing, Whole File

The input message is merely a tiny trigger message to invoke the file read. I have tried processing a 70MB file, a 93MB file and a 226MB file. The number of rows in these files: 216K, 800K, 2 million respectively.

Performance is as follows:

70MB file: Working set memory increase 430MB, time to process 3 minutes
93MB file: Working set memory increase 500MB, time to process 3.75 minutes
226MB file: Working set memory increase 1GB, time to process 9 minutes
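
To put the figures above on a common scale, the following Python sketch derives throughput and the ratio of memory growth to file size for each run. It uses only the numbers quoted in the post; treating 1GB as 1024MB is an assumption, and the function name `summarise` is purely illustrative.

```python
# Figures as reported in the post (1GB is taken as 1024MB, which is an assumption).
runs = [
    # (file size MB, rows, working-set increase MB, minutes to process)
    (70, 216_000, 430, 3.0),
    (93, 800_000, 500, 3.75),
    (226, 2_000_000, 1024, 9.0),
]

def summarise(file_mb, rows, mem_mb, minutes):
    """Return memory amplification and throughput for one test run."""
    seconds = minutes * 60
    return {
        "mem_per_file_mb": round(mem_mb / file_mb, 2),  # MB of memory per MB of file
        "mb_per_sec": round(file_mb / seconds, 2),
        "rows_per_sec": round(rows / seconds),
    }

summaries = [summarise(*run) for run in runs]
for run, summary in zip(runs, summaries):
    print(f"{run[0]}MB file: {summary}")
```

On these figures the working set grows by roughly 4.5 to 6 times the file size, and throughput stays close to 0.4MB per second for all three files.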

An extract from the file being processed is:

Code:

852006|Y| 852006|g5|c|Y|s||||||0|||||0|1|Avaue eien|
000101|N|000101|r1|L|Y|z|ir14|100485||05/06/14||36.15||14/01/16|Pending|Ipso Lorem Ditto|2|30|ABCCD|
000101|N|000101|r1|L|Y|z|ir14|100518||17/09/15||4.19||19/09/15|Pending|Oblongata|2|30|XYZZS|
000101|N|000101|r1|L|Y|z|ir14|102823||18/08/15||24.16|||Pending|Vasco Da Gama|2|30|TTTRRT|

I am using the large-message technique (code below) and was expecting memory usage to be much lower, given that partial parsing is in effect.

Code:

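   -- NOTE: tmpChar and propagateCount are not declared in the posted snippet;
   -- the two declarations below are assumed so that the excerpt stands alone.
   DECLARE tmpChar CHARACTER;
   DECLARE propagateCount INTEGER 0;

   CALL CopyMessageHeaders();
   DECLARE outMsgRef REFERENCE TO OutputRoot;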

   -- Set up mutable input message tree
   DECLARE rowCachedInputDFDL ROW;
   CREATE FIRSTCHILD OF rowCachedInputDFDL DOMAIN ('DFDL') NAME 'DFDL';
   DECLARE mutableMsgRef REFERENCE TO rowCachedInputDFDL.DFDL;
   SET rowCachedInputDFDL.DFDL = InputRoot.DFDL;
   MOVE mutableMsgRef FIRSTCHILD NAME 'inputMsg';
   MOVE mutableMsgRef FIRSTCHILD NAME 'body';

   WHILE LASTMOVE(mutableMsgRef) DO
      CREATE LASTCHILD OF OutputRoot.DFDL.outputMsg AS outMsgRef NAME 'body';
      SET outMsgRef = mutableMsgRef;

      -- Set explicit values for 3 fields in output message
      SET tmpChar = TRIM(UCASE(outMsgRef.Field2));
      IF tmpChar = 'Y' THEN
         SET outMsgRef.Field2 = 'FS';
      ELSEIF tmpChar = 'N' THEN
         SET outMsgRef.Field2 = 'DT';
      END IF;

      SET tmpChar = COALESCE(TRIM(UCASE(outMsgRef.Field5)),'');
      CASE tmpChar
         WHEN '' THEN
            SET outMsgRef.Field5 = 'Standard System Price';
         WHEN 'C' THEN
            SET outMsgRef.Field5 = 'Contract Price';
         WHEN 'L' THEN
            SET outMsgRef.Field5 = 'Lower of Contract or Promotion';
      END CASE;

      SET tmpChar = COALESCE(TRIM(UCASE(outMsgRef.Field16)),'');
      IF tmpChar = '' THEN
         SET outMsgRef.Field16 = 'No Contract Type';
      END IF;

      SET propagateCount = propagateCount + 1;

      IF propagateCount = 25000 THEN
         PROPAGATE;
         CALL CopyMessageHeaders();
         SET propagateCount = 0;
      END IF;

      IF propagateCount > 1 THEN
         DELETE PREVIOUSSIBLING OF mutableMsgRef;
      END IF;
      MOVE mutableMsgRef NEXTSIBLING NAME 'body';

   END WHILE;

   IF propagateCount > 0 THEN
      PROPAGATE;
   END IF;

   RETURN FALSE;

I have also attached my DFDL definitions for the input and output:

input:

Code:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" xmlns:ref="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">

<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
<xsd:element dfdl:emptyValueDelimiterPolicy="initiator" dfdl:lengthKind="delimited" dfdl:outputNewLine="%CR;%LF;" dfdl:terminator="%CR;%LF;" ibmSchExtn:docRoot="true" name="inputMsg">
<xsd:complexType>
<xsd:sequence dfdl:outputNewLine="%CR;%LF;" dfdl:separator="%CR;%LF;%WSP*;" dfdl:separatorPosition="infix" dfdl:terminator="">
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:emptyValueDelimiterPolicy="initiator" dfdl:occursCountKind="implicit" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" maxOccurs="unbounded" name="body">
<xsd:complexType>
<xsd:sequence dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:separator="%#124;" dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="postfix">
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:useNilForDefault="yes" name="Field1" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field2" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:useNilForDefault="yes" name="Field3" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field4" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field5" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field6" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field7" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field8" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field9" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field10" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field11" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field12" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="######0.00" dfdl:useNilForDefault="yes" name="Field13" nillable="true" type="xsd:decimal"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field14" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field15" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field16" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field17" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field18" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field19" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:emptyValueDelimiterPolicy="terminator" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:nilValueDelimiterPolicy="terminator" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:terminator="" dfdl:useNilForDefault="yes" name="Field20" nillable="true" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
   <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format byteOrder="{$dfdl:byteOrder}" documentFinalTerminatorCanBeMissing="yes" encoding="UTF-8" escapeSchemeRef="recFixLengthFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" outputNewLine="{$dfdl:outputNewLine}" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat" separatorPolicy="suppressed" textPadKind="padChar"/>
   </xsd:appinfo>
</xsd:annotation>

</xsd:schema>

output:

Code:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:ref="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" xmlns:ref1="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">

<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:element ibmSchExtn:docRoot="true" name="outputMsg">
<xsd:complexType>
<xsd:sequence dfdl:separator="">
<xsd:element dfdl:encoding="{$dfdl:encoding}" dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="unbounded" name="body">
<xsd:complexType>
<xsd:sequence dfdl:separator="|">
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field1" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field2" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field3" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:useNilForDefault="yes" name="Field4" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field5" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field6" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field7" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field8" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field9" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field10" nillable="true" type="xsd:string"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field11" nillable="true" type="xsd:date"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field12" nillable="true" type="xsd:date"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0.00" dfdl:representation="text" dfdl:textNumberPattern="######0.00" dfdl:useNilForDefault="yes" name="Field13" nillable="true" type="xsd:decimal"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field14" nillable="true" type="xsd:string"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field15" nillable="true" type="xsd:date"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field16" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field17" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field18" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field19" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:emptyValueDelimiterPolicy="terminator" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:nilValueDelimiterPolicy="terminator" dfdl:representation="text" dfdl:terminator="|" dfdl:useNilForDefault="yes" name="Field20" nillable="true" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
   <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format byteOrder="{$dfdl:byteOrder}" documentFinalTerminatorCanBeMissing="yes" outputNewLine="{$dfdl:outputNewLine}" ref="ref1:RecordSeparatedFieldsFormat"/>
   </xsd:appinfo>
</xsd:annotation>

</xsd:schema>

Has anyone processed similar messages with much better performance? What other techniques did you use to achieve that performance?

Thanks,

John

mqjeff

Posted: Tue May 03, 2016 12:12 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Are you able to determine that the time spent is in processing the message (i.e. the compute node) rather than in reading and outputting the message data?

The time does seem to grow (relatively) linearly with the size of the file.

You also haven't said if you're using a single instance of the flow or not.

Perhaps you can't, because you need to process the records in order.

You should also look at having the trigger message cause the flow to move the file or rename it where it can be processed by a FileInput node - and have that send one record at a time to your flow.
_________________
chmod -R ugo-wx /

timber

Posted: Tue May 03, 2016 1:11 pm Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

I recommend the following approach:
- Change the FileInput node setting from 'Whole File' to 'Delimited'
- This will make each invocation of the flow a single record, so you will need to use a SHARED ROW variable to perform the counting from 0 to 25000.
- Append one record to the output file for each invocation of the flow
- When the count reaches 25000, send a message to the Finish File terminal of the FileOutput node.

This will avoid loading the entire 1GB message into memory (the BLOB parser is not a streaming parser, and will load the entire input document).
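
The control flow of the steps above can be sketched in Python as an analogy only; it is not ESQL and does not use any broker API. The class `BatchingWriter`, its method names, and the in-memory `io.StringIO` standing in for the output file are all hypothetical. The demonstration uses a batch size of 3 instead of 25000 to keep it short, and the explicit final flush is an addition of this sketch rather than one of the listed steps.

```python
import io

class BatchingWriter:
    """Appends one record per call and finishes the file every batch_size records."""

    def __init__(self, batch_size=25_000):
        self.batch_size = batch_size
        self.count = 0              # plays the role of the shared counter
        self.out = io.StringIO()    # stands in for the file being appended to
        self.finished = []          # contents of every finished file

    def on_record(self, record):
        # One invocation per record: append it, then check the counter.
        self.out.write(record + "\r\n")
        self.count += 1
        if self.count == self.batch_size:
            self.finish_file()

    def finish_file(self):
        # Analogue of sending a message to the Finish File terminal.
        if self.count:
            self.finished.append(self.out.getvalue())
            self.out = io.StringIO()
            self.count = 0

# Small demonstration: 7 records with a batch size of 3.
writer = BatchingWriter(batch_size=3)
for i in range(7):
    writer.on_record(f"{i:06d}|N|")
writer.finish_file()                # flush the final partial batch
batch_sizes = [text.count("\r\n") for text in writer.finished]
print(batch_sizes)
```

The point of the design is that each call handles a single record, so the amount of data held at any moment does not depend on the size of the input file.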

longhorn

Posted: Wed May 04, 2016 1:27 am Post subject:

Novice

Joined: 22 May 2014
Posts: 14

Thanks guys for your responses.

@mqjeff
I'm using a single instance of the flow.

I ran some stats on a message with just 5000 lines. The results were

Code:

TotalElapsedTime='1471996'
MaximumCPUTime='1451693'

Compute node:
TotalElapsedTime='546626'
TotalCPUTime='546382'

MQOutput node:
TotalElapsedTime='908592'
TotalCPUTime='904806'

So it looks like the majority of the time is actually spent writing the message.
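
As a rough check of that reading, the Python sketch below turns the elapsed times quoted above into percentages of the total. The statistics do not state their units, so only ratios are computed; the variable names are illustrative.

```python
# Elapsed times copied from the statistics above (units not stated in the post).
total_elapsed = 1_471_996
node_elapsed = {"Compute": 546_626, "MQOutput": 908_592}

# Share of the total elapsed time spent in each node, as a percentage.
shares = {
    name: round(100 * elapsed / total_elapsed, 1)
    for name, elapsed in node_elapsed.items()
}
# Whatever is left over is spent outside these two nodes.
other_share = round(100 * (total_elapsed - sum(node_elapsed.values())) / total_elapsed, 1)

print(shares, other_share)
```

On these numbers the Compute node accounts for about 37% of the elapsed time and the MQOutput node for about 62%.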

I had initially looked at changing the FileRead node to spit out 1 record at a time but it looks like the FileRead node will only spit out 1 record unless additional logic is added to compute nodes before and after the FileRead node to keep track of RecordCount. That looked a bit messy so I abandoned that as I thought I would be able to get away with partial parsing.

We didn't consider the FileInput node as we wanted to avoid polling for a file that only arrives once a week (but may occasionally arrive adhoc, which is why we opted for a triggered FileRead node).

@timber
I am specifying DFDL on the FileRead node and also specifying on demand parsing so I expected that the whole document would not be loaded.

timber

Posted: Wed May 04, 2016 5:49 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

I am specifying DFDL on the FileRead node and also specifying on demand parsing so I expected that the whole document would not be loaded.

On-demand parsing is not the same as streaming. I recommend that you give my suggestion a try - it's simpler than having a FileRead node, and I think it's more likely to maintain low memory usage.
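
The distinction can be illustrated with a small Python analogy. This is not how the broker is implemented; the two function names are hypothetical and the sample row is shortened from the extract in the first post. In the first function the whole document is loaded before any field is split out, even though the splitting itself is lazy. In the second, only one record is held at a time.

```python
import io

ROW = "000101|N|000101|r1|L|Y|z|ir14|100485|\r\n"
document = ROW * 1000

def on_demand_records(stream):
    """Load the whole document, then split fields only when a record is requested."""
    data = stream.read()                                   # entire input in memory
    lazily_parsed = (line.split("|") for line in data.splitlines())
    return len(data), lazily_parsed

def streamed_records(stream):
    """Read and parse one record at a time; yield (characters held, fields)."""
    for line in stream:
        yield len(line), line.rstrip("\r\n").split("|")

held_on_demand, lazy = on_demand_records(io.StringIO(document))
first_lazy = next(lazy)                                    # parsing happens here
peak_streamed = max(held for held, _ in streamed_records(io.StringIO(document)))

print(held_on_demand, peak_streamed)
```

Both variants defer the field-level work, but only the streamed one keeps the amount of input held in memory independent of the document size.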

longhorn

Posted: Wed May 04, 2016 11:31 am Post subject:

Novice

Joined: 22 May 2014
Posts: 14

@timber
Thanks for the clarification on streaming vs on-demand parsing. I hadn't fully appreciated that.

I think we are limited to the FileRead node in this instance as the file will generally arrive once a week but may also occasionally arrive adhoc. The requirement is to avoid polling for the file.

In any case, by tinkering with the number of rows processed in the Compute node before propagating, it has been possible to rein in the memory usage to more acceptable levels.

Thanks

mqjeff

Posted: Wed May 04, 2016 11:47 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

You could use your trigger message to run CMP code (or mqsi commands) to start the FileInput flow.
_________________
chmod -R ugo-wx /

PeterPotkay

Posted: Wed May 04, 2016 5:25 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Are you producing 1 MQ message per record in the file?
_________________
Peter Potkay
Keep Calm and MQ On

longhorn

Posted: Thu May 05, 2016 1:17 am Post subject:

Novice

Joined: 22 May 2014
Posts: 14

@mqjeff
Thanks for the suggestion. We had thought of triggering the FileInput flow but then for robustness, we'd need checks to verify that the flow is triggered when it's meant to. Initially, it seemed more straightforward to use the FileRead node and avoid having to check for a triggered FileInput flow.

@PeterP
We're reading in the whole file via the FileRead node and then spitting out MQ messages from the compute node with each MQ message containing 25000 rows (currently). The file has 2 million rows and so we don't want to send out 1 MQ message per row from the compute node.
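
For scale, this short Python sketch works out how many MQ messages a batch size of 25000 rows produces for each test file. The row counts are the approximate figures from the first post, and the names used are illustrative.

```python
import math

ROWS_PER_MESSAGE = 25_000  # current batch size mentioned in the thread

# Approximate row counts quoted for the three test files.
file_rows = {"70MB": 216_000, "93MB": 800_000, "226MB": 2_000_000}

# A final partial batch still needs its own message, hence the ceiling.
messages = {
    name: math.ceil(rows / ROWS_PER_MESSAGE)
    for name, rows in file_rows.items()
}
print(messages)
```

For the 2 million row file that is 80 messages, compared with 2 million if each row were sent on its own.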

I suppose if we did spit out 1 MQ message per record, we could use a Collector node after the compute to batch these up into X thousand per message but I think that could get a bit messy as we'd then have 2 transactions to manage (before Collector and after Collector). Not impossible, just a bit more convoluted.

mqjeff

Posted: Thu May 05, 2016 3:55 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Using collector in this case wouldn't help the memory settings. You would still need to keep the entire file in memory.

You need to do something to process the file one record at a time.

You should also really discuss what it means to poll for the file, and where it's really bad to do that...

And, the CMP will tell you that the flow has started. Or at least, you can then proceed to check that it's running.

If the flow hasn't started, or it has started but the file isn't being read, then it's really not a different error case than if the FileRead node doesn't work. Something went wrong in broker, you need to react to it.
_________________
chmod -R ugo-wx /

timber

Posted: Thu May 05, 2016 2:51 pm Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

The requirement is to avoid polling for the file.

What's wrong with having the flow deployed and doing nothing for most of the time?

Quote:

by tinkering with the number of rows processed in the Compute node before propagating, it has been possible to rein in the memory usage to more acceptable levels.

Any reason why the rows have to be batched up like this? Why not send out one message per row in the input file?

kash3338

Posted: Thu May 05, 2016 9:47 pm Post subject:

Shaman

Joined: 08 Feb 2009
Posts: 709
Location: Chennai, India

Another approach I can think of is to split your flow into two (as the experts here have said) and, instead of a FileInput node, use an MQInput node and parse one record at a time with your DFDL. I'm not sure whether this would improve your performance, but you can try it.

longhorn

Posted: Fri May 06, 2016 12:27 am Post subject:

Novice

Joined: 22 May 2014
Posts: 14

Thanks guys for all your input.

We eventually decided to go with the FileInput node. Interestingly, memory usage when using the FileInput node is significantly lower. Memory usage creeps up by just 70MB when using FileInput. Memory usage increases by about 300MB when using FileRead. The ESQL code and settings on the nodes are the same (Whole file, Partial parsing, 25000 records per emitted message). I wasn't expecting that.

@kash3338
We're kind of stuck with using a File node as the starting point as the message is 200MB+ and the sender can't make any changes to reduce the size.

@timber
We thought it might be 'expensive' having broker continually polling for a file which will generally arrive once a week but may also occasionally arrive adhoc. We're probably going to go with polling every 2/3 hours.

We're batching the output like this as there are about 2 million rows in the input file. We don't want the recipient to receive 2 million messages and as already mentioned, we'd rather not use the collector node in this instance.

timber

Posted: Fri May 06, 2016 1:38 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

We thought it might be 'expensive' having broker continually polling for a file which will generally arrive once a week but may also occasionally arrive adhoc.

At least you had the honesty to put the word 'expensive' in quotes.

Your assumptions led you into the trap of premature optimisation. As a result, you almost chose a solution that is complex and expensive (in memory) to avoid a solution that is not actually very expensive at all.
