Author |
Message
|
longhorn |
Posted: Tue May 03, 2016 11:52 am Post subject: Large Message processing performance FileRead/DFDL |
|
|
Novice
Joined: 22 May 2014 Posts: 14
|
OS: Windows 7 64 bit
CPU: i5
RAM: 16GB
Toolkit version 8.0.0.6
WMB capability level: 8.0.0.5
I have been running some tests with a new message flow and the performance is not as good as expected. I am only explicitly setting 3 fields in the output message and just accepting the other 17 values from the input. When processing a production-size file, I am rather alarmed to see the memory usage ramp up by about 1GB.
My flow is MQInput -> FileRead -> Compute -> MQHeader -> MQOutput
Some relevant properties of these nodes:
MQInput: BLOB domain,
FileRead: DFDL domain, On Demand parsing, Whole File
The input message is merely a tiny trigger message to invoke the file read. I have tried processing a 70MB file, a 93MB file and a 226MB file. The number of rows in these files: 216K, 800K, 2 million respectively.
Performance is as follows:
70MB file: Working set memory increase 430MB, time to process 3 minutes
93MB file: Working set memory increase 500MB, time to process 3.75 minutes
226MB file: Working set memory increase 1GB, time to process 9 minutes
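To put the three runs on a common scale, the figures above can be converted to throughput and to memory growth per MB of input. This is only a rough sketch in Python; the function name is made up for illustration and 1GB is taken as 1000MB, which is an assumption. It works out at roughly 0.4 MB/s in every run, with memory growth between about four and six times the file size.

```python
# Figures copied from the post: (file size MB, working-set increase MB, minutes).
# Assumption: the "1GB" increase is treated as 1000MB.
RUNS = [
    (70, 430, 3.0),
    (93, 500, 3.75),
    (226, 1000, 9.0),
]


def summarise(size_mb, mem_mb, minutes):
    """Return (MB processed per second, MB of memory growth per MB of file)."""
    seconds = minutes * 60
    return size_mb / seconds, mem_mb / size_mb


for size_mb, mem_mb, minutes in RUNS:
    rate, ratio = summarise(size_mb, mem_mb, minutes)
    print(f"{size_mb:>4} MB file: {rate:.2f} MB/s, memory growth {ratio:.1f}x file size")
```

A near-constant MB/s figure suggests the elapsed time scales with file size rather than degrading as the file grows.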
An extract from the file being processed is:
Code: |
852006|Y| 852006|g5|c|Y|s||||||0|||||0|1|Avaue eien|
000101|N|000101|r1|L|Y|z|ir14|100485||05/06/14||36.15||14/01/16|Pending|Ipso Lorem Ditto|2|30|ABCCD|
000101|N|000101|r1|L|Y|z|ir14|100518||17/09/15||4.19||19/09/15|Pending|Oblongata|2|30|XYZZS|
000101|N|000101|r1|L|Y|z|ir14|102823||18/08/15||24.16|||Pending|Vasco Da Gama|2|30|TTTRRT|
|
I am using the large messaging technique (code below) and I was expecting memory usage to be much lower given that partial parsing is in effect.
Code: |
CALL CopyMessageHeaders();
DECLARE outMsgRef REFERENCE TO OutputRoot;
-- Set up mutable input message tree
DECLARE rowCachedInputDFDL ROW;
CREATE FIRSTCHILD OF rowCachedInputDFDL DOMAIN ('DFDL') NAME 'DFDL';
DECLARE mutableMsgRef REFERENCE TO rowCachedInputDFDL.DFDL;
SET rowCachedInputDFDL.DFDL = InputRoot.DFDL;
MOVE mutableMsgRef FIRSTCHILD NAME 'inputMsg';
MOVE mutableMsgRef FIRSTCHILD NAME 'body';
WHILE LASTMOVE(mutableMsgRef) DO
    CREATE LASTCHILD OF OutputRoot.DFDL.outputMsg AS outMsgRef NAME 'body';
    SET outMsgRef = mutableMsgRef;
    -- Set explicit values for 3 fields in output message
    SET tmpChar = TRIM(UCASE(outMsgRef.Field2));
    IF tmpChar = 'Y' THEN
        SET outMsgRef.Field2 = 'FS';
    ELSEIF tmpChar = 'N' THEN
        SET outMsgRef.Field2 = 'DT';
    END IF;
    SET tmpChar = COALESCE(TRIM(UCASE(outMsgRef.Field5)),'');
    CASE tmpChar
        WHEN '' THEN
            SET outMsgRef.Field5 = 'Standard System Price';
        WHEN 'C' THEN
            SET outMsgRef.Field5 = 'Contract Price';
        WHEN 'L' THEN
            SET outMsgRef.Field5 = 'Lower of Contract or Promotion';
    END CASE;
    SET tmpChar = COALESCE(TRIM(UCASE(outMsgRef.Field16)),'');
    IF tmpChar = '' THEN
        SET outMsgRef.Field16 = 'No Contract Type';
    END IF;
    SET propagateCount = propagateCount + 1;
    IF propagateCount = 25000 THEN
        PROPAGATE;
        CALL CopyMessageHeaders();
        SET propagateCount = 0;
    END IF;
    IF propagateCount > 1 THEN
        DELETE PREVIOUSSIBLING OF mutableMsgRef;
    END IF;
    MOVE mutableMsgRef NEXTSIBLING NAME 'body';
END WHILE;
IF propagateCount > 0 THEN
    PROPAGATE;
END IF;
RETURN FALSE; |
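For checking the three field rules outside the broker, a minimal Python model of the mapping might look like the following. It is a sketch, not the flow's code: `transform_record` is a hypothetical name, the record is split on `|` rather than parsed with DFDL, and Field5 values other than blank, C or L are left unchanged, which is an assumption about the intent.

```python
# Field5 lookup taken from the CASE statement in the ESQL above.
FIELD5_MAP = {
    "": "Standard System Price",
    "C": "Contract Price",
    "L": "Lower of Contract or Promotion",
}


def transform_record(line):
    """Apply the three field rules to one pipe-delimited record."""
    fields = line.split("|")
    if len(fields) < 20:
        raise ValueError("expected at least 20 fields")

    # ESQL field names are 1-based (Field2); Python lists are 0-based.
    field2 = fields[1].strip().upper()
    if field2 == "Y":
        fields[1] = "FS"
    elif field2 == "N":
        fields[1] = "DT"

    field5 = fields[4].strip().upper()
    if field5 in FIELD5_MAP:
        fields[4] = FIELD5_MAP[field5]

    # Blank Field16 gets a default description.
    if fields[15].strip() == "":
        fields[15] = "No Contract Type"

    return "|".join(fields)


sample = "852006|Y| 852006|g5|c|Y|s||||||0|||||0|1|Avaue eien|"
print(transform_record(sample))
```

Running the sample rows from the extract through a model like this gives a cheap way to confirm the expected output before timing the real flow.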
I have also attached my DFDL definitions for the input and output:
input:
Code: |
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" xmlns:ref="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">
<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
<xsd:element dfdl:emptyValueDelimiterPolicy="initiator" dfdl:lengthKind="delimited" dfdl:outputNewLine="%CR;%LF;" dfdl:terminator="%CR;%LF;" ibmSchExtn:docRoot="true" name="inputMsg">
<xsd:complexType>
<xsd:sequence dfdl:outputNewLine="%CR;%LF;" dfdl:separator="%CR;%LF;%WSP*;" dfdl:separatorPosition="infix" dfdl:terminator="">
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:emptyValueDelimiterPolicy="initiator" dfdl:occursCountKind="implicit" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" maxOccurs="unbounded" name="body">
<xsd:complexType>
<xsd:sequence dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:separator="%#124;" dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="postfix">
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:useNilForDefault="yes" name="Field1" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field2" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:useNilForDefault="yes" name="Field3" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field4" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field5" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field6" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field7" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field8" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field9" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field10" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field11" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field12" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="######0.00" dfdl:useNilForDefault="yes" name="Field13" nillable="true" type="xsd:decimal"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field14" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field15" nillable="true" type="xsd:date"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field16" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field17" nillable="true" type="xsd:string"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field18" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field19" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:byteOrder="{$dfdl:byteOrder}" dfdl:emptyValueDelimiterPolicy="terminator" dfdl:nilKind="logicalValue" dfdl:nilValue="%ES;" dfdl:nilValueDelimiterPolicy="terminator" dfdl:outputNewLine="%CR;%LF;" dfdl:ref="ref:RecordSeparatedFieldsFormat" dfdl:representation="text" dfdl:terminator="" dfdl:useNilForDefault="yes" name="Field20" nillable="true" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format byteOrder="{$dfdl:byteOrder}" documentFinalTerminatorCanBeMissing="yes" encoding="UTF-8" escapeSchemeRef="recFixLengthFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" outputNewLine="{$dfdl:outputNewLine}" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat" separatorPolicy="suppressed" textPadKind="padChar"/>
</xsd:appinfo>
</xsd:annotation>
</xsd:schema> |
output:
Code: |
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:ref="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" xmlns:ref1="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">
<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:element ibmSchExtn:docRoot="true" name="outputMsg">
<xsd:complexType>
<xsd:sequence dfdl:separator="">
<xsd:element dfdl:encoding="{$dfdl:encoding}" dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="unbounded" name="body">
<xsd:complexType>
<xsd:sequence dfdl:separator="|">
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field1" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field2" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field3" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:useNilForDefault="yes" name="Field4" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field5" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field6" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field7" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field8" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field9" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field10" nillable="true" type="xsd:string"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field11" nillable="true" type="xsd:date"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field12" nillable="true" type="xsd:date"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0.00" dfdl:representation="text" dfdl:textNumberPattern="######0.00" dfdl:useNilForDefault="yes" name="Field13" nillable="true" type="xsd:decimal"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field14" nillable="true" type="xsd:string"/>
<xsd:element dfdl:calendarPattern="dd/MM/yy" dfdl:calendarPatternKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field15" nillable="true" type="xsd:date"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field16" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:representation="text" dfdl:useNilForDefault="yes" name="Field17" nillable="true" type="xsd:string"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field18" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="0" dfdl:representation="text" dfdl:textNumberPattern="#0" dfdl:useNilForDefault="yes" name="Field19" nillable="true" type="xsd:integer"/>
<xsd:element dfdl:emptyValueDelimiterPolicy="terminator" dfdl:nilKind="literalValue" dfdl:nilValue="%WSP*;" dfdl:nilValueDelimiterPolicy="terminator" dfdl:representation="text" dfdl:terminator="|" dfdl:useNilForDefault="yes" name="Field20" nillable="true" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format byteOrder="{$dfdl:byteOrder}" documentFinalTerminatorCanBeMissing="yes" outputNewLine="{$dfdl:outputNewLine}" ref="ref1:RecordSeparatedFieldsFormat"/>
</xsd:appinfo>
</xsd:annotation>
</xsd:schema> |
Has anyone processed similar messages with much better performance? What other techniques did you use to achieve that performance?
Thanks,
John |
mqjeff |
Posted: Tue May 03, 2016 12:12 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Are you able to determine that the time spent is in processing the message (i.e. the compute node) rather than in reading and outputting the message data?
The time does seem to grow (relatively) linearly with the size of the file.
You also haven't said if you're using a single instance of the flow or not.
Perhaps you can't, because you need to process the records in order.
You should also look at having the trigger message cause the flow to move the file or rename it where it can be processed by a FileInput node - and have that send one record at a time to your flow. _________________ chmod -R ugo-wx / |
timber |
Posted: Tue May 03, 2016 1:11 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
I recommend the following approach:
- Change the FileInput node setting from 'Whole File' to 'Delimited'
- This will make each invocation of the flow a single record, so you will need to use a SHARED ROW variable to perform the counting from 0 to 25000.
- Append one record to the output file for each invocation of the flow
- When the count reaches 25000, send a message to the Finish File terminal of the FileOutput node.
This will avoid loading the entire 1GB message into memory (the BLOB parser is not a streaming parser, and will load the entire input document). |
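The record-at-a-time idea in the list above can be illustrated outside the broker. The sketch below is plain Python, not node configuration: it reads records lazily and hands them on in fixed-size batches, so only one batch is held in memory at a time. The `batches` helper and the in-memory stand-in for the file are assumptions made for the example.

```python
import io
from itertools import islice


def batches(lines, batch_size):
    """Yield lists of at most batch_size records, reading lazily from lines."""
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical stand-in for the input file: 7 records, batches of 3.
source = io.StringIO("".join(f"rec{i}\n" for i in range(7)))
for number, batch in enumerate(batches(source, 3), start=1):
    print(f"batch {number}: {len(batch)} records")
```

With a real file object in place of `source` and a batch size of 25000, peak memory depends on the batch size rather than on the size of the file, which is the property the steps above are aiming for.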
longhorn |
Posted: Wed May 04, 2016 1:27 am Post subject: |
|
|
Novice
Joined: 22 May 2014 Posts: 14
|
Thanks guys for your responses.
@mqjeff
I'm using a single instance of the flow.
I ran some stats on a message with just 5000 lines. The results were
Code: |
TotalElapsedTime='1471996'
MaximumCPUTime='1451693'
Compute node:
TotalElapsedTime='546626'
TotalCPUTime='546382'
MQOutput node:
TotalElapsedTime='908592'
TotalCPUTime='904806' |
So it looks like the majority of the time is actually spent in writing the message.
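As a rough check on that reading, the elapsed times above can be turned into shares of the total. The sketch below is plain Python with a made-up helper name; it only uses the three elapsed figures quoted in the stats, and it comes out at roughly 62% for the MQOutput node and 37% for the Compute node.

```python
# Elapsed-time figures copied from the stats above.
TOTAL_ELAPSED = 1471996
NODE_ELAPSED = {"Compute": 546626, "MQOutput": 908592}


def shares(node_elapsed, total):
    """Return each node's elapsed time as a fraction of the flow total."""
    return {name: value / total for name, value in node_elapsed.items()}


for name, fraction in shares(NODE_ELAPSED, TOTAL_ELAPSED).items():
    print(f"{name}: {fraction:.1%} of total elapsed time")
```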
I had initially looked at changing the FileRead node to spit out 1 record at a time but it looks like the FileRead node will only spit out 1 record unless additional logic is added to compute nodes before and after the FileRead node to keep track of RecordCount. That looked a bit messy so I abandoned that as I thought I would be able to get away with partial parsing.
We didn't consider the FileInput node as we wanted to avoid polling for a file that only arrives once a week (but may occasionally arrive adhoc and that's why we opted for a triggered FileRead node).
@timber
I am specifying DFDL on the FileRead node and also specifying on demand parsing so I expected that the whole document would not be loaded. |
timber |
Posted: Wed May 04, 2016 5:49 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
I am specifying DFDL on the FileRead node and also specifying on demand parsing so I expected that the whole document would not be loaded. |
On-demand parsing is not the same as streaming. I recommend that you give my suggestion a try - it's simpler than having a FileRead node, and I think it's more likely to maintain low memory usage. |
longhorn |
Posted: Wed May 04, 2016 11:31 am Post subject: |
|
|
Novice
Joined: 22 May 2014 Posts: 14
|
@timber
Thanks for the clarification on streaming vs on-demand parsing. I hadn't fully appreciated that.
I think we are limited to the FileRead node in this instance as the file will generally arrive once a week but may also occasionally arrive adhoc. The requirement is to avoid polling for the file.
In any case, by tinkering with the number of rows processed in the Compute node before propagating, it has been possible to rein in the memory usage to more acceptable levels.
Thanks |
mqjeff |
Posted: Wed May 04, 2016 11:47 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
You could use your trigger message to run CMP code (or mqsi commands) to start the FileInput flow. _________________ chmod -R ugo-wx / |
PeterPotkay |
Posted: Wed May 04, 2016 5:25 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Are you producing 1 MQ message per record in the file? _________________ Peter Potkay
Keep Calm and MQ On |
longhorn |
Posted: Thu May 05, 2016 1:17 am Post subject: |
|
|
Novice
Joined: 22 May 2014 Posts: 14
|
@mqjeff
Thanks for the suggestion. We had thought of triggering the FileInput flow but then for robustness, we'd need checks to verify that the flow is triggered when it's meant to. Initially, it seemed more straightforward to use the FileRead node and avoid having to check for a triggered FileInput flow.
@PeterP
We're reading in the whole file via the FileRead node and then spitting out MQ messages from the compute node with each MQ message containing 25000 rows (currently). The file has 2 million rows and so we don't want to send out 1 MQ message per row from the compute node.
I suppose if we did spit out 1 MQ message per record, we could use a Collector node after the compute to batch these up into X thousand per message but I think that could get a bit messy as we'd then have 2 transactions to manage (before Collector and after Collector). Not impossible, just a bit more convoluted. |
mqjeff |
Posted: Thu May 05, 2016 3:55 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Using collector in this case wouldn't help the memory settings. You would still need to keep the entire file in memory.
You need to do something to process the file one record at a time.
You should also really discuss what it means to poll for the file, and where it's really bad to do that...
And, the CMP will tell you that the flow has started. Or at least, you can then proceed to check that it's running.
If the flow hasn't started, or it has started but the file isn't being read, then it's really not a different error case than if the FileRead node doesn't work. Something went wrong in broker, you need to react to it. _________________ chmod -R ugo-wx / |
timber |
Posted: Thu May 05, 2016 2:51 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
The requirement is to avoid polling for the file. |
What's wrong with having the flow deployed and doing nothing for most of the time?
Quote: |
by tinkering with the number of rows processed in the Compute node before propagating, it has been possible to rein in the memory usage to more acceptable levels. |
Any reason why the rows have to be batched up like this? Why not send out one message per row in the input file? |
kash3338 |
Posted: Thu May 05, 2016 9:47 pm Post subject: |
|
|
Shaman
Joined: 08 Feb 2009 Posts: 709 Location: Chennai, India
|
Another approach I can think of is to split your flow into two (as the experts here suggested) and, instead of a FileInput node, use an MQInput node and parse one record at a time with your DFDL. I'm not sure whether this would improve your performance, but you could try it. |
longhorn |
Posted: Fri May 06, 2016 12:27 am Post subject: |
|
|
Novice
Joined: 22 May 2014 Posts: 14
|
Thanks guys for all your input.
We eventually decided to go with the FileInput node. Interestingly, memory usage when using the FileInput node is significantly lower. Memory usage creeps up by just 70MB when using FileInput. Memory usage increases by about 300MB when using FileRead. The ESQL code and settings on the nodes are the same (Whole file, Partial parsing, 25000 records per emitted message). I wasn't expecting that.
@kash3338
We're kind of stuck with using a File node as the starting point as the message is 200MB+ and the sender can't make any changes to reduce the size.
@timber
We thought it might be 'expensive' having broker continually polling for a file which will generally arrive once a week but may also occasionally arrive adhoc. We're probably going to go with polling every 2/3 hours.
We're batching the output like this as there are about 2 million rows in the input file. We don't want the recipient to receive 2 million messages and as already mentioned, we'd rather not use the collector node in this instance. |
timber |
Posted: Fri May 06, 2016 1:38 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
We thought it might be 'expensive' having broker continually polling for a file which will generally arrive once a week but may also occasionally arrive adhoc. |
At least you had the honesty to put the word 'expensive' in quotes.
Your assumptions led you into the trap of premature optimisation. As a result, you almost chose a solution that is complex and expensive (in memory) to avoid a solution that is not actually very expensive at all. |