sumit
Posted: Sun Mar 01, 2015 12:16 pm Post subject: Parsed Record Sequence with Large message
Partisan
Joined: 19 Jan 2006 Posts: 398
Hi All,
OS - Windows (for now, have to move it to Linux)
IIB (v9)
Developed a message flow to process a structured large flat file. The file has a header, multiple data records (no specific count) and a trailer.
Prepared a basic DFDL model with a header (optional), data (max count set to 20) and a trailer (optional).
I am using this DFDL in my FileInput node with 'Parsed Record Sequence'. The input flat file can have any number of data records. The intention is to let the FileInput node take 20 of the available data records at once, with the help of the DFDL model, and send them through the flow for processing.
I have tested this scenario with a smaller file and it works well. Even in debug mode (I know the behaviour of debug doesn't imitate the real processing), I can see that the flow picks 20 records in one go, processes them and sends them to the output file (FileOutput node, default settings). I can see each set of 20 data records going into the output file before the flow picks the next set of 20.
However, when I tested this flow with a large file (50 MB), it started taking a lot of time. I can see the output file getting created in the transit folder, but with a 0 KB size; I cannot see the file size growing.
I ran a trace and I can see the offset value changing in it, which shows that the flow is picking 20 records in one go, but I don't see them going into the output file after each set is processed.
I understand that parsed record sequence is an expensive way of dealing with a file. However, I am struggling to understand why the output file isn't being appended after each set of 20 data records is processed.
Had this worked, I was planning to test it with a 1 GB and then a 2 GB input message. _________________ Regards
Sumit
Vitor
Posted: Mon Mar 02, 2015 6:16 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
Without looking at your DFDL schema (which you've not posted), one possible theory is that it's picking up 20 data records, then parsing the rest of the 50 MB file looking for something it recognises (which it doesn't find). The output file never gets any data because the flow doesn't commit anything.
If I was coding this, I'd build a DFDL model that correctly described the data (optional header, 1-n data records, optional trailer) and put a Collector node as the next one in sequence after the FileInput with a collection size of 20.
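Something along these lines, as a rough and untested sketch (the element names, initiators and the occursCountKind setting are illustrative assumptions, not taken from your actual schema):
Code:
<!-- Sketch only: optional header, 1-n data records, optional trailer.
     occursCountKind='parsed' lets the parser discover the record count. -->
<xsd:complexType name="File_msg">
  <xsd:sequence dfdl:separator="">
    <xsd:element name="header" type="xsd:string" minOccurs="0"
                 dfdl:initiator="HEADER" dfdl:terminator="%LF;"/>
    <xsd:element name="data" type="xsd:string" maxOccurs="unbounded"
                 dfdl:occursCountKind="parsed" dfdl:terminator="%LF;"/>
    <xsd:element name="trailer" type="xsd:string" minOccurs="0"
                 dfdl:initiator="TRAILER"/>
  </xsd:sequence>
</xsd:complexType>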
You could of course perform the same collection of records into groups of 20 with a shared variable, database, global cache (if you're on that version) or other mechanism of your choice. I'd use a Collector, but the key point is to shred the file with DFDL and group for processing with code. _________________ Honesty is the best policy.
Insanity is the best defence.
sumit
Posted: Mon Mar 02, 2015 7:52 am Post subject:
Partisan
Joined: 19 Jan 2006 Posts: 398
Vitor wrote:
Without looking at your DFDL schema (which you've not posted)
The input data is EDI data with multiple 5000s and 5990s in it. Each 5000-5990 pair represents a record. The DFDL model is designed to pick 20 such records in one go. Here is the DFDL structure:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:fmt="http://www.ibm.com/dfdl/GeneralPurposeFormat" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/GeneralPurposeFormat" schemaLocation="../IBMdefined/GeneralPurposeFormat.xsd"/>
  <xsd:element ibmSchExtn:docRoot="true" name="Claims_split" type="Claims_msg"/>
  <xsd:complexType name="body_msg">
    <xsd:sequence dfdl:separator="" dfdl:terminator="">
      <xsd:element dfdl:initiator="5000" dfdl:terminator="%LF;5990" ibmDfdlExtn:sampleValue="" name="body5000" type="xsd:string"/>
      <xsd:element dfdl:terminator="%LF;" ibmDfdlExtn:sampleValue="" name="body5990" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element name="Claims_test" type="Claims_msg"/>
  <xsd:complexType name="Claims_msg">
    <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:initiator="HEADER" dfdl:occursCountKind="implicit" dfdl:terminator="%LF;" ibmDfdlExtn:sampleValue="" minOccurs="0" name="header" type="xsd:string"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="20" name="body" type="body_msg"/>
      <xsd:element dfdl:initiator="TRAILER" dfdl:occursCountKind="implicit" ibmDfdlExtn:sampleValue="" minOccurs="0" name="trailer" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format ref="fmt:GeneralPurposeFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:schema>
I hope this way of presentation is fine.
Vitor wrote:
...it's picking up 20 data records then parsing the rest of the 50 MB file looking for something it recognises (which it doesn't find). The output file never gets any data because the flow doesn't commit anything.
Hmm... I have tested the DFDL on a smaller file, and I could see the flow writing the output file in debug mode for each iteration. As you mentioned, it may be because the 'commit' behaviour is different in debug mode compared to the actual flow processing.
Vitor wrote:
If I was coding this, I'd build a DFDL model that correctly described the data (optional header, 1-n data records, optional trailer) and put a Collector node as the next one in sequence after the FileInput with a collection size of 20.
You could of course perform the same collection of records into groups of 20 with a shared variable, database, global cache (if you're on that version) or other mechanism of your choice. I'd use a Collector, but the key point is to shred the file with DFDL and group for processing with code.
Thanks for the suggestion. I will try using the Collector node. _________________ Regards
Sumit
Vitor
Posted: Mon Mar 02, 2015 8:45 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
sumit wrote:
Vitor wrote:
Without looking at your DFDL schema (which you've not posted)
The input data is EDI data with multiple 5000s and 5990s in it. Each 5000-5990 pair represents a record. The DFDL model is designed to pick 20 such records in one go
I'm no @kimbert, but I think that's telling DFDL there is a maximum of only 20 occurrences of that structure. I don't see where you describe that the group of 1-20 repeats through the file.
sumit wrote:
I hope this way of presentation is fine.
This way of presentation is ideal, and I thank you for using those tags.
sumit wrote:
Vitor wrote:
...it's picking up 20 data records then parsing the rest of the 50 MB file looking for something it recognises (which it doesn't find). The output file never gets any data because the flow doesn't commit anything.
Hmm... I have tested the DFDL on a smaller file, and I could see the flow writing the output file in debug mode for each iteration. As you mentioned, it may be because the 'commit' behaviour is different in debug mode compared to the actual flow processing.
It is.
sumit wrote:
Vitor wrote:
If I was coding this, I'd build a DFDL model that correctly described the data (optional header, 1-n data records, optional trailer) and put a Collector node as the next one in sequence after the FileInput with a collection size of 20.
You could of course perform the same collection of records into groups of 20 with a shared variable, database, global cache (if you're on that version) or other mechanism of your choice. I'd use a Collector, but the key point is to shred the file with DFDL and group for processing with code.
Thanks for the suggestion. I will try using the Collector node.
Please post again (on a new thread if appropriate) if you continue to experience issues. _________________ Honesty is the best policy.
Insanity is the best defence.
sumit
Posted: Mon Mar 02, 2015 10:22 am Post subject:
Partisan
Joined: 19 Jan 2006 Posts: 398
Vitor wrote:
I'm no @kimbert, but I think that's telling DFDL there is a maximum of only 20 occurrences of that structure. I don't see where you describe that the group of 1-20 repeats through the file.
My understanding is that using 'Parsed Record Sequence' at the FileInput node will ensure that the flow picks only the first 20 records for one set of processing.
I must mention here that the sample flow has just a FileInput node and a FileOutput node. _________________ Regards
Sumit
sumit
Posted: Mon Mar 02, 2015 11:09 am Post subject:
Partisan
Joined: 19 Jan 2006 Posts: 398
Maybe I am all wrong in my understanding. I am testing the flow with various scenarios now. _________________ Regards
Sumit
Vitor
Posted: Mon Mar 02, 2015 11:13 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
sumit wrote:
Vitor wrote:
I'm no @kimbert, but I think that's telling DFDL there is a maximum of only 20 occurrences of that structure. I don't see where you describe that the group of 1-20 repeats through the file.
My understanding is that using 'Parsed Record Sequence' at the FileInput node will ensure that the flow picks only the first 20 records for one set of processing.
So your assertion is that because the record definition allows for a maximum of 20 records, the file input node will pick 20 records?
Fair enough; my assertion is that the file input node will use the record definition to identify the first 20 records of data, then trawl in confusion through the rest of the file.
One of us is right, and I'm not that certain it's me (I go to a lot of trouble to avoid parsed record sequence due to the cost involved, so my experience is limited). If you're right, then I don't know what the issue is with your flow. _________________ Honesty is the best policy.
Insanity is the best defence.
Vitor
Posted: Mon Mar 02, 2015 11:21 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
sumit wrote:
Maybe I am all wrong in my understanding.
As I said, I wouldn't immediately assume that. I think I've used parsed record sequence once or maybe twice in the 14 years I've been using whatever-the-product-is-called-now, and certainly not recently.
"Will Mr Kimbert please answer a thread on the white phone. Mr Kimbert to the white phone please." _________________ Honesty is the best policy.
Insanity is the best defence.
mqjeff
Posted: Mon Mar 02, 2015 11:29 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
The DFDL as posted says the following:
There is a message. It consists of three parts: a header, some body stuff, and a trailer.
There is at most one (optional) header.
There are up to, but no more than, 20 body records.
There is at most one (optional) trailer.
Is that actually how your data is organized? That for every 20 records, there is a separate header and trailer?
Or is it one header and one trailer in the entire file, and a whole lot of body records? (See the sample layout below.)
Either way, you need to figure out how to associate 'parsed record sequence' with the fact that you have a header and a trailer.
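As a purely hypothetical illustration (record contents invented; only the initiators come from the posted schema), that second organization would look like this:
Code:
HEADER...
5000<claim record 1>
5990<claim summary 1>
5000<claim record 2>
5990<claim summary 2>
... many more 5000/5990 pairs ...
TRAILER...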
sumit
Posted: Mon Mar 02, 2015 12:25 pm Post subject:
Partisan
Joined: 19 Jan 2006 Posts: 398
mqjeff wrote:
Is that actually how your data is organized? That for every 20 records, there is a separate header and trailer?
Or is it one header and one trailer in the entire file, and a whole lot of body records?
The input file has one header and one trailer. In between, there can be any number of body records.
I've tested my flow with a smaller file as well: a file with 1 header, 8 data records and 1 trailer. I've updated my DFDL to pick 2 records at a time (max count 2). All other properties of the FileInput node are the same. I have also placed a Collector node with Quantity set to 1 and a timeout of 10 secs. The Event coordination property is set to 'Disabled'. The Compute nodes just copy InputRoot to OutputRoot, and then there is an MQOutput node.
Code:
FileInput -> Compute (OutputRoot = InputRoot) -> Collector -> Compute (OutputRoot = InputRoot) -> MQOutput
When I run this flow, I can see 4 messages in my output queue:
First - 1 header, a body with 2 records
Second - a body with 2 records
Third - a body with 2 records
Fourth - a body with 2 records and 1 trailer
But when I do the same thing with a large max count in the DFDL (20) and run the interface on a large file, nothing comes out for a long time. I am going to run a trace on my new setup to see what it suggests. _________________ Regards
Sumit
kimbert
Posted: Tue Mar 03, 2015 5:32 pm Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
The DFDL model needs to be a repeating choice of header/body/trailer. You cannot set maxOccurs=unbounded on a choice group, so you must create a model with an element that repeats unbounded. The element contains the choice of header/body/trailer.
If you want 'body' to represent (up to) 20 body records, then you must create a sequence group that contains an element with minOccurs=0 and maxOccurs=20 on that branch of the choice.
Easier to show it like this:
Code:
Message
  complex type
    element name='record' maxOccurs='unbounded'
      choice group
        element name='header'
        sequence group
          element name='body' minOccurs=0 maxOccurs=20
        element name='trailer'
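A rough XSD rendering of that outline (untested; the initiators, terminators and occursCountKind values are assumptions carried over from the schema posted earlier, and the discriminators mentioned below are omitted):
Code:
<!-- Sketch only: one repeating 'record' element whose content is a choice
     of a header, a run of up to 20 body records, or a trailer. -->
<xsd:element name="Claims_split">
  <xsd:complexType>
    <xsd:sequence dfdl:separator="">
      <xsd:element name="record" maxOccurs="unbounded"
                   dfdl:occursCountKind="parsed">
        <xsd:complexType>
          <xsd:choice>
            <xsd:element name="header" type="xsd:string"
                         dfdl:initiator="HEADER" dfdl:terminator="%LF;"/>
            <xsd:sequence dfdl:separator="">
              <xsd:element name="body" type="body_msg" minOccurs="0"
                           maxOccurs="20" dfdl:occursCountKind="implicit"/>
            </xsd:sequence>
            <xsd:element name="trailer" type="xsd:string"
                         dfdl:initiator="TRAILER"/>
          </xsd:choice>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>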
You may need to add discriminators to the header, body and trailer elements to ensure that the DFDL parser always resolves the choice correctly. Remember that the DFDL trace is your best diagnostic tool. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
sumit
Posted: Fri Mar 06, 2015 9:29 am Post subject:
Partisan
Joined: 19 Jan 2006 Posts: 398
Thanks for the suggestion, Kimbert. I'll try it out.
After my last post, I dropped the idea of using 'Parsed Record Sequence' and built a flow with 'Records and Elements' set to 'Delimited' at my FileInput node.
I used a Collector node to collect 20/50 records at a time, then built a logical message with the header and trailer and sent it to an output queue. This is along the same lines as what 'wbi_telecom' mentioned in this post.
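In flow terms, the arrangement described is roughly this (node names and properties abbreviated; the header/trailer rebuild in the Compute node is only sketched, not the actual code):
Code:
FileInput (Delimited) -> Collector (20/50 records) -> Compute (re-add header/trailer) -> MQOutput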
Tested the flow on a 2 GB file and it processed the whole file in close to 2 minutes 20 seconds. _________________ Regards
Sumit