MQSeries.net :: View topic - DFDL parser for unbounded records with delimited fields

DFDL parser for unbounded records with delimited fields

rahulk01

Posted: Thu Dec 26, 2019 11:53 am Post subject: DFDL parser for unbounded records with delimited fields

Apprentice

Joined: 26 Dec 2019
Posts: 35

Hi,
I have to generate a DFDL parser for a format that contains some unbounded records within a sequence, like the following:
Transaction
   sequence
      Record1 1,1
      Record2 0,unbounded
      Record3 1,2
      Record4 1,1
   end of sequence
end of Transaction

The parser that I have created looks like this:
<xsd:element ibmSchExtn:docRoot="true" name="Message">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="">
         <xsd:element dfdl:outputNewLine="{$dfdl:outputNewLine}" dfdl:terminator="%CR;%LF;" name="FileHeader" type="HeaderRecord"/>
         <xsd:element name="Bundle">
            <xsd:complexType>
               <xsd:sequence dfdl:separator="">
                  <xsd:element dfdl:terminator="%CR;%LF;" name="BundleHeader" type="BundleHeaderRecord"/>
                  <xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded" name="Transactions" type="Transaction"/>
                  <xsd:element dfdl:terminator="%CR;%LF;" name="BundleTrailer" type="BundleTrailerRecord"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <xsd:element dfdl:initiator="" name="FileTrailer" type="TrailerRecord"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
<xsd:complexType name="Transaction">
   <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:terminator="%CR;%LF;" name="BelopInfo" type="BelopRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="2" minOccurs="0" name="AccountPrintInfo" type="AccountPrintRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="KidInfo" type="KIDRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="AddressInfo" type="AddressRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="MessageInfo" type="MessageRecord"/>
   </xsd:sequence>
</xsd:complexType>

But when I try to parse a message that has multiple KidInfo records, it parses the first occurrence as KidInfo and the next one as AddressInfo, and then fails (since the structures are different).

I am not sure whether I need to add a descriptor to each record to identify it while parsing. If so, I am not sure how to do it.
Actually, the 8th field in each record (called RECORD-IDENTIFIER) holds the value that determines what record it is, but I am not sure how to use it, as this is my first DFDL parser.

Any help would be greatly appreciated.

BR
Rahul

fjb_saper

Posted: Fri Dec 27, 2019 7:04 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

You need to add a DFDL discriminator for each record type, and link that to the record identifier you described. Have fun

_________________
MQ & Broker admin

rahulk01

Posted: Fri Dec 27, 2019 10:18 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

fjb_saper wrote:

You need to add a DFDL discriminator for each record type, and link that to the record identifier you described. Have fun

Thanks for your reply.
I added the DFDL discriminator, initially for 2 records, AccountPrintInfo and KidInfo.
See below the schema

Code:

<xsd:complexType name="Transaction">
   <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:terminator="%CR;%LF;" name="BelopInfo" type="BelopRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="2" minOccurs="0" name="AccountPrintInfo" type="AccountPrintRecord">
         <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
               <dfdl:discriminator>{fn:contains(/Message/Bundle/Transactions/AccountPrintInfo/RECORD_IDENTIFIER , 'X')}</dfdl:discriminator>
            </xsd:appinfo>
         </xsd:annotation>
      </xsd:element>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="KidInfo" type="KIDRecord">
         <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/">
               <dfdl:discriminator>{/Message/Bundle/Transactions/KidInfo/RECORD_IDENTIFIER eq 'U'}</dfdl:discriminator>
            </xsd:appinfo>
         </xsd:annotation>
      </xsd:element>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="AddressInfo" type="AddressRecord"/>
      <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="MessageInfo" type="MessageRecord"/>
   </xsd:sequence>
</xsd:complexType>

AccountPrintInfo has been defined with 0-2 occurrences, and the next record, KidInfo, can have 0-unbounded occurrences.
The message that I used had 1 AccountPrintInfo, and the next record was a KidInfo. During parsing, the model created 1 AccountPrintInfo successfully, but the next record (which was a KidInfo) was again parsed as AccountPrintInfo, even though its RECORD_IDENTIFIER field was 'U'. I am not sure what I am missing.
Thanks in advance for any help.

fjb_saper

Posted: Fri Dec 27, 2019 11:43 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Look at the tutorial for discriminators.

RECORD_IDENTIFIER isn't defined anywhere!!! How do you expect the system to recognize the record if the identifier field for the record isn't defined anywhere on the record!!!

_________________
MQ & Broker admin

rahulk01

Posted: Sat Dec 28, 2019 3:13 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

fjb_saper wrote:

Look at the tutorial for discriminators.

RECORD_IDENTIFIER isn't defined anywhere!!! How do you expect the system to recognize the record if the identifier field for the record isn't defined anywhere on the record!!!

Thanks for the lead. I was able to achieve what I was trying to do. The XPath to RECORD_IDENTIFIER in the discriminator was the issue; when I used ./RECORD_IDENTIFIER instead, it worked.
By the way, I did not post the complete schema, to save space. RECORD_IDENTIFIER is defined as a fixed-length field inside the complex types for ACCOUNTINFO, KIDINFO and many others.
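
For readers following along, here is a minimal sketch of what the working KidInfo declaration might look like with the relative path. It reuses the element attributes and the 'U' value from the earlier post; the explanation in the comment is an assumption about why the relative form works (the expression is evaluated against the occurrence currently being parsed), not something confirmed in the thread.

Code:

<xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="KidInfo" type="KIDRecord">
   <xsd:annotation>
      <xsd:appinfo source="http://www.ogf.org/dfdl/">
         <!-- Relative path: assumed to resolve against the KidInfo occurrence being parsed -->
         <dfdl:discriminator>{./RECORD_IDENTIFIER eq 'U'}</dfdl:discriminator>
      </xsd:appinfo>
   </xsd:annotation>
</xsd:element>

The same pattern would presumably apply to AccountPrintInfo, with its own identifier value in place of 'U'.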

fjb_saper

Posted: Mon Dec 30, 2019 6:02 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Glad I could help and thanks for sharing the solution.

_________________
MQ & Broker admin

rahulk01

Posted: Mon Dec 30, 2019 10:11 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

I thought I was done, but then ran into another problem. I am building a DFDL schema where some elements are defined to be delimited with '%'. But when I use % in the terminator for the element, I get an error saying 'CTDV1438E : DFDL property 'terminator' contains an invalid entity '%'. A valid entity must obey pattern ['%#' [0-9]+ ';' | '%#x' [0-9a-fA-F]+ ';' | '%#r' [0-9a-fA-F] (2)';' | '%' <name> ';']. '.
Please help me work out how to set % as a delimiter, and what escape I should use. I have tried setting the delimiter as "%", {%, and "{%", but none worked.

timber

Posted: Mon Dec 30, 2019 3:31 pm Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

If you need to use the character % in a delimiter or initiator string, just use the string %%.

In case it helps, the DFDL specification is here: https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
The section that describes String Literals is 6.3.1.2. You may find it useful in future if you need to represent control characters or raw byte values in your DFDL models.
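
As a hedged illustration of the %% escape, a field terminated by a literal percent sign could be declared roughly as below. The element name, type and lengthKind are assumptions made up for the example; only the terminator value comes from the advice above.

Code:

<!-- Hypothetical field: name, type and lengthKind are illustrative assumptions -->
<!-- "%%" is the DFDL string literal escape for one literal % character -->
<xsd:element name="PercentTerminatedField" type="xsd:string" dfdl:lengthKind="delimited" dfdl:terminator="%%"/>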

rahulk01

Posted: Tue Dec 31, 2019 2:16 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

timber wrote:

Thanks a lot for your input. It works with %%.

rahulk01

Posted: Tue Dec 31, 2019 3:53 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

Hey, just when I think my DFDL is complete, I get stuck with something else.
I am defining a variable length Amount field, which will be terminated by either '+' or '-'. I am able to use one of them at a time and it works, but how do I use both like an enumeration?

timber

Posted: Wed Jan 01, 2020 4:51 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

That's an easy one. All DFDL delimiters (initiators, separators, terminators) are a whitespace-separated list of alternatives.

It is unusual for a data format to allow alternative delimiters with exactly the same meaning. Is there a specification for this format that you are trying to model? If so, you should check whether there are rules about when a + or a - are used. If you don't check, there is a risk that your DFDL model will fail to parse valid documents.
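
A minimal sketch of the whitespace-separated list, assuming a hypothetical Amount field (the name, type and lengthKind are not from the original schema):

Code:

<!-- Hypothetical Amount field: name, type and lengthKind are illustrative assumptions -->
<!-- The terminator holds two alternatives separated by a space: either + or - ends the field -->
<xsd:element name="Amount" type="xsd:string" dfdl:lengthKind="delimited" dfdl:terminator="+ -"/>

One thing to keep in mind with this approach: as far as I understand DFDL, a matched terminator is consumed as markup and is not stored in the element's value, so if the sign matters downstream it would need to be captured some other way.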

rahulk01

Posted: Thu Jan 02, 2020 5:37 am Post subject:

Apprentice

Joined: 26 Dec 2019
Posts: 35

timber wrote:

I agree with your point, but my requirement is to define a variable-length amount field that is followed by the amount sign, i.e. + or -. Padding with 0s at the front is also not accepted, so the only way to tell that the amount has ended is by its sign.

I have another issue now. I have been using field 8 of my records as the record identifier, but there are now 2 different record types that have the same value in field 8. So the discriminator for the first record type needs an additional check that the record's first field is not 082 (this is a hardcoded value in the 1st field of the second record type).
I have used the check as
{./RECORD_IDENTIFIER eq 'B' AND fn:contains(./ANTALL_BYTES, '082') ne TRUE}
but I am getting an error saying that the XPath expression ... contains a path location that does not resolve to an element in the schema.
When I use the check {./RECORD_IDENTIFIER eq 'B'} it works, but that check alone is not enough for me.
I am trying to build a model for messages consumed by a very old mainframe application used in a bank. They do not have a copybook for it, and as such it does not follow much standards in their message.

fjb_saper

Posted: Thu Jan 02, 2020 6:23 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

Trust me if it is a mainframe application there is a copybook somewhere... you just have to unearth it...

_________________
MQ & Broker admin

Vitor

Posted: Thu Jan 02, 2020 6:37 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

fjb_saper wrote:

Trust me if it is a mainframe application there is a copybook somewhere... you just have to unearth it...

Especially if it's that old an application, no one back in the day wrote the kind of complex parsing code you're having to build in DFDL out of original OS COBOL. I speak as someone who was writing COBOL code back in the day.

There is a copybook or copybooks. Old COBOL code "does not follow much standards in their messages" - HA!

(I'll just repeat that - HA!)

Do you have any idea how hard it is to get OS/COBOL to write out free form text? What the COBOL equivalent of

Code:

Console.WriteLine("Hello World!")

looks like? Not following standards in ancient COBOL is like a vampire installing a sun bed.

They may not know where the copybook(s) is(are) but they exist.
_________________
Honesty is the best policy.
Insanity is the best defence.

timber

Posted: Thu Jan 02, 2020 8:26 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Quote:

I have used the check as
{./RECORD_IDENTIFIER eq 'B' AND fn:contains(./ANTALL_BYTES, '082') ne TRUE}
but I am getting an error saying XPath expression ... contains a path location that does not resolve to an element in the schema.

I think it is risky to write that expression without parentheses to force your intended evaluation order. I would write it thus:

Code:

{./RECORD_IDENTIFIER eq 'B' AND (fn:contains(./ANTALL_BYTES, '082') ne TRUE)}

Also, most programmers would use the NOT function instead of writing ' ne TRUE'.
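
A sketch of the same check written with fn:not, for comparison. Two details here are assumptions about the cause of the error rather than anything confirmed in this thread: DFDL expressions follow XPath 2.0, where the operator keywords are lowercase (and, or), and a bare TRUE is read as a child element name rather than a boolean, which could plausibly produce the "does not resolve to an element in the schema" message.

Code:

<xsd:annotation>
   <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Lowercase "and"; fn:not(...) replaces the "ne TRUE" comparison -->
      <dfdl:discriminator>{./RECORD_IDENTIFIER eq 'B' and fn:not(fn:contains(./ANTALL_BYTES, '082'))}</dfdl:discriminator>
   </xsd:appinfo>
</xsd:annotation>

If a literal boolean is ever needed, fn:true() and fn:false() are the function forms to use.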
