Author |
Message
|
rahulk01 |
Posted: Thu Dec 26, 2019 11:53 am Post subject: DFDL parser for unbounded records with delimited fields |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
Hi,
I have to build a DFDL parser for a format that contains some unbounded records within a sequence, like the following:
Transaction
sequence
Record1 1,1
Record2 0,unbounded
Record3 1,2
Record4 1,1
end of sequence
end of Transaction
The parser that I have created looks like this:
<xsd:element ibmSchExtn:docRoot="true" name="Message">
  <xsd:complexType>
    <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:outputNewLine="{$dfdl:outputNewLine}" dfdl:terminator="%CR;%LF;" name="FileHeader" type="HeaderRecord"/>
      <xsd:element name="Bundle">
        <xsd:complexType>
          <xsd:sequence dfdl:separator="">
            <xsd:element dfdl:terminator="%CR;%LF;" name="BundleHeader" type="BundleHeaderRecord"/>
            <xsd:element dfdl:occursCountKind="implicit" maxOccurs="unbounded" name="Transactions" type="Transaction"/>
            <xsd:element dfdl:terminator="%CR;%LF;" name="BundleTrailer" type="BundleTrailerRecord"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element dfdl:initiator="" name="FileTrailer" type="TrailerRecord"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="Transaction">
  <xsd:sequence dfdl:separator="">
    <xsd:element dfdl:terminator="%CR;%LF;" name="BelopInfo" type="BelopRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="2" minOccurs="0" name="AccountPrintInfo" type="AccountPrintRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="KidInfo" type="KIDRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="AddressInfo" type="AddressRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="MessageInfo" type="MessageRecord"/>
  </xsd:sequence>
</xsd:complexType>
But when I try to parse a message that has multiple KidInfo records, it parses the first occurrence as KidInfo and the next one as AddressInfo, and then fails (since the structures are different).
I am not sure whether I need to add a descriptor to each record to identify it while parsing. If so, I am not sure how to do it.
Actually, the 8th field in each record (called RECORD-IDENTIFIER) holds the value that determines which record it is, but I am not sure how to use it, as this is my first DFDL parser.
Any help would be greatly appreciated.
BR
Rahul |
|
fjb_saper |
Posted: Fri Dec 27, 2019 7:04 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
You need to add a DFDL discriminator for each record type, and link that to the record identifier you described. Have fun  _________________ MQ & Broker admin |
|
rahulk01 |
Posted: Fri Dec 27, 2019 10:18 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
fjb_saper wrote: |
You need to add a DFDL discriminator for each record type, and link that to the record identifier you described. Have fun  |
Thanks for your reply.
I added the DFDL discriminator, initially for 2 records, AccountPrintInfo and KidInfo.
See below the schema
Code: |
<xsd:complexType name="Transaction">
  <xsd:sequence dfdl:separator="">
    <xsd:element dfdl:terminator="%CR;%LF;" name="BelopInfo" type="BelopRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" maxOccurs="2" minOccurs="0" name="AccountPrintInfo" type="AccountPrintRecord">
      <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:discriminator>{fn:contains(/Message/Bundle/Transactions/AccountPrintInfo/RECORD_IDENTIFIER , 'X')}</dfdl:discriminator>
        </xsd:appinfo>
      </xsd:annotation>
    </xsd:element>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="KidInfo" type="KIDRecord">
      <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
          <dfdl:discriminator>{/Message/Bundle/Transactions/KidInfo/RECORD_IDENTIFIER eq 'U'}</dfdl:discriminator>
        </xsd:appinfo>
      </xsd:annotation>
    </xsd:element>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="AddressInfo" type="AddressRecord"/>
    <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="2" name="MessageInfo" type="MessageRecord"/>
  </xsd:sequence>
</xsd:complexType> |
The AccountPrintInfo record is defined to have 0-2 occurrences, and the next record, KidInfo, can have 0-unbounded occurrences.
The message that I used had 1 AccountPrintInfo, and the next record was a KidInfo. During parsing,
the model created 1 AccountPrintInfo successfully, but the next record (which was a KidInfo) was again parsed as AccountPrintInfo, even though its RECORD_IDENTIFIER field was 'U'. I am not sure what I am missing.
Thanks in advance for any help. |
|
fjb_saper |
Posted: Fri Dec 27, 2019 11:43 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Look at the tutorial for discriminators. RECORD_IDENTIFIER isn't defined anywhere!!! How do you expect the system to recognize the record if the identifier field for the record isn't defined anywhere on the record!!!  _________________ MQ & Broker admin |
|
rahulk01 |
Posted: Sat Dec 28, 2019 3:13 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
fjb_saper wrote: |
Look at the tutorial for discriminators. RECORD_IDENTIFIER isn't defined anywhere!!! How do you expect the system to recognize the record if the identifier field for the record isn't defined anywhere on the record!!!  |
Thanks for your lead. I was able to achieve what I was trying to do. The XPath to RECORD_IDENTIFIER in the discriminator was the issue; when I used the relative path ./RECORD_IDENTIFIER instead, it worked for me.
And by the way, I did not post the complete schema in my posts, to save space. RECORD_IDENTIFIER is defined as a fixed length field inside the complex types for ACCOUNTINFO, KIDINFO and many others. |
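For readers following along, the fix described in this post can be sketched roughly as below. It is a minimal sketch, not the poster's actual schema: it assumes RECORD_IDENTIFIER is a direct child of both AccountPrintRecord and KIDRecord (as the post states), and it leaves the rest of the Transaction type unchanged. A discriminator placed on an element is evaluated with that element as the expression context, so ./RECORD_IDENTIFIER refers to the identifier of the occurrence currently being parsed rather than to a fixed location in the message tree.

```xml
<!-- Sketch only: the same two elements as in the earlier listing, with the
     absolute paths replaced by paths relative to the element being parsed. -->
<xsd:element name="AccountPrintInfo" type="AccountPrintRecord"
             minOccurs="0" maxOccurs="2"
             dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator>{fn:contains(./RECORD_IDENTIFIER, 'X')}</dfdl:discriminator>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:element>
<xsd:element name="KidInfo" type="KIDRecord"
             minOccurs="0" maxOccurs="unbounded"
             dfdl:occursCountKind="implicit" dfdl:terminator="">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:discriminator>{./RECORD_IDENTIFIER eq 'U'}</dfdl:discriminator>
    </xsd:appinfo>
  </xsd:annotation>
</xsd:element>
```

With the relative path, an AccountPrintInfo attempt on a record whose identifier is 'U' should fail its discriminator, letting the parser move on and try KidInfo.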
|
fjb_saper |
Posted: Mon Dec 30, 2019 6:02 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Glad I could help and thanks for sharing the solution.  _________________ MQ & Broker admin |
|
rahulk01 |
Posted: Mon Dec 30, 2019 10:11 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
I thought I was done, but then ran into another problem. I am building a DFDL schema where some elements are delimited with '%'. But when I use % in the terminator for the element, I get an error saying 'CTDV1438E : DFDL property 'terminator' contains an invalid entity '%'. A valid entity must obey pattern ['%#' [0-9]+ ';' | '%#x' [0-9a-fA-F]+ ';' | '%#r' [0-9a-fA-F] (2)';' | '%' <name> ';']. '.
Please help me work out how to set % as a delimiter, and which escape I should use. I have tried setting the delimiter as "%", {%, "{%" but none worked |
|
timber |
Posted: Mon Dec 30, 2019 3:31 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
If you need to use the character % in a delimiter or initiator string, just use the string %%.
In case it helps, the DFDL specification is here: https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
The section that describes String Literals is 6.3.1.2. You may find it useful in future if you need to represent control characters or raw byte values in your DFDL models. |
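As a small illustration of the advice above, a terminator consisting of a single percent sign would be written in the doubled form. The element name and type here are hypothetical, and all other DFDL properties are assumed to come from a default format defined elsewhere in the schema.

```xml
<!-- Hypothetical element: inside a DFDL string literal, "%%" stands for
     one literal percent character. -->
<xsd:element name="SomeField" type="xsd:string"
             dfdl:lengthKind="delimited" dfdl:terminator="%%"/>
```

A lone % is rejected because it is read as the start of an entity such as %CR; or %LF;, which is what the CTDV1438E message is reporting.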
|
rahulk01 |
Posted: Tue Dec 31, 2019 2:16 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
timber wrote: |
If you need to use the character % in a delimiter or initiator string, just use the string %%.
In case it helps, the DFDL specification is here: https://www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
The section that describes String Literals is 6.3.1.2. You may find it useful in future if you need to represent control characters or raw byte values in your DFDL models. |
Thanks a lot for your input. It works with %%.  |
|
rahulk01 |
Posted: Tue Dec 31, 2019 3:53 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
Hey, just when I think my DFDL is complete, I get stuck with something else.
I am defining a variable length Amount field, which will be terminated by either '+' or '-'. I am able to use one of them at a time and it works, but how do I use both like an enumeration? |
|
timber |
Posted: Wed Jan 01, 2020 4:51 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
That's an easy one. All DFDL delimiters (initiators, separators, terminators) are a whitespace-separated list of alternatives.
It is unusual for a data format to allow alternative delimiters with exactly the same meaning. Is there a specification for this format that you are trying to model? If so, you should check whether there are rules about when a + or a - are used. If you don't check, there is a risk that your DFDL model will fail to parse valid documents. |
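To make the whitespace-separated list concrete, a field ended by either sign could be declared roughly as follows. This is a sketch only: the element name Amount is taken from the question, while the type and the use of a default format for all other properties are assumptions.

```xml
<!-- Sketch: the terminator is a space-separated list, so the field ends at
     whichever of "+" or "-" appears first in the data. -->
<xsd:element name="Amount" type="xsd:string"
             dfdl:lengthKind="delimited" dfdl:terminator="+ -"/>
```

One thing to keep in mind: a terminator is markup, not data, so the parsed value of Amount does not record which of the two alternatives matched. If the sign matters to the consuming application, it may be better modeled as a field of its own.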
|
rahulk01 |
Posted: Thu Jan 02, 2020 5:37 am Post subject: |
|
|
Apprentice
Joined: 26 Dec 2019 Posts: 35
|
timber wrote: |
That's an easy one. All DFDL delimiters (initiators, separators, terminators) are a whitespace-separated list of alternatives.
It is unusual for a data format to allow alternative delimiters with exactly the same meaning. Is there a specification for this format that you are trying to model? If so, you should check whether there are rules about when a + or a - are used. If you don't check, there is a risk that your DFDL model will fail to parse valid documents. |
I agree with your point, but my requirement is to define a variable-length amount field that is followed by the amount sign, i.e. + or -. Padding with leading 0s is also not accepted, so the only way to identify that the amount has ended is by its sign.
I have another issue now. I have been using field 8 of my records as the record identifier. But now there are 2 different records which have the same value in field 8. So I need an additional check on the first record: its first field must not have the value 082 (a hardcoded value in the first field of the second record).
I have used this check:
{./RECORD_IDENTIFIER eq 'B' AND fn:contains(./ANTALL_BYTES, '082') ne TRUE}
but I am getting an error saying that the XPath expression ... contains a path location that does not resolve to an element in the schema.
When I use the check as {./RECORD_IDENTIFIER eq 'B'} it works, but this check is just not enough for me.
I am trying to build a model for a very old Mainframe application consumption being used in a bank. They do not have a copybook for it, and as such does not follow much standards in their message. |
|
fjb_saper |
Posted: Thu Jan 02, 2020 6:23 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Trust me if it is a mainframe application there is a copybook somewhere... you just have to unearth it...  _________________ MQ & Broker admin |
|
Vitor |
Posted: Thu Jan 02, 2020 6:37 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
fjb_saper wrote: |
Trust me if it is a mainframe application there is a copybook somewhere... you just have to unearth it...  |
Especially if it's that old an application, no one back in the day wrote the kind of complex parsing code you're having to build in DFDL out of original OS COBOL. I speak as someone who was writing COBOL code back in the day.
There is a copybook or copybooks. Old COBOL code "does not follow much standards in their messages" - HA!
(I'll just repeat that - HA!)
Do you have any idea how hard it is to get OS/COBOL to write out free form text? What the COBOL equivalent of
Code: |
Console.WriteLine("Hello World!") |
looks like? Not following standards in ancient COBOL is like a vampire installing a sun bed.
They may not know where the copybook(s) is(are) but they exist. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
timber |
Posted: Thu Jan 02, 2020 8:26 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
I have used the check as
{./RECORD_IDENTIFIER eq 'B' AND fn:contains(./ANTALL_BYTES, '082') ne TRUE}
but I am getting an error saying that the XPath expression ... contains a path location that does not resolve to an element in the schema. |
I think it is risky to write that expression without parentheses to force your intended evaluation order. I would write it thus:
Code: |
{./RECORD_IDENTIFIER eq 'B' AND (fn:contains(./ANTALL_BYTES, '082') ne TRUE)} |
Also, most programmers would use the NOT function instead of writing ' ne TRUE'. |
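A possible form of the check using the NOT function is sketched below. It rests on two assumptions that are not confirmed in this thread: that DFDL expressions follow XPath 2.0 conventions, where the operators are lowercase (and, or) and the boolean constants are the functions fn:true() and fn:false(); and that ANTALL_BYTES is a direct child of the record, as the original expression implies. Under XPath rules a bare name such as TRUE is read as a path step to a child element called TRUE, which may be what the "does not resolve to an element in the schema" message refers to. The expression has not been tested against the poster's schema.

```xml
<!-- Sketch: lowercase "and", with fn:not(...) in place of "ne TRUE". -->
<dfdl:discriminator>{./RECORD_IDENTIFIER eq 'B' and fn:not(fn:contains(./ANTALL_BYTES, '082'))}</dfdl:discriminator>
```

Note that fn:contains matches 082 anywhere in the field. If the field must be exactly 082 to identify the second record, a direct comparison such as ./ANTALL_BYTES ne '082' might express the intent more precisely.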
|