Author |
Message
|
vishnurajnr |
Posted: Tue Sep 03, 2013 12:11 am Post subject: DFDL End of message parsing issue |
|
|
 Centurion
Joined: 08 Aug 2011 Posts: 134 Location: Trivandrum
|
I have a message structure (Flat file, tab delimited) to be modelled using DFDL.
Message structure is as follows:
Code: |
Message-Repeating
Header-Mandatory (Start with 'H' Followed by Tab)
Address(Office)-Mandatory (Start with 'A' followed by TAB Followed by 'O' followed by TAB)
Address(Home)-Optional (Start with 'A' followed by TAB Followed by 'H' followed by TAB) |
This model parses the message successfully, but it expects a newline (CR LF) at the end. How can I tell the parser to finish parsing when no newline or other delimiter is found? In other words, this DFDL model parses successfully if I have the CRLF at the end of the message; otherwise it cannot parse the last record.
My DFDL Schema is as below:
Code: |
<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="../IBMdefined/RecordSeparatedFieldFormat.xsd"/>
<xsd:complexType name="Address">
<xsd:sequence dfdl:initiator="" dfdl:separator="%HT;">
<xsd:element name="HouseNo" type="xsd:string"/>
<xsd:element name="Street" type="xsd:string"/>
<xsd:element name="Pin" type="xsd:string"/>
<xsd:element name="City" type="xsd:string"/>
<xsd:element name="State" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:complexType name="Header">
<xsd:sequence dfdl:separator="%HT;">
<xsd:element name="Fname" type="xsd:string"/>
<xsd:element name="Lname" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
<xsd:element ibmSchExtn:docRoot="true" name="EmployeeFlatMsg">
<xsd:complexType>
<xsd:sequence dfdl:separator="">
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:sequence/>
</xsd:appinfo>
</xsd:annotation>
<xsd:element dfdl:initiator="" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="Mesage">
<xsd:complexType>
<xsd:sequence dfdl:separator="">
<xsd:element dfdl:initiator="H%HT;" dfdl:terminator="%CR;%LF;" name="Header" type="Header"/>
<xsd:element dfdl:initiator="A%HT;O%HT;" dfdl:terminator="%CR;%LF;" name="OfficeAddress" type="Address"/>
<xsd:element dfdl:initiator="A%HT;H%HT;" dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;" minOccurs="0" name="HomeAddress" type="Address"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format byteOrder="{$dfdl:byteOrder}" encoding="US-ASCII" escapeSchemeRef="recSepFieldsFmt:RecordEscapeScheme" occursCountKind="fixed" ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"/>
<dfdl:defineFormat name="test">
<dfdl:format/>
</dfdl:defineFormat>
</xsd:appinfo>
</xsd:annotation>
</xsd:schema>
|
Any help is highly appreciated.
Last edited by vishnurajnr on Wed Sep 04, 2013 3:23 am; edited 2 times in total |
|
kimbert |
Posted: Tue Sep 03, 2013 1:19 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
You need the special property 'documentFinalTerminatorCanBeMissing'. It does exactly what the name says. You can only put this property on a DFDL format block - not on an individual element. Which makes sense, because it applies to the entire document.
Here's my ( untested ) version of your xsd:
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="../IBMdefined/RecordSeparatedFieldFormat.xsd"/>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format
        ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"
        byteOrder="{$dfdl:byteOrder}"
        encoding="US-ASCII"
        escapeSchemeRef="recSepFieldsFmt:RecordEscapeScheme"
        occursCountKind="fixed"
        documentFinalTerminatorCanBeMissing="yes"
      />
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:complexType name="Address">
    <xsd:sequence dfdl:initiator="" dfdl:separator="%HT;">
      <xsd:element name="HouseNo" type="xsd:string"/>
      <xsd:element name="Street" type="xsd:string"/>
      <xsd:element name="Pin" type="xsd:string"/>
      <xsd:element name="City" type="xsd:string"/>
      <xsd:element name="State" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="Header">
    <xsd:sequence dfdl:separator="%HT;">
      <xsd:element name="Fname" type="xsd:string"/>
      <xsd:element name="Lname" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:element ibmSchExtn:docRoot="true"
    name="EmployeeFlatMsg"
    dfdl:terminator="%CR;%LF;">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:sequence/>
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element dfdl:initiator="" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="Mesage">
          <xsd:complexType>
            <xsd:sequence dfdl:separator="">
              <xsd:element dfdl:initiator="H%HT;" dfdl:terminator="%CR;%LF;" name="Header" type="Header"/>
              <xsd:element dfdl:initiator="A%HT;O%HT;" dfdl:terminator="%CR;%LF;" name="OfficeAddress" type="Address"/>
              <xsd:element dfdl:initiator="A%HT;H%HT;" dfdl:occursCountKind="implicit" minOccurs="0" name="HomeAddress" type="Address"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema> |
btw, I cleaned it up using an auto-formatter in my text editor. That's why the spacing looks a little different.
I also moved the terminator from the last element (HomeAddress) and put it onto the root element. It seemed like a more logical place - after all, the line break is terminating the entire document, not just the last record.
Give that a try, and let me know if it worked. And don't forget to use DFDL trace if it doesn't work  _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
|
vishnurajnr |
Posted: Tue Sep 03, 2013 1:55 am Post subject: |
|
|
 Centurion
Joined: 08 Aug 2011 Posts: 134 Location: Trivandrum
|
Thanks a lot Kimbert...!!!
The 'documentFinalTerminatorCanBeMissing' property fixed my issue...
BTW, can we set this property using the DFDL editor (Not using the XSD editor)? _________________ -------
A man is great by deeds, not by birth...! |
|
kimbert |
Posted: Tue Sep 03, 2013 2:12 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
|
vishnurajnr |
Posted: Tue Sep 03, 2013 3:41 am Post subject: |
|
|
 Centurion
Joined: 08 Aug 2011 Posts: 134 Location: Trivandrum
|
Thanks again Kimbert..!!! |
|
vishnurajnr |
Posted: Wed Sep 04, 2013 3:26 am Post subject: |
|
|
 Centurion
Joined: 08 Aug 2011 Posts: 134 Location: Trivandrum
|
Sorry to ask again.
The above solution works perfectly for input message parsing.
What if we want to remove the <CR><LF> when parsing an output message?
The output message generated contains the newline at the end, but how to avoid that? |
|
mqjeff |
Posted: Wed Sep 04, 2013 6:31 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Output messages aren't parsed, they're serialized.
In MRM the property you want is called "suppress absent element delimiters".
I haven't bothered to look up what it's called in DFDL. |
|
kimbert |
Posted: Wed Sep 04, 2013 6:46 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
The output message generated contains the newline at the end, but how to avoid that? |
That changes the problem a little.
If you specify a terminator, the DFDL serializer will always write a terminator. So you will need to
- remove the terminators from all of the fields in your model.
- define an infix separator for every sequence group in your model.
The 'documentFinalTerminatorCanBeMissing' property is now redundant ( your model does not use terminators any more ). It is harmless, so you can leave it alone if you want to.
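A minimal, untested sketch of those two changes, applied to the root element of the schema posted earlier in this thread. Using %CR;%LF; as the infix separator on both sequences, and setting separatorSuppressionPolicy explicitly so that no separator is written for an absent HomeAddress, are assumptions rather than anything confirmed here; check the behaviour with DFDL trace.

```xml
<!-- Sketch only: record terminators removed, CR LF used as an infix separator. -->
<xsd:element ibmSchExtn:docRoot="true" name="EmployeeFlatMsg">
  <xsd:complexType>
    <!-- CR LF between occurrences of Mesage, never after the last one -->
    <xsd:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="infix">
      <xsd:element dfdl:initiator="" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="Mesage">
        <xsd:complexType>
          <!-- CR LF between the records of one Mesage; trailingEmpty is an assumed setting -->
          <xsd:sequence dfdl:separator="%CR;%LF;" dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="trailingEmpty">
            <xsd:element dfdl:initiator="H%HT;" dfdl:terminator="" name="Header" type="Header"/>
            <xsd:element dfdl:initiator="A%HT;O%HT;" dfdl:terminator="" name="OfficeAddress" type="Address"/>
            <xsd:element dfdl:initiator="A%HT;H%HT;" dfdl:occursCountKind="implicit" dfdl:terminator="" minOccurs="0" name="HomeAddress" type="Address"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

Because the same delimiter now appears at two nesting levels, the parser has to rely on the record initiators (H, A O, A H) to decide where one Mesage ends and the next begins.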
Please note: your model will now reject any input document that contains a trailing CR/LF. |
|
kimbert |
Posted: Wed Sep 04, 2013 6:49 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
In MRM the property you want is called "suppress absent element delimiters". |
Sort of. I can see that it looks similar from some angles. But there are no 'absent elements' here. SAED is really for removing trailing delimiters in e.g. CSV records. DFDL has its own way of doing that - separatorSuppressionPolicy. |
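For comparison, a hedged sketch of the CSV case mentioned above, using the property as it is spelled in the DFDL 1.0 specification (separatorSuppressionPolicy). The type and field names are hypothetical, and the fragment assumes the same namespace prefixes as the schemas in this thread.

```xml
<!-- Hypothetical CSV record: trailing separators for absent optional fields are suppressed. -->
<xsd:complexType name="CsvRecord">
  <xsd:sequence dfdl:separator="," dfdl:separatorPosition="infix" dfdl:separatorSuppressionPolicy="trailingEmpty">
    <xsd:element name="Field1" type="xsd:string"/>
    <xsd:element name="Field2" type="xsd:string" minOccurs="0" dfdl:occursCountKind="implicit"/>
    <xsd:element name="Field3" type="xsd:string" minOccurs="0" dfdl:occursCountKind="implicit"/>
  </xsd:sequence>
</xsd:complexType>
```

With this setting, a record that carries only Field1 would be expected to serialize without trailing commas.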
|
dogorsy |
Posted: Wed Sep 04, 2013 9:10 am Post subject: |
|
|
Knight
Joined: 13 Mar 2013 Posts: 553 Location: Home Office
|
kimbert wrote: |
Quote: |
The output message generated contains the newline at the end, but how to avoid that? |
That changes the problem a little.
If you specify a terminator, the DFDL serializer will always write a terminator. So you will need to
- remove the terminators from all of the fields in your model.
- define an infix separator for every sequence group in your model.
The 'documentFinalTerminatorCanBeMissing' property is now redundant ( your model does not use terminators any more ). It is harmless, so you can leave it alone if you want to.
Please note: your model will now reject any input document that contains a trailing CR/LF. |
Question: forgive my ignorance, but would it not work to define the terminator as
"%WSP*; %CR;%LF;" and leave documentFinalTerminatorCanBeMissing alone? |
|
kimbert |
Posted: Wed Sep 04, 2013 9:28 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
@dogorsy: I like that solution very much. If that's ignorance then please can we have more of it.
@vishnurajnr: dogorsy's solution works because the dfdl 'terminator' property is a space-separated list of alternative values but the DFDL serializer always uses the first. The entity %WSP*; is the first in dogorsy's list of terminators, and %WSP*; is always written ( by the DFDL serializer ) as an empty string. |
|
dogorsy |
Posted: Wed Sep 04, 2013 9:34 am Post subject: |
|
|
Knight
Joined: 13 Mar 2013 Posts: 553 Location: Home Office
|
kimbert wrote: |
@dogorsy: I like that solution very much. If that's ignorance then please can we have more of it.
@vishnurajnr: dogorsy's solution works because the dfdl 'terminator' property is a space-separated list of alternative values but the DFDL serializer always uses the first. The entity %WSP*; is the first in dogorsy's list of terminators, and %WSP*; is always written ( by the DFDL serializer ) as an empty string. |
Thanks, I was wondering whether that would screw up everything else. To start with, any other records (before the last one) would not have a terminator, or would they?
And then there is the parsing: would any white space be considered a terminator?
And yes, ignorance in this neck of the woods is in great supply! |
|
kimbert |
Posted: Wed Sep 04, 2013 9:57 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
You wouldn't want to set *all* the terminators in this way. But it's a useful technique for the specific scenario of an optional-on-input and not-wanted-on-output terminator.
I would make it the terminator of the document root element. That way it wouldn't get confused with the simpler terminators used by other fields in the data format. |
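An untested sketch of that placement, based on the root element from the schema posted earlier in this thread: only the root terminator changes, and the record-level terminators stay as they were. Whether a given DFDL processor accepts %WSP*; standing alone as one of the terminator alternatives has not been verified here, so treat that as an assumption and confirm it with DFDL trace.

```xml
<!-- Sketch only: %WSP*; is listed first, so the serializer writes an empty terminator; -->
<!-- %CR;%LF; remains available as an alternative when parsing. -->
<xsd:element ibmSchExtn:docRoot="true"
             name="EmployeeFlatMsg"
             dfdl:terminator="%WSP*; %CR;%LF;">
  <xsd:complexType>
    <xsd:sequence dfdl:separator="">
      <xsd:element dfdl:initiator="" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="Mesage">
        <xsd:complexType>
          <xsd:sequence dfdl:separator="">
            <xsd:element dfdl:initiator="H%HT;" dfdl:terminator="%CR;%LF;" name="Header" type="Header"/>
            <xsd:element dfdl:initiator="A%HT;O%HT;" dfdl:terminator="%CR;%LF;" name="OfficeAddress" type="Address"/>
            <xsd:element dfdl:initiator="A%HT;H%HT;" dfdl:occursCountKind="implicit" minOccurs="0" name="HomeAddress" type="Address"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

The documentFinalTerminatorCanBeMissing="yes" setting on the dfdl:format block is assumed to stay in place, as suggested earlier in the thread.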
|
dogorsy |
Posted: Wed Sep 04, 2013 10:02 am Post subject: |
|
|
Knight
Joined: 13 Mar 2013 Posts: 553 Location: Home Office
|
kimbert wrote: |
You wouldn't want to set *all* the terminators in this way. But it's a useful technique for the specific scenario of an optional-on-input and not-wanted-on-output terminator.
I would make it the terminator of the document root element. That way it wouldn't get confused with the simpler terminators used by other fields in the data format. |
oh, I see, yes, forgot about that, still, would like to try it out when I migrate from 2.1.... I love Neon... |
|
smdavies99 |
Posted: Wed Sep 04, 2013 11:06 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
dogorsy wrote: |
oh, I see, yes, forgot about that, still, would like to try it out when I migrate from 2.1.... I love Neon... |
You love Neon! Perhaps you should take the advice I gave earlier to Lancelotlinc and go and sit in a darkened room, preferably until you see sense. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|