DFDL parsing csv file error |
icyblue7 |
Posted: Fri Sep 02, 2016 5:28 am Post subject: DFDL parsing csv file error |
|
|
Newbie
Joined: 02 Sep 2016 Posts: 3
|
Hello,
I created a DFDL model with CommaSeparatedFormat to parse the sample CSV file below:
Code: |
"1467 27497 89062","A",116274,8,"17380923"
"1467 27497 89062","F","3","231333302 863000","","2"
"1467 27497 89062","F","2","231333325 051000","","2"
"1467 27497 89062","B","3","231333302 865000","","2","1"
"1467 27497 89062","Z","9360722 16233",6 |
Here is the XSD file:
Code: |
<xsd:import namespace=".../CommaSeparatedFormat" schemaLocation="IBMdefined/CommaSeparatedFormat.xsd"/>
<xsd:annotation>
  <xsd:appinfo source=".../dfdl/">
    <dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
  </xsd:appinfo>
</xsd:annotation>
<xsd:element dfdl:outputNewLine="%LF;" ibmSchExtn:docRoot="true" name="SAMPLE">
  <xsd:complexType>
    <xsd:sequence dfdl:separator="%LF;%WSP*;" dfdl:separatorSuppressionPolicy="anyEmpty">
      <xsd:element name="A_Line">
        <xsd:complexType>
          <xsd:sequence dfdl:outputNewLine="%LF;" dfdl:separator=",">
            <xsd:element default="" minOccurs="0" name="ACCT" type="xsd:string"/>
            <xsd:element default="" minOccurs="0" name="FILLER1" type="xsd:string"/>
            <xsd:element default="" minOccurs="0" name="FILLER2" type="xsd:string"/>
            <xsd:element default="" minOccurs="0" name="FILLER3" type="xsd:string"/>
            <xsd:element default="" minOccurs="0" name="FILLER4" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="idoc">
        <xsd:complexType>
          <xsd:sequence dfdl:separator="%LF;%WSP*;">
            <xsd:element maxOccurs="unbounded" minOccurs="0" name="lines">
              <xsd:complexType>
                <xsd:sequence dfdl:separator="%LF;%WSP*;">
                  <xsd:choice dfdl:terminator="">
                    <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',F,')}" dfdl:outputNewLine="%LF;" dfdl:terminator="" name="F_Line">
                      <xsd:complexType>
                        <xsd:sequence>
                          <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Y,')}" dfdl:outputNewLine="%LF;" dfdl:terminator="" name="Y_Line">
                      <xsd:complexType>
                        <xsd:sequence>
                          <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                          <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:choice>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>
            <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',B,')}" name="B_Line">
              <xsd:complexType>
                <xsd:sequence>
                  <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                  <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                  <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                  <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                  <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                </xsd:sequence>
              </xsd:complexType>
            </xsd:element>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
      <xsd:element name="Footer">
        <xsd:complexType>
          <xsd:sequence dfdl:separator="%LF;%WSP*;">
            <xsd:choice>
              <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Y,')}" name="Y_Line">
                <xsd:complexType>
                  <xsd:sequence dfdl:separator="%LF;%WSP*;">
                    <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                    <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                    <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                    <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                    <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
              <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Z,')}" dfdl:outputNewLine="%LF;" name="Z_Line">
                <xsd:complexType>
                  <xsd:sequence dfdl:separator="%LF;%WSP*;">
                    <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                    <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
            </xsd:choice>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
|
And here is the exception:
Code: |
info: Calculating value of DFDL property 'initiator' using DFDL expression '{fn:concat(/SAMPLE/A_Line/ACCT,',F,')}'. The calculated value was '1467 27497 89062,F,'
[dfdl = /LIBRARY/SAMPLE.xsd, scd = #xscd(/schemaElement::SAMPLE/type::0/model::sequence/schemaElement::idoc/type::0/model::sequence/schemaElement::lines/type::0/model::sequence/model::choice/schemaElement::F_Line), 166]
error: CTDP3041E: Initiator '1467' not found at offset '43' for element '/SAMPLE[1]/idoc[1]/lines[1]/F_Line[1]'.
info: Offset: 43. Parser was unable to resolve data on the current branch and will evaluate the next available branch beginning at offset '43' owned by the 'choice' group contained within element 'sequence'.
[dfdl = /LIBRARY/SAMPLE.xsd, scd = #xscd(/schemaElement::SAMPLE/type::0/model::sequence/schemaElement::idoc/type::0/model::sequence/schemaElement::lines/type::0/model::sequence/model::choice), 209] |
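A note on reading the trace: the following Python sketch is not DFDL and not part of the original post. It only checks what the sample data contains at the reported offset 43 and how the computed initiator string splits on whitespace. It assumes the records are separated by a single LF, and it assumes (consistent with the parser looking for '1467' rather than the whole string) that the initiator value is being read as a whitespace-separated list of alternatives.

```python
# First two records of the sample data from the post, joined by a single LF (assumption).
data = "\n".join([
    '"1467 27497 89062","A",116274,8,"17380923"',
    '"1467 27497 89062","F","3","231333302 863000","","2"',
])

# Initiator value that the trace reports for F_Line.
computed_initiator = "1467 27497 89062,F,"

# Assumption: the value is treated as a whitespace-separated list of alternatives,
# so the embedded spaces turn one intended initiator into three candidates.
candidates = computed_initiator.split()

offset = 43  # offset reported in the CTDP3041E message
at_offset = data[offset:offset + 22]

print(candidates)
print(repr(at_offset))
print([at_offset.startswith(c) for c in candidates])
```

Under those assumptions, offset 43 is the start of the second record, which begins with a double quote, and the text between the account number and the record type is `","` rather than a bare comma, so none of the candidates can match there.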
|
|
Vitor |
Posted: Fri Sep 02, 2016 5:37 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Why are you using initiators? |
|
icyblue7 |
Posted: Fri Sep 02, 2016 6:28 am Post subject: |
|
|
Newbie
Joined: 02 Sep 2016 Posts: 3
|
This is the structure of the input:
- the first line is the header (record "A")
- the detail contains 1..* line items (record "F") and 1..* additional records (record "Y")
- a summary record (record "B")
- the last line is the trailer (record "Z") |
|
mqjeff |
Posted: Fri Sep 02, 2016 6:39 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
You should construct your model this way:
- Acct #
- Choice group
  - 'A' header record
  - 'F' record
  - 'B' record
  - 'Z' record
The F record should be a complex type that repeats an unlimited # of times. |
|
icyblue7 |
Posted: Fri Sep 02, 2016 6:45 am Post subject: |
|
|
Newbie
Joined: 02 Sep 2016 Posts: 3
|
mqjeff wrote: |
You should construct your model this way:
- Acct #
- Choice group
  - 'A' header record
  - 'F' record
  - 'B' record
  - 'Z' record
The F record should be a complex type that repeats an unlimited # of times. |
But my input looks like this:
"1467 27497 89062","A",116274,8,"17380923"
"1467 27497 89062","F","3","231333302 863000","","2"
"1467 27497 89062","F","2","231333325 051000","","2"
"1467 27497 89062","B","3","231333302 865000","","2","1"
"1467 27497 89062","Z","9360722 16233",6
So I tried adding initiators like this: fn:concat(/SAMPLE/A_Line/ACCT,',Y,') |
|
mqjeff |
Posted: Fri Sep 02, 2016 6:53 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
icyblue7 wrote: |
but my input is like this
"1467 27497 89062","A",116274,8,"17380923" |
So that's
"1467 27497 89062" == Acct #
"A" == record discriminator
116274,8,"17380923" == fields in record type A
The structure I showed always has an Acct # as the first field.
I forgot to indicate that the entire structure repeats an unlimited # of times (until end of file). |
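As a plain illustration of that breakdown (account number, record discriminator, remaining fields), here is a small Python sketch. It is not DFDL; the function name `split_record` is made up for this example, and it relies on Python's standard `csv` module to handle the quoted fields.

```python
import csv
import io

def split_record(line):
    """Split one CSV line into (account, record_type, remaining_fields)."""
    row = next(csv.reader(io.StringIO(line)))
    return row[0], row[1], row[2:]

# Header record from the sample data in the thread.
acct, rec_type, fields = split_record('"1467 27497 89062","A",116274,8,"17380923"')
print(acct, rec_type, fields)
```

The point is only that the record type is the second field, after the account number, which is why it cannot be recognised by looking at the very start of the line.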
|
Vitor |
Posted: Fri Sep 02, 2016 7:09 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqjeff wrote: |
icyblue7 wrote: |
but my input is like this
"1467 27497 89062","A",116274,8,"17380923" |
So that's
"1467 27497 89062" == Acct #
"A" == record discriminator
116274,8,"17380923" == fields in record type A
The structure I showed always has an Acct # as the first field.
I forgot to indicate that the entire structure repeats an unlimited # of times (until end of file). |
You don't need initiators to model this. |
|
fjb_saper |
Posted: Fri Sep 02, 2016 8:16 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
What you want to use, and mqjeff has given you a huge hint, are discriminators... |
|
timber |
Posted: Sat Sep 03, 2016 2:17 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
There are two ways to solve this problem:
1. Organise the CSV records into structures using DFDL. To do this you need to use discriminators in the model because the field that indicates the record type is not the first field of the record.
2. Parse the CSV as an unbounded list of records and do the record type recognition in the message flow logic.
Personally I would probably choose 2, unless
a) the input CSV is large, so that Records and Elements needs to be set to 'Parsed Record Sequence'
and
b) there might be a need to run more than one instance of the message flow
(because then it would not be possible to use a SHARED variable to hold the record type). |
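To make option 2 concrete, here is a rough Python sketch of the idea: read the file as a flat, unbounded list of records and do the record-type recognition afterwards. In a real message flow this logic would live in the flow itself, not in Python; the function name `group_by_record_type` is hypothetical and the sample text is the data posted at the top of the thread.

```python
import csv
import io
from collections import defaultdict

SAMPLE = '''"1467 27497 89062","A",116274,8,"17380923"
"1467 27497 89062","F","3","231333302 863000","","2"
"1467 27497 89062","F","2","231333325 051000","","2"
"1467 27497 89062","B","3","231333302 865000","","2","1"
"1467 27497 89062","Z","9360722 16233",6
'''

def group_by_record_type(text):
    """Read every CSV row, then bucket rows by the value of their second field."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        groups[row[1]].append(row)
    return dict(groups)

groups = group_by_record_type(SAMPLE)
print({rec_type: len(rows) for rec_type, rows in groups.items()})
```

The parser stays trivial with this approach, and all knowledge of record types sits in the code that walks the list.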
|
mqjeff |
Posted: Tue Sep 06, 2016 4:36 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
timber wrote: |
There are two ways to solve this problem:
1. Organise the CSV records into structures using DFDL. To do this you need to use discriminators in the model because the field that indicates the record type is not the first field of the record. |
I believe that my suggested model structure addressed this? The acct # is a fixed field and always occurs. The choice structure then resolves based on the first field of the remaining line? |
|
timber |
Posted: Tue Sep 06, 2016 6:55 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
@mqjeff: Consider these options:
a) Use the built-in CSV wizard. Accept that the message tree will be an unbounded list of records. The record type of each record must be detected by inspecting the value of its second child.
b) Use your suggested model structure. Accept that the message tree will be an unbounded list of records. The record type of each record must be detected by inspecting the name of its second child.
c) Use discriminators so that DFDL builds a structured message tree. Accept that the DFDL model generated by the CSV wizard will need to be heavily customized. On the other hand, Parsed Record Sequence can now be used to process a huge input file without requiring the message flow to hold the record type in a SHARED variable.
I think b) requires adjustments to the DFDL and produces a more complex message tree without offering any compensations. But I may be missing something - it wouldn't be the first time. |
|
mqjeff |
Posted: Tue Sep 06, 2016 7:31 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Perhaps my structure is a bit too nested.
My intent is that the Acct # is a peer of the choice structure, as part of an ordered sequence. The ordered sequence would repeat until the end of file.
The choice structure would be resolved by using the first field of the remaining record (everything *after* the acct#) as a discriminator/indicator/whatever the correct DFDL name is.
Perhaps the A, B, and Z records could be removed from the choice, and modeled as records that do not repeat. |
|
shanson |
Posted: Wed Sep 07, 2016 9:32 am Post subject: |
|
|
 Partisan
Joined: 17 Oct 2003 Posts: 344 Location: IBM Hursley
|
Quote: |
Perhaps the A, B, and Z records could be removed from the choice, and modeled as records that do not repeat |
That's how I would do it. The choice model is flexible but if you want strict validation then it won't detect out-of-order records.
Quote: |
- detail contains 1..* line items (record "F") and 1..* additional Record "Y" |
Do all the Fs appear before all the Ys or are they interleaved? If they all appear before, you don't need a choice at all. |
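The difference between the two shapes can be illustrated outside DFDL with a loose analogy in Python: treat the sequence of record-type letters in a file as a string and compare a pattern that accepts F and Y in any order (roughly what a repeating choice allows) with one that requires all F records before all Y records. The pattern names and the example type sequences are made up for this sketch, and the posted sample itself contains no Y records.

```python
import re

# Hypothetical patterns over the sequence of record-type letters in one file.
CHOICE_LIKE = re.compile(r"A[FY]+BZ")    # F and Y accepted in any order
STRICT_ORDER = re.compile(r"AF+Y+BZ")    # all F records, then all Y records

def accepts(pattern, record_types):
    """Return True if the whole sequence of record types matches the pattern."""
    return pattern.fullmatch("".join(record_types)) is not None

in_order = ["A", "F", "F", "Y", "B", "Z"]
interleaved = ["A", "F", "Y", "F", "Y", "B", "Z"]

print(accepts(CHOICE_LIKE, in_order), accepts(STRICT_ORDER, in_order))
print(accepts(CHOICE_LIKE, interleaved), accepts(STRICT_ORDER, interleaved))
```

The choice-like pattern accepts both sequences, while the strict one rejects the interleaved file, which is the kind of out-of-order detection that the flexible model gives up.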
|