MQSeries.net :: View topic - DFDL parsing csv file error


DFDL parsing csv file error


icyblue7

Posted: Fri Sep 02, 2016 5:28 am Post subject: DFDL parsing csv file error

Newbie

Joined: 02 Sep 2016
Posts: 3

Hello,

I created a DFDL model with CommaSeparatedFormat to parse the sample CSV file below:

Code:

"1467 27497 89062","A",116274,8,"17380923"
"1467 27497 89062","F","3","231333302 863000","","2"
"1467 27497 89062","F","2","231333325 051000","","2"
"1467 27497 89062","B","3","231333302 865000","","2","1"
"1467 27497 89062","Z","9360722 16233",6

Here is the XSD file:

Code:

<xsd:import namespace=".../CommaSeparatedFormat" schemaLocation="IBMdefined/CommaSeparatedFormat.xsd"/>
<xsd:annotation>
   <xsd:appinfo source=".../dfdl/">
      <dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
   </xsd:appinfo>
</xsd:annotation>
<xsd:element dfdl:outputNewLine="%LF;" ibmSchExtn:docRoot="true" name="SAMPLE">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="%LF;%WSP*;" dfdl:separatorSuppressionPolicy="anyEmpty">
         <xsd:element name="A_Line">
            <xsd:complexType>
               <xsd:sequence dfdl:outputNewLine="%LF;" dfdl:separator=",">
                  <xsd:element default="" minOccurs="0" name="ACCT" type="xsd:string"/>
                  <xsd:element default="" minOccurs="0" name="FILLER1" type="xsd:string"/>
                  <xsd:element default="" minOccurs="0" name="FILLER2" type="xsd:string"/>
                  <xsd:element default="" minOccurs="0" name="FILLER3" type="xsd:string"/>
                  <xsd:element default="" minOccurs="0" name="FILLER4" type="xsd:string"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <xsd:element dfdl:terminator="" maxOccurs="unbounded" minOccurs="0" name="idoc">
            <xsd:complexType>
               <xsd:sequence dfdl:separator="%LF;%WSP*;">
                  <xsd:element maxOccurs="unbounded" minOccurs="0" name="lines">
                     <xsd:complexType>
                        <xsd:sequence dfdl:separator="%LF;%WSP*;">
                           <xsd:choice dfdl:terminator="">
                              <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',F,')}" dfdl:outputNewLine="%LF;" dfdl:terminator="" name="F_Line">
                                 <xsd:complexType>
                                    <xsd:sequence>
                                       <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                                    </xsd:sequence>
                                 </xsd:complexType>
                              </xsd:element>
                              <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Y,')}" dfdl:outputNewLine="%LF;" dfdl:terminator="" name="Y_Line">
                                 <xsd:complexType>
                                    <xsd:sequence>
                                       <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                                       <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                                    </xsd:sequence>
                                 </xsd:complexType>
                              </xsd:element>
                           </xsd:choice>
                        </xsd:sequence>
                     </xsd:complexType>
                  </xsd:element>
                  <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',B,')}" name="B_Line">
                     <xsd:complexType>
                        <xsd:sequence>
                           <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                           <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                           <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                           <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                           <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                        </xsd:sequence>
                     </xsd:complexType>
                  </xsd:element>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <xsd:element name="Footer">
            <xsd:complexType>
               <xsd:sequence dfdl:separator="%LF;%WSP*;">
                  <xsd:choice>
                     <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Y,')}" name="Y_Line">
                        <xsd:complexType>
                           <xsd:sequence dfdl:separator="%LF;%WSP*;">
                              <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                              <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                              <xsd:element minOccurs="0" name="FILLER3" type="xsd:string"/>
                              <xsd:element minOccurs="0" name="FILLER4" type="xsd:string"/>
                              <xsd:element minOccurs="0" name="FILLER5" type="xsd:string"/>
                           </xsd:sequence>
                        </xsd:complexType>
                     </xsd:element>
                     <xsd:element dfdl:initiator="{fn:concat(/SAMPLE/A_Line/ACCT,',Z,')}" dfdl:outputNewLine="%LF;" name="Z_Line">
                        <xsd:complexType>
                           <xsd:sequence dfdl:separator="%LF;%WSP*;">
                              <xsd:element minOccurs="0" name="FILLER1" type="xsd:string"/>
                              <xsd:element minOccurs="0" name="FILLER2" type="xsd:string"/>
                           </xsd:sequence>
                        </xsd:complexType>
                     </xsd:element>
                  </xsd:choice>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>

And here is the exception:

Code:

info: Calculating value of DFDL property 'initiator' using DFDL expression '{fn:concat(/SAMPLE/A_Line/ACCT,',F,')}'. The calculated value was '1467 27497 89062,F,'
[dfdl = /LIBRARY/SAMPLE.xsd, scd = #xscd(/schemaElement::SAMPLE/type::0/model::sequence/schemaElement::idoc/type::0/model::sequence/schemaElement::lines/type::0/model::sequence/model::choice/schemaElement::F_Line), 166]

error: CTDP3041E: Initiator '1467' not found at offset '43' for element '/SAMPLE[1]/idoc[1]/lines[1]/F_Line[1]'.

info: Offset: 43. Parser was unable to resolve data on the current branch and will evaluate the next available branch beginning at offset '43' owned by the 'choice' group contained within element 'sequence'.
[dfdl = /LIBRARY/SAMPLE.xsd, scd = #xscd(/schemaElement::SAMPLE/type::0/model::sequence/schemaElement::idoc/type::0/model::sequence/schemaElement::lines/type::0/model::sequence/model::choice), 209]

Vitor

Posted: Fri Sep 02, 2016 5:37 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Why are you using initiators?
_________________
Honesty is the best policy.
Insanity is the best defence.

icyblue7

Posted: Fri Sep 02, 2016 6:28 am Post subject:

Newbie

Joined: 02 Sep 2016
Posts: 3

This is the structure of the input:
- the first line is the header (record "A")
- the detail contains 1..* line items (record "F") and 1..* additional records (record "Y")
- then a summary record "B"
- the last line is the trailer (record "Z")

mqjeff

Posted: Fri Sep 02, 2016 6:39 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

You should construct your model this way

Acct#
- Choice Group
  - 'A' header record
  - 'F' record
    - 'Y' optional record
  - 'B' record
  - 'Z' record

The F record should be a complex type that repeats an unlimited # of times.
_________________
chmod -R ugo-wx /

icyblue7

Posted: Fri Sep 02, 2016 6:45 am Post subject:

Newbie

Joined: 02 Sep 2016
Posts: 3

mqjeff wrote:

You should construct your model this way

Acct#
- Choice Group
  - 'A' header record
  - 'F' record
    - 'Y' optional record
  - 'B' record
  - 'Z' record

The F record should be a complex type that repeats an unlimited # of times.

but my input is like this

"1467 27497 89062","A",116274,8,"17380923"
"1467 27497 89062","F","3","231333302 863000","","2"
"1467 27497 89062","F","2","231333325 051000","","2"
"1467 27497 89062","B","3","231333302 865000","","2","1"
"1467 27497 89062","Z","9360722 16233",6

So I tried adding initiators like this: fn:concat(/SAMPLE/A_Line/ACCT,',Y,')

mqjeff

Posted: Fri Sep 02, 2016 6:53 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

icyblue7 wrote:

but my input is like this

"1467 27497 89062","A",116274,8,"17380923"

So that's:
"1467 27497 89062" = Acct #
"A" = record discriminator
116274,8,"17380923" = fields in Record Type A

The structure I showed always has an Acct # as the first field.

I forgot to indicate that the entire structure repeats an unlimited # of times (until end of file).
_________________
chmod -R ugo-wx /

Vitor

Posted: Fri Sep 02, 2016 7:09 am Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqjeff wrote:

icyblue7 wrote:

but my input is like this

"1467 27497 89062","A",116274,8,"17380923"

You don't need initiators to model this.
_________________
Honesty is the best policy.
Insanity is the best defence.

fjb_saper

Posted: Fri Sep 02, 2016 8:16 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

What you want to use (and mqjeff has given you a huge hint) are discriminators...
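
As a rough illustration of that hint, a discriminator can be attached to the field that carries the record type, so that the parser commits to a record only when that field holds the expected value. The fragment below is a sketch, not the poster's model, and it has not been validated against a DFDL processor. The element name RecType is hypothetical, the xsd and dfdl prefixes are assumed to be declared as in the schema posted above, and the CSV escape scheme is assumed to strip the double quotes before the test runs.

```xml
<!-- Hypothetical F record: ACCT, then the record type, then the remaining fields -->
<xsd:element name="F_Line" minOccurs="0" maxOccurs="unbounded" dfdl:occursCountKind="implicit">
   <xsd:complexType>
      <xsd:sequence dfdl:separator=",">
         <xsd:element name="ACCT" type="xsd:string"/>
         <xsd:element name="RecType" type="xsd:string">
            <xsd:annotation>
               <xsd:appinfo source="http://www.ogf.org/dfdl/">
                  <!-- True: the parser commits to this occurrence. False: it backtracks. -->
                  <dfdl:discriminator test="{. eq 'F'}"/>
               </xsd:appinfo>
            </xsd:annotation>
         </xsd:element>
         <xsd:element name="FILLER1" type="xsd:string" minOccurs="0"/>
         <xsd:element name="FILLER2" type="xsd:string" minOccurs="0"/>
         <xsd:element name="FILLER3" type="xsd:string" minOccurs="0"/>
         <xsd:element name="FILLER4" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```

Because the test sits on the second field, no initiator built from the account number is needed to recognise the record.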

_________________
MQ & Broker admin

timber

Posted: Sat Sep 03, 2016 2:17 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

There are two ways to solve this problem:
1. Organise the CSV records into structures using DFDL. To do this you need to use discriminators in the model because the field that indicates the record type is not the first field of the record.
2. Parse the CSV as an unbounded list of records and do the record type recognition in the message flow logic.

Personally I would probably choose 2, unless
a) the input CSV is large, so that Record detection (on the Records and Elements tab of the FileInput node) needs to be set to 'Parsed Record Sequence'
and
b) there might be a need to run more than one instance of the message flow
(because then it would not be possible to use a SHARED variable to hold the record type).
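
For option 2, the model could stay close to a flat list in which every line becomes one record regardless of its type. The fragment below is only a sketch under assumptions and has not been validated against a DFDL processor: the element names record, RecType and field are hypothetical, the prefixes and default format are assumed to be those of the schema posted above, and the line separator is copied from that schema.

```xml
<!-- Hypothetical flat model: every line becomes one 'record', whatever its type -->
<xsd:element name="SAMPLE">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="%LF;%WSP*;">
         <xsd:element name="record" maxOccurs="unbounded" dfdl:occursCountKind="implicit">
            <xsd:complexType>
               <xsd:sequence dfdl:separator=",">
                  <xsd:element name="ACCT" type="xsd:string"/>
                  <!-- Second child: A, F, Y, B or Z; inspected later by the message flow -->
                  <xsd:element name="RecType" type="xsd:string"/>
                  <!-- Remaining fields, however many the record type carries -->
                  <xsd:element name="field" type="xsd:string" minOccurs="0" maxOccurs="unbounded" dfdl:occursCountKind="implicit"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```

The message flow would then branch on the value of RecType for each record instead of relying on the parser to group the records.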

mqjeff

Posted: Tue Sep 06, 2016 4:36 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

timber wrote:

I believe that my suggested model structure addressed this? The acct # is a fixed field and always occurs. The choice structure then resolves based on the first field of the remaining line?
_________________
chmod -R ugo-wx /

timber

Posted: Tue Sep 06, 2016 6:55 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

@mqjeff: Consider these options:
a) Use the built-in CSV wizard. Accept that the message tree will be an unbounded list of records. The record type of each record must be detected by inspecting the value of its second child.
b) Use your suggested model structure. Accept that the message tree will be an unbounded list of records. The record type of each record must be detected by inspecting the name of its second child.
c) Use discriminators so that DFDL builds a structured message tree. Accept that the DFDL model generated by the CSV wizard will need to be heavily customized. On the other hand, Parsed Record Sequence can now be used to process a huge input file without requiring the message flow to hold the record type in a SHARED variable.

I think b) requires adjustments to the DFDL and produces a more complex message tree without offering any compensations. But I may be missing something - it wouldn't be the first time.

mqjeff

Posted: Tue Sep 06, 2016 7:31 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Perhaps my structure is a bit too nested.

My intent is that the Acct # is a peer of the choice structure, as part of an ordered sequence. The ordered sequence would repeat until the end of file.

The choice structure would be resolved by using the first field of the remaining record (everything *after* the acct#) as a discriminator/indicator/whatever the correct DFDL name is.

Perhaps the A, B, and Z records could be removed from the choice, and modeled as records that do not repeat.
_________________
chmod -R ugo-wx /

shanson

Posted: Wed Sep 07, 2016 9:32 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

Quote:

Perhaps the A, B, and Z records could be removed from the choice, and modeled as records that do not repeat

That's how I would do it. The choice model is flexible but if you want strict validation then it won't detect out-of-order records.

Quote:

- detail contains 1..* line items (record "F") and 1..* additional Record "Y"

Do all the Fs appear before all the Ys or are they interleaved? If they all appear before, you don't need a choice at all.
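
If the F records do all come before the Y records, the ordered layout could be sketched as a plain sequence with no choice. This is an illustration only and has not been validated against a DFDL processor: the type names ALineType, FLineType, YLineType, BLineType and ZLineType are hypothetical and assumed to be defined elsewhere in the schema, and the line separator is copied from the original post.

```xml
<!-- Sketch: records in a fixed order, no choice group -->
<xsd:element name="SAMPLE">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="%LF;%WSP*;">
         <!-- Header, exactly once -->
         <xsd:element name="A_Line" type="ALineType"/>
         <!-- F and Y repeat, so their (hypothetical) types are assumed to carry a
              discriminator on the record-type field that tells the parser where each run ends -->
         <xsd:element name="F_Line" type="FLineType" maxOccurs="unbounded" dfdl:occursCountKind="implicit"/>
         <!-- minOccurs="0" only because the sample data contains no Y record -->
         <xsd:element name="Y_Line" type="YLineType" minOccurs="0" maxOccurs="unbounded" dfdl:occursCountKind="implicit"/>
         <!-- Summary and trailer, exactly once each -->
         <xsd:element name="B_Line" type="BLineType"/>
         <xsd:element name="Z_Line" type="ZLineType"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
```

With this layout a B record that arrives before the first F record would be expected to fail the parse rather than be accepted, which is the kind of strict validation a choice group does not give.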
