
MQSeries.net Forum Index > WebSphere Message Broker Support

DFDL parsing error when first array item is empty string
rekarm01
Posted: Wed Nov 02, 2016 8:17 pm    Post subject: DFDL parsing error when first array item is empty string

Grand Master

Joined: 25 Jun 2008
Posts: 1400

I'm designing a DFDL schema for some legacy applications (using the wmb8007 toolkit). The simplified description of the message is that it's a sequence of records (pipe-delimited), where each record is a sequence of values (comma-delimited). The message structure looks something like this (labels and line-breaks added for readability):

Code:
recA[]: valueA1,valueA2,valueA3,...|
recB[]: valueB1,valueB2,valueB3,...|
recC[]: valueC1,valueC2,valueC3,...|
recD[]: valueD1,valueD2,valueD3,...|
...

Each record can have any number of values (as determined by the number of comma delimiters), and some of the values may be empty. Similarly, there can be a variable number of distinct records (in theory, any number), and some of the records can also be empty (absent). The distinct records are not an array. Here is a simplified schema:

Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">
    <xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>

   <xsd:element ibmSchExtn:docRoot="true" name="Records" type="Record.CONTENT"/>
   <xsd:complexType name="Record.CONTENT">
      <xsd:sequence dfdl:separator="|" dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
            <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
               maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
         </xsd:sequence>
         <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
            <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
               maxOccurs="unbounded" minOccurs="0" name="recB" nillable="true" type="xsd:string"/>
         </xsd:sequence>
         <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
            <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
               maxOccurs="unbounded" minOccurs="0" name="recC" nillable="true" type="xsd:string"/>
         </xsd:sequence>
         <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
            <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
               maxOccurs="unbounded" minOccurs="0" name="recD" nillable="true" type="xsd:string"/>
         </xsd:sequence>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:annotation>
      <xsd:appinfo source="http://www.ogf.org/dfdl/">
         <dfdl:format ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"/>
      </xsd:appinfo>
   </xsd:annotation>
</xsd:schema>

The first issue is that some of the records that should be absent are <nil> instead, but that's a minor problem. The second issue is more serious: if the first value in a record is the empty string, then I get a parsing error. For example, with the input message "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3", recA[1] should preferably be absent, not <nil>, and the empty recD[1] should not cause a fatal error:

Quote:
fatal: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'sequence', xpath: '/Records[1]'.
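
To see which character the error points at, the reported offset can be checked against the example message. This is only a quick sanity check in Python, not part of the DFDL model, and it assumes the offset in CTDP3062E is zero-based (the message text names ',' as the offending separator, which matches the zero-based reading):

```python
message = "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3"
offset = 35  # offset quoted in the CTDP3062E message, assumed zero-based

# The character the parser rejected: the comma that follows the empty recD[1].
print(repr(message[offset]))         # ','

# The character just before it is the pipe that ends recC.
print(repr(message[offset - 1]))     # '|'

# Three pipes have been consumed, so the parser is at the start of recD.
print(message[:offset].count("|"))   # 3
```

So the failing position is the first comma of the fourth record, immediately after the empty first value of recD.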

How can I fix the schema to resolve these issues?
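
For reference, the result being asked for can be sketched outside DFDL. The Python below is only an illustration of the desired tree, not a description of how the DFDL parser works; the function name `parse_records` and the fixed tuple of record names are assumptions made for the example. Empty values are kept as None (standing in for <nil>) only when a later value needs its index preserved; trailing empty values and wholly empty records are dropped.

```python
def parse_records(message, names=("recA", "recB", "recC", "recD")):
    """Split a pipe/comma-delimited message into named records (illustrative only)."""
    tree = {}
    # Records are pipe-delimited; pair each one with its (assumed) name.
    for name, record in zip(names, message.split("|")):
        # Values are comma-delimited; an empty value stands in for <nil>.
        values = [v if v else None for v in record.split(",")]
        # Trailing empties are not placeholders for anything, so drop them.
        while values and values[-1] is None:
            values.pop()
        # A record with no values left is absent rather than <nil>.
        if values:
            tree[name] = values
    return tree


message = "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3"
for name, values in parse_records(message).items():
    print(name, values)
```

For the example message this leaves recA absent, gives recB two values, keeps a None at recC[2], and keeps a None at recD[1] so that valueD2 and valueD3 stay at indices 2 and 3.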
shanson
Posted: Thu Nov 03, 2016 4:07 am

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

You have defined all your elements as nillable="true" with dfdl:nilValue as the empty string. For your example, that causes the parser to assign <nil> to recA[1].

Not sure why an empty recD[1] is causing an error. You are using v8007 so it might be a bug that has been fixed. Please try on v10.
rekarm01
Posted: Thu Nov 03, 2016 7:04 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1400

shanson wrote:
You have defined all your elements as nillable="true" with dfdl:nilValue as the empty string. For your example, that causes the parser to assign <nil> to recA[1].

Even with minOccurs="0"? The purpose of the <nil> occurrences is to preserve the indices of any non-empty occurrences that might follow, (as explained in the DFDL specification, "16.8 Sparse Arrays"). But when there aren't any occurrences that follow, then there's no need for the <nil> occurrence. Is there a better way to model that?

shanson wrote:
Not sure why an empty recD[1] is causing an error. You are using v8007 so it might be a bug that has been fixed. Please try on v10.

An empty recD[1] does not cause a fatal error in the IIB v10006 toolkit; it gets set to <nil>, as expected. So it probably is a bug in v8007. Unfortunately, even though we are in the process of migrating our brokers from wmb8 to iib10, it's going to take a little while longer. In the meantime, is there some sort of workaround we could implement for wmb8?
shanson
Posted: Fri Nov 04, 2016 1:30 am

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

The problem is that the '|' may mean something different at a lower level in the model. The DFDL parser can't assume that it is the next separator at the current level. So it descends into the sequence and it is only when it parses recA[1] that it can be sure about the '|'. (See DFDL spec 9.3.2.2). At that point, nil literal processing for recA[1] occurs (which has a higher priority than minOccurs checking).

I would change the model so it looks like:

Code:

Records
  RecordA
    recA
  RecordB
    recB
  ...   


Then you can take advantage of a new capability in DFDL in v10 which allows a complex element to be nil. Here, the DFDL parser explicitly checks if the next char is the separator at that level, and if it is assigns <nil> to the complex element. Again see 9.3.2.2. In your example you would then get:

Code:

Records
  RecordA : <nil>
  RecordB
    recB : valueB1
    recB : valueB2
  ...   


I suspect that adding the extra complex element in v8 will also fix the bug you are seeing with the empty recD[1]. I think the bug is due to the nested sequences.

There is a way in the DFDL spec to look ahead and decide whether an element exists or not. It's a discriminator with a 'testPattern' (ie, regex) instead of a 'test' (ie, XPath). But IBM DFDL does not support that yet.
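
The look-ahead idea can be illustrated outside DFDL. The Python below is only an analogy for a pattern-based test, not DFDL syntax and not something IBM DFDL executes; the pattern `[^|]` and the helper name `record_exists` are assumptions chosen for the example. The test peeks at the next character without consuming it and treats the record as present only if that character is not the record delimiter.

```python
import re

# Hypothetical pattern: a record exists if the next character is not a pipe.
RECORD_PRESENT = re.compile(r"[^|]")


def record_exists(data, pos):
    """Peek at data starting at pos, without consuming anything."""
    return RECORD_PRESENT.match(data, pos) is not None


message = "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3"
print(record_exists(message, 0))    # start of the first record: next char is '|'
print(record_exists(message, 1))    # start of the second record: next char is 'v'
print(record_exists(message, 35))   # start of the fourth record: next char is ','
```

With a test like this the first record would be treated as absent, while the fourth record, which starts with an empty value, would still be treated as present.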
rekarm01
Posted: Fri Nov 04, 2016 3:53 pm

Grand Master

Joined: 25 Jun 2008
Posts: 1400

shanson wrote:
I would change the model so it looks like:

Code:
Records
  RecordA
    recA
  RecordB
    recB
  ...

I don't see how adding an extra layer of elements helps with either issue.

shanson wrote:
Then you can take advantage of a new capability in DFDL in v10 which allows a complex element to be nil. ... In your example you would then get:

Code:
Records
  RecordA : <nil>
  RecordB
    recB : valueB1
    recB : valueB2
  ...

This seems to be trading one unnecessary <nil> element for another unnecessary <nil> element. The <nil> value for RecordA is not serving as a place-holder in an array, so I'd want it to be empty/absent/missing, for the same reasons as for recA[1] in the previous example. But if there's currently no way to model that, then I'll have to set this issue aside for now.

shanson wrote:
I suspect that adding the extra complex element in v8 will also fix the bug you are seeing with the empty recD[1]. I think the bug is due to the nested sequences.

No, it doesn't fix the bug with the empty recD[1]. I added an element to the v8 schema for RecordD as suggested, (and also similar elements for RecordA, RecordB, and RecordC):

Code:
...
<xsd:element dfdl:occursCountKind="implicit" maxOccurs="1" minOccurs="0" name="recordD">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
            maxOccurs="unbounded" minOccurs="0" name="recD" nillable="true" type="xsd:string"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>
...

But I still get the same error:

Code:
error: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'recordD', xpath: '/Records[1]/recordD[1]'.
timber
Posted: Sat Nov 05, 2016 3:45 am

Grand Master

Joined: 25 Aug 2015
Posts: 1160

Try modelling the pipe character as a terminator instead of a separator; that's what the CSV wizard does, and it avoids the 'unexpected postfix separator' error. You will probably need to set the 'documentFinalTerminatorCanBeMissing' property to make that work.
rekarm01
Posted: Sat Nov 05, 2016 10:16 am

Grand Master

Joined: 25 Jun 2008
Posts: 1400

timber wrote:
Try modelling the pipe character as a terminator instead of a separator; that's what the CSV wizard does, and it avoids the 'unexpected postfix separator' error. You will probably need to set the 'documentFinalTerminatorCanBeMissing' property to make that work.

For the given input message "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3", modelling the pipe delimiter as a terminator (for recA, recB, recC, and recD) does not seem to make a difference:

Code:
<xsd:sequence>
   <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix" dfdl:terminator="|">
      <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
         maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
   </xsd:sequence>
   ...
</xsd:sequence>

wmb8 still produces the 'unexpected postfix separator' error, (while iib10 parses successfully):

Code:
fatal: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'sequence', xpath: '/Records[1]'.
shanson
Posted: Mon Nov 07, 2016 12:25 am

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

If you are modelling recA etc as nillable then I think you are stuck with the side-effect of getting a nil at some level in the tree. As I said, you need a discriminator with a testPattern but that is not yet in IBM DFDL.
mqjeff
Posted: Mon Nov 07, 2016 5:17 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

shanson wrote:
Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

....
Maybe...
....

ported to v9 is more likely, as there may be significant changes in the code between v8 and v10.
_________________
chmod -R ugo-wx /
rekarm01
Posted: Tue Nov 08, 2016 2:58 am

Grand Master

Joined: 25 Jun 2008
Posts: 1400

shanson wrote:
Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

I can raise a PMR for the error. In the meantime, it's looking like there's not a workaround for this. Thanks for your time.

shanson wrote:
If you are modelling recA etc as nillable then I think you are stuck with the side-effect of getting a nil at some level in the tree.

Then I'll have to set that issue aside for the time being.

mqjeff wrote:
ported to v9 is more likely

As we are in the process of migrating our brokers from wmb8 to iib10, I didn't test whether the error occurs in v9; it's possible that it's fixed there already.