MQSeries.net :: View topic - DFDL parsing error when first array item is empty string


MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » DFDL parsing error when first array item is empty string

DFDL parsing error when first array item is empty string



rekarm01

Posted: Wed Nov 02, 2016 8:17 pm Post subject: DFDL parsing error when first array item is empty string

Grand Master

Joined: 25 Jun 2008
Posts: 1415

I'm designing a DFDL schema for some legacy applications (using the wmb8007 toolkit). The simplified description of the message is that it's a sequence of records (pipe-delimited), where each record is a sequence of values (comma-delimited). The message structure looks something like this (labels and line-breaks added for readability):

Code:

recA[]: valueA1,valueA2,valueA3,...|
recB[]: valueB1,valueB2,valueB3,...|
recC[]: valueC1,valueC2,valueC3,...|
recD[]: valueD1,valueD2,valueD3,...|
...

Each record can have any number of values (as determined by the number of comma delimiters), and some of the values may be empty. Similarly, there can be a variable number of distinct records (in theory, any number), and some of the records can also be empty (absent). The distinct records are not an array. Here is a simplified schema:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat">
<xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat" schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd"/>

<xsd:element ibmSchExtn:docRoot="true" name="Records" type="Record.CONTENT"/>
<xsd:complexType name="Record.CONTENT">
   <xsd:sequence dfdl:separator="|" dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
      <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
            maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
      </xsd:sequence>
      <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
            maxOccurs="unbounded" minOccurs="0" name="recB" nillable="true" type="xsd:string"/>
      </xsd:sequence>
      <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
            maxOccurs="unbounded" minOccurs="0" name="recC" nillable="true" type="xsd:string"/>
      </xsd:sequence>
      <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
         <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
            maxOccurs="unbounded" minOccurs="0" name="recD" nillable="true" type="xsd:string"/>
      </xsd:sequence>
   </xsd:sequence>
</xsd:complexType>

<xsd:annotation>
<xsd:appinfo source="http://www.ogf.org/dfdl/">
<dfdl:format ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"/>
</xsd:appinfo>
</xsd:annotation>
</xsd:schema>

The first issue is that some of the records that should be absent are <nil> instead, but that's a minor problem. The second issue is more serious: if the first value in a record is the empty string, then I get a fatal parsing error. For example, with the input message "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3", recA[1] should preferably be absent, not <nil>, and the empty recD[1] should not cause a fatal error:

Quote:

fatal: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'sequence', xpath: '/Records[1]'.

How can I fix the schema to resolve these issues?

shanson

Posted: Thu Nov 03, 2016 4:07 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

You have defined all your elements as nillable="true" with dfdl:nilValue as the empty string. For your example, that causes the parser to assign <nil> to recA[1].

Not sure why an empty recD[1] is causing an error. You are using v8007 so it might be a bug that has been fixed. Please try on v10.

rekarm01

Posted: Thu Nov 03, 2016 7:04 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

shanson wrote:

You have defined all your elements as nillable="true" with dfdl:nilValue as the empty string. For your example, that causes the parser to assign <nil> to recA[1].

Even with minOccurs="0"? The purpose of the <nil> occurrences is to preserve the indices of any non-empty occurrences that might follow (as explained in the DFDL specification, "16.8 Sparse Arrays"). But when no occurrences follow, the <nil> occurrence is unnecessary. Is there a better way to model that?
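To make the distinction concrete, here is an illustrative XML rendering of two of the records from the example input "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3", assuming the nil behaviour described in this thread. It is a hand-written sketch of the logical structure, not captured parser output; xsi:nil is simply the standard XML notation for a nil element.

```xml
<!-- Illustrative only: hand-written logical structure, not captured parser output. -->
<Records xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Record A is empty: this nil has no later occurrence whose index it preserves -->
  <recA xsi:nil="true"/>
  <!-- recB occurrences omitted -->
  <!-- Record C has an empty middle value: this nil keeps valueC3 at index 3 -->
  <recC>valueC1</recC>
  <recC xsi:nil="true"/>
  <recC>valueC3</recC>
  <!-- recD occurrences omitted -->
</Records>
```

Only the second kind of <nil> does the sparse-array job; the first is the side effect being asked about.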

shanson wrote:

Not sure why an empty recD[1] is causing an error. You are using v8007 so it might be a bug that has been fixed. Please try on v10.

An empty recD[1] does not cause a fatal error in the IIB v10006 toolkit; it gets set to <nil>, as expected. So it probably is a bug in v8007. Unfortunately, even though we are in the process of migrating our brokers from wmb8 to iib10, it's going to take a little while longer. In the meantime, is there some sort of workaround we could implement for wmb8?

shanson

Posted: Fri Nov 04, 2016 1:30 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

The problem is that the '|' may mean something different at a lower level in the model. The DFDL parser can't assume that it is the next separator at the current level. So it descends into the sequence and it is only when it parses recA[1] that it can be sure about the '|'. (See DFDL spec 9.3.2.2). At that point, nil literal processing for recA[1] occurs (which has a higher priority than minOccurs checking).

I would change the model so it looks like:

Code:

Records
   RecordA
      recA
   RecordB
      recB
   ...

Then you can take advantage of a new capability in DFDL in v10 which allows a complex element to be nil. Here, the DFDL parser explicitly checks whether the next character is the separator at that level and, if it is, assigns <nil> to the complex element. Again, see 9.3.2.2. In your example you would then get:

Code:

Records
   RecordA : <nil>
   RecordB
      recB : valueB1
      recB : valueB2
   ...
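A minimal sketch of the RecordA wrapper from the outline above, assuming the DFDL 1.0 rule that a nillable complex element uses dfdl:nilKind="literalValue" with dfdl:nilValue="%ES;" (the only nil value allowed for complex elements). The inner sequence and the recA declaration are copied from the original schema; the wrapper's occurrence properties are assumptions, and complex-element nil needs the IIB v10 DFDL level, not WMB v8.

```xml
<!-- Sketch only: RecordA wrapper as a nillable complex element (requires IIB v10 DFDL). -->
<xsd:element name="RecordA" minOccurs="0" maxOccurs="1" dfdl:occursCountKind="implicit"
             nillable="true" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;">
  <xsd:complexType>
    <!-- Same comma-separated array as in the original schema -->
    <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
      <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
         maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```

RecordB, RecordC, and RecordD would follow the same pattern inside the '|'-separated outer sequence.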

I suspect that adding the extra complex element in v8 will also fix the bug you are seeing with the empty recD[1]. I think the bug is due to the nested sequences.

There is a way in the DFDL spec to look ahead and decide whether an element exists or not. It's a discriminator with a 'testPattern' (i.e., a regular expression) instead of a 'test' (i.e., an XPath expression). But IBM DFDL does not support that yet.
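For reference, the DFDL 1.0 specification expresses such a look-ahead as a dfdl:discriminator with testKind="pattern". The sketch below is hypothetical, since IBM DFDL did not support testPattern at the time: the regular expression [^|] is an assumed choice intended to succeed only when the next character is not the '|' record separator, so that an empty record A would be treated as absent rather than nil. Placing it on a RecordA wrapper, and the wrapper's other properties, are also assumptions.

```xml
<!-- Hypothetical: pattern discriminator from the DFDL 1.0 spec; not supported by IBM DFDL at the time. -->
<xsd:element name="RecordA" minOccurs="0" maxOccurs="1" dfdl:occursCountKind="implicit">
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Matched against the data without consuming it: intended to resolve
           RecordA as present only if the next character is not '|' -->
      <dfdl:discriminator testKind="pattern" testPattern="[^|]"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:complexType>
    <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
      <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
         maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
```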

rekarm01

Posted: Fri Nov 04, 2016 3:53 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

shanson wrote:

I would change the model so it looks like:

Code:

Records
   RecordA
      recA
   RecordB
      recB
   ...

I don't see how adding an extra layer of elements helps with either issue.

shanson wrote:

Then you can take advantage of a new capability in DFDL in v10 which allows a complex element to be nil. ... In your example you would then get:

Code:

Records
   RecordA : <nil>
   RecordB
      recB : valueB1
      recB : valueB2
   ...

This seems to be trading one unnecessary <nil> element for another. The <nil> value for RecordA is not serving as a placeholder in an array, so I'd want it to be empty/absent/missing, for the same reasons as recA[1] in the previous example. But if there's currently no way to model that, then I'll have to set this issue aside for now.

shanson wrote:

I suspect that adding the extra complex element in v8 will also fix the bug you are seeing with the empty recD[1]. I think the bug is due to the nested sequences.

No, it doesn't fix the bug with the empty recD[1]. I added an element to the v8 schema for RecordD as suggested (and also similar elements for RecordA, RecordB, and RecordC):

Code:

...
<xsd:element dfdl:occursCountKind="implicit" maxOccurs="1" minOccurs="0" name="recordD">
<xsd:complexType>
   <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix">
      <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
         maxOccurs="unbounded" minOccurs="0" name="recD" nillable="true" type="xsd:string"/>
   </xsd:sequence>
</xsd:complexType>
</xsd:element>
...

But I still get the same error:

Code:

error: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'recordD', xpath: '/Records[1]/recordD[1]'.

timber

Posted: Sat Nov 05, 2016 3:45 am Post subject:

Grand Master

Joined: 25 Aug 2015
Posts: 1292

Try modelling the pipe character as a terminator instead of a separator; that's what the CSV wizard does, and it avoids the 'unexpected postfix separator' error. You will probably need to set the 'documentFinalTerminatorCanBeMissing' property to make that work.
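A minimal sketch of that suggestion, assuming the per-record sequences from the original schema are kept and only the '|' handling changes: each record's sequence gets dfdl:terminator="|", and dfdl:documentFinalTerminatorCanBeMissing="yes" goes on the last record because the example message does not end with '|'. The empty separator on the outer sequence and the exact placement of documentFinalTerminatorCanBeMissing are assumptions; check where the toolkit accepts that property.

```xml
<!-- Sketch only: '|' modelled as a record terminator instead of a separator. -->
<xsd:sequence dfdl:separator="">
  <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix"
                dfdl:terminator="|">
    <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
       maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
  </xsd:sequence>
  <!-- recB and recC sequences follow the same pattern -->
  <!-- Last record: the message may end without a final '|' (assumed placement) -->
  <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix"
                dfdl:terminator="|" dfdl:documentFinalTerminatorCanBeMissing="yes">
    <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
       maxOccurs="unbounded" minOccurs="0" name="recD" nillable="true" type="xsd:string"/>
  </xsd:sequence>
</xsd:sequence>
```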

rekarm01

Posted: Sat Nov 05, 2016 10:16 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

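timber wrote:

Try modelling the pipe character as a terminator instead of a separator; that's what the CSV wizard does, and it avoids the 'unexpected postfix separator' error.

For the given input message "|valueB1,valueB2,|valueC1,,valueC3|,valueD2,valueD3", modelling the pipe delimiter as a terminator (for recA, recB, recC, and recD) does not seem to make a difference: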

Code:

<xsd:sequence>
   <xsd:sequence dfdl:separator="," dfdl:separatorPolicy="suppressedAtEndLax" dfdl:separatorPosition="infix" dfdl:terminator="|">
      <xsd:element dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:occursCountKind="implicit" dfdl:useNilForDefault="yes"
         maxOccurs="unbounded" minOccurs="0" name="recA" nillable="true" type="xsd:string"/>
   </xsd:sequence>
   ...
</xsd:sequence>

wmb8 still produces the 'unexpected postfix separator' error (while iib10 parses successfully):

Code:

fatal: CTDP3062E: An unexpected non-postfix separator ',' occurs in a postfix position at offset '35' in 'sequence' group contained within element 'sequence', xpath: '/Records[1]'.

shanson

Posted: Mon Nov 07, 2016 12:25 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

If you are modelling recA etc as nillable then I think you are stuck with the side-effect of getting a nil at some level in the tree. As I said, you need a discriminator with a testPattern but that is not yet in IBM DFDL.

mqjeff

Posted: Mon Nov 07, 2016 5:17 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

shanson wrote:

Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

....
Maybe...
....

ported to v9 is more likely, as there may be significant changes in the code between v8 and v10.
_________________
chmod -R ugo-wx /

rekarm01

Posted: Tue Nov 08, 2016 2:58 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 1415

shanson wrote:

Raise a PMR for the error. You have shown it's been fixed in v10 so there must be a fix that can be ported to v8.

I can raise a PMR for the error. In the meantime, it's looking like there's not a workaround for this. Thanks for your time.

shanson wrote:

If you are modelling recA etc as nillable then I think you are stuck with the side-effect of getting a nil at some level in the tree.

Then I'll have to set that issue aside for the time being.

mqjeff wrote:

ported to v9 is more likely

As we are in the process of migrating our brokers from wmb8 to iib10, I didn't test whether the error occurs in v9; it's possible that it's fixed there already.
