MQSeries.net :: View topic - DFDL Parse error after delimiter


DFDL Parse error after delimiter


Mut1ey

Posted: Wed Oct 01, 2014 3:42 am Post subject: DFDL Parse error after delimiter

Acolyte

Joined: 07 Oct 2005
Posts: 74
Location: England

Using DFDL parser in v8.0.0.2 on Windows and AIX.

I have a simple model of text records that are pipe-delimited (|, 0x7c), with each element of implicit length. Parsing throws the following exception:

<Catalog>BIPmsgs</Catalog>
<Severity>3</Severity>
<Number>5807</Number>
<Text>An error occurred whilst parsing with DFDL</Text>
<Insert>
<Type>5</Type>
<Text>CTDP3002E: Unexpected data found at offset '' after parsing completed. Data: '0x53...'.</Text>
</Insert>

when parsing the following record:

SFLT|132648000141|"B" BY BLACK TOWER WHITE|04069600014578

However, if the " is not the first character following the |, the parser is happy.

Each record starts with 0x53 (the S char).

The CCSID of the incoming message is 923 or 859.

Thanks for any insights.

mqjeff

Posted: Wed Oct 01, 2014 5:12 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Post the model.

Can you also post the output from using the DFDL Test Parser in Toolkit on the same message data?

Mut1ey

Posted: Wed Oct 01, 2014 5:30 am Post subject:

Acolyte

Joined: 07 Oct 2005
Posts: 74
Location: England

mqjeff wrote:

Post the model.

Can you also post the output from using the DFDL Test Parser in Toolkit on the same message data?

Code:

<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
    xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions"
    xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
    xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat"
      schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd" />
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format byteOrder="{$dfdl:byteOrder}"
          documentFinalTerminatorCanBeMissing="yes"
          encoding="{$dfdl:encoding}"
          escapeSchemeRef="recSepFieldsFmt:RecordEscapeScheme"
          nilKind="logicalValue" nilValue="%SP;"
          occursCountKind="implicit"
          ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"
          useNilForDefault="yes" />
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element ibmSchExtn:docRoot="true" name="Article">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:sequence />
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element dfdl:occursCountKind="implicit"
            dfdl:outputNewLine="%CR;%LF;" maxOccurs="unbounded"
            minOccurs="1" name="body">
          <xsd:complexType>
            <xsd:sequence dfdl:separator="|"
                dfdl:separatorPolicy="suppressedAtEndLax"
                dfdl:terminator="%LF;">
              <xsd:annotation>
                <xsd:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:sequence />
                </xsd:appinfo>
              </xsd:annotation>
              <xsd:element dfdl:terminator=""
                  ibmDfdlExtn:sampleValue="body_value1" minOccurs="1"
                  name="supplier">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="8" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element ibmDfdlExtn:sampleValue="body_value2"
                  minOccurs="1" name="comm_code_i">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="14" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:nilValueDelimiterPolicy="none"
                  minOccurs="1" name="comm_desc" nillable="false">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="70" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:textNumberPattern="#0"
                  minOccurs="1" name="comm_code_s">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="70" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Here is the section of the DFDL Test Trace:

Code:

1 Oct 2014 14:26:31 info: Offset: 483. Starting to process element 'comm_desc'.
   [dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::comm_desc), 61]

1 Oct 2014 14:26:31 error: CTDP3118E: No markup found after the escape block.

1 Oct 2014 14:26:31 info: Offset: 565. Parser was unable to resolve data on the current branch and will evaluate the next available branch beginning at offset '465' owned by the 'sequence' group contained within element 'Article'.
   [dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence), 224]

1 Oct 2014 14:26:31 info: Offset: 465. Occurrence '10' of element 'body' was not found in the data. occursCountKind is 'implicit' so no more occurrences of this element will be expected.
   [dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence/schemaElement::body), 169]

1 Oct 2014 14:26:31 info: Offset: 465. Finished processing element 'Article'.
   [dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article), 71]

1 Oct 2014 14:26:31 fatal: CTDP3002E: Unexpected data found at offset '465' after parsing completed. Data: '0x53...'.

Thanks

kimbert

Posted: Wed Oct 01, 2014 5:42 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Before I answer the question, I would like to point out that you are not following recommended best practice in developing/debugging your DFDL model.
You should:
- generate the model. I guess you used the wizard for that.
- test it using the DFDL Test perspective in the toolkit
- diagnose errors using the DFDL Trace
- deploy (and re-test) on the runtime when the model has been debugged

Diagnosis
If your model was generated by the wizard then it is probably using an escape scheme with escapeKind="escapeBlock" and Escape Block Start set to " (the quote character). With those settings, the DFDL specification says that if a field starts with " then it must also end with " (the entire field must be escaped).
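The rule described above can be illustrated with a small Python sketch. This is a toy model written for this thread, not the IBM DFDL parser: the function names are hypothetical, and the only assumption taken from the posts is that the quote character opens an escape block (the posted model does reference recSepFieldsFmt:RecordEscapeScheme). A field that begins with a quote must be closed by a quote that is followed immediately by a separator or the end of the record.

```python
def read_field(record, pos, sep="|", quote='"'):
    """Read one field starting at pos and return (value, next_pos).

    Toy escape-block rule: a field that begins with the quote must end
    with the quote, and the closing quote must be followed directly by
    a separator or by the end of the record.
    """
    if record.startswith(quote, pos):
        end = record.find(quote, pos + 1)
        if end == -1:
            raise ValueError("escape block is never closed")
        after = end + 1
        if after < len(record) and not record.startswith(sep, after):
            raise ValueError("no markup found after the escape block")
        return record[pos + 1:end], after
    end = record.find(sep, pos)
    if end == -1:
        end = len(record)
    return record[pos:end], end


def parse_record(record, sep="|"):
    """Split one record into fields using the toy escape-block rule."""
    fields, pos = [], 0
    while True:
        value, pos = read_field(record, pos, sep)
        fields.append(value)
        if pos >= len(record):
            return fields
        pos += len(sep)  # step over the separator


# Quote in the middle of a field: treated as ordinary data.
print(parse_record('SFLT|132648000141|THE "B" BY BLACK TOWER|04069600014578'))

# Quote as the first character: "B" is read as a complete escape block,
# and the text that follows it is not a separator, so the record fails.
try:
    parse_record('SFLT|132648000141|"B" BY BLACK TOWER WHITE|04069600014578')
except ValueError as exc:
    print("parse failed:", exc)
```

In this sketch the failing record is rejected for the same reason the trace reports under CTDP3118E: the closing quote after B is not followed by markup.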

You now need to fix your DFDL model so that it accurately describes your data format. Questions that you need to answer include:
- will your data ever include a pipe character ( 0x7c) as part of a field value?
- will your data ever include a line feed character as part of a field value?
- if the answer to either of the previous questions is 'yes' then how will the 0x7c be 'escaped' so that it does not get interpreted as a delimiter?
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.

Mut1ey

Posted: Wed Oct 01, 2014 6:15 am Post subject:

Acolyte

Joined: 07 Oct 2005
Posts: 74
Location: England

kimbert wrote:

Appreciated - I did a combination of both: I used the wizard to give a skeleton, and tested the parsing against simple records. The tests were incomplete though, as they didn't include quotes. So the model was deployed and tested successfully until more realistic sample data was used.

What is the best practice for generating the model?

I will look for the escape schema, thanks.

In answer to the questions about content, yes, there is potential that pipe could be included in this description field, as it is relatively free form. I can't see how line feed characters could be introduced here though.

Is there a way to 'escape' certain byte/ character values? Or is there a way to not parse certain elements, in a similar way to BLOB messages?

kimbert

Posted: Wed Oct 01, 2014 6:32 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

What is the best practice for generating the model?

You found it already - use the wizard. But the wizard cannot do the whole job for you - you still need to test that the generated model is correct for your data format. Looks as if you are partway through that exercise already.

Quote:

yes, there is potential that pipe could be included in this description field, as it is relatively free form

So it will need to be escaped - otherwise the parser will stop at the first pipe character. Please post an example of a message that contains an 0x7C character as part of a field value.
Important question: Are you the author of the sending application? Or do you have any control over how it will escape pipe characters in the input data?

Quote:

I will look for the escape schema, thanks.

It's escape scheme ( not schema ). An escape scheme is a set of rules for how to 'hide' delimiters and other markup within a delimited field. DFDL supports a wide range of options - please see the DFDL specification for details:
http://www-01.ibm.com/support/knowledgecenter/SSKM8N_8.0.0/com.ibm.etools.mft.doc/df00110_.htm
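As a rough illustration of what an escape scheme does, here is a Python sketch of the escape-character style (the idea behind escapeKind="escapeCharacter" in DFDL). It is an assumption-laden toy, not the DFDL runtime: the backslash as escape character and the function name split_escaped are hypothetical choices, and nothing in this thread says the sending application uses them.

```python
def split_escaped(record, sep="|", esc="\\"):
    """Split a record on sep, treating the character after esc as literal data."""
    fields, current, i = [], [], 0
    while i < len(record):
        ch = record[i]
        if ch == esc and i + 1 < len(record):
            current.append(record[i + 1])  # keep the escaped character as data
            i += 2
        elif ch == sep:
            fields.append("".join(current))  # separator ends the current field
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    fields.append("".join(current))
    return fields


# Hypothetical record in which the sender has escaped a pipe in the description.
print(split_escaped(r"SFLT|132648000141|BLACK\|TOWER WHITE|04069600014578"))

# With this style of scheme a leading quote has no special meaning.
print(split_escaped('SFLT|132648000141|"B" BY BLACK TOWER WHITE|04069600014578'))
```

Under such a scheme the record from the first post would split into four fields, because the quote is just data; the price is that the sender must escape every pipe that appears in a value.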

Mut1ey

Posted: Wed Oct 01, 2014 8:34 am Post subject:

Acolyte

Joined: 07 Oct 2005
Posts: 74
Location: England

Quote:

Are you the author of the sending application?

Unfortunately not; the process of creating an input message includes the addition of the field separators. So we would have to do something in broker before parsing into DFDL to try to detect additional pipes. For example, we could check that the number of pipe bytes (0x7c) is divisible by the number expected per record (times the number of records detected). I wonder whether it would be more effective to let an exception be thrown when this happens, rather than write code to defend against something that might never happen. These are not hard-real-time systems, so the message could be amended and resubmitted with the escape character added in these cases.
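A minimal Python sketch of the kind of pre-check described above, for illustration only (in broker this would more likely be ESQL or Java). It makes several assumptions: four fields per record as in the posted model, records separated by line ends, and a single-byte ASCII-compatible encoding so that 0x7c is always a pipe. It checks each record separately, which is a stricter variant of the overall divisibility test. The name suspicious_records is hypothetical.

```python
FIELDS_PER_RECORD = 4  # supplier, comm_code_i, comm_desc, comm_code_s


def suspicious_records(payload, sep=b"|"):
    """Return (record_number, separator_count) for records with an unexpected count."""
    expected = FIELDS_PER_RECORD - 1  # separators sit between fields
    problems = []
    for number, line in enumerate(payload.splitlines(), start=1):
        if not line:
            continue  # ignore empty lines, e.g. after a final line end
        count = line.count(sep)
        if count != expected:
            problems.append((number, count))
    return problems


# The second record is a hypothetical example with an extra pipe in the description.
payload = (
    b"SFLT|132648000141|BLACK TOWER WHITE|04069600014578\n"
    b"SFLT|132648000141|BLACK|TOWER WHITE|04069600014578\n"
)
print(suspicious_records(payload))
```

A check like this can only flag a record as suspect. It cannot tell which of the pipes is data, which is the point made in the reply that follows.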

I have the Specification thanks, although I obviously haven't read and absorbed it as much as yourself...

Thanks for your help.

kimbert

Posted: Wed Oct 01, 2014 3:50 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

Unfortunately not; the process of creating an input message includes the addition of the field separators.

Well, of course it does. But that is not relevant to this discussion. I am asking what the sending application does when a pipe or a line feed occurs within a field value.

Quote:

So we would have to do something in broker before parsing into DFDL to try to detect additional pipes

I disagree. If the sending application is not taking any steps to escape delimiters when they appear within a field then the sending application is broken and the data cannot be parsed reliably. Unless a pipe character can only appear in the final field on each line.
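The exception mentioned above (a pipe that can only occur in the final field) can be sketched in Python by limiting the number of splits, so that everything after the last expected separator stays in the final field. The function name and sample record are hypothetical. Note that in the posted model the free-form field comm_desc is the third of four, so this special case does not apply to that model as it stands.

```python
def split_last_field_free(record, field_count, sep="|"):
    """Split into at most field_count fields; extra separators stay in the last field."""
    return record.split(sep, field_count - 1)


# Hypothetical record whose final field contains pipes as data.
print(split_last_field_free("A|B|C|free|form|text", 4))
```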

You should talk to the authors of the sending application, or else locate a specification for the data format that it is writing.
