Author |
Message
|
Mut1ey |
Posted: Wed Oct 01, 2014 3:42 am Post subject: DFDL Parse error after delimiter |
|
|
Acolyte
Joined: 07 Oct 2005 Posts: 74 Location: England
|
Using DFDL parser in v8.0.0.2 on Windows and AIX.
A simple model of text records that are pipe-delimited (|, 0x7c), with each element of implicit length, throws the following exception:
<Catalog>BIPmsgs</Catalog>
<Severity>3</Severity>
<Number>5807</Number>
<Text>An error occurred whilst parsing with DFDL</Text>
<Insert>
<Type>5</Type>
<Text>CTDP3002E: Unexpected data found at offset '' after parsing completed. Data: '0x53...'.</Text>
</Insert>
when parsing the following record:
SFLT|132648000141|"B" BY BLACK TOWER WHITE|04069600014578
However, if the " is not the first character following a |, the parser is happy.
Each record starts with 0x53 (the S char).
The CCSID of the incoming message is 923 or 859.
Thanks for any insights. |
|
|
 |
mqjeff |
Posted: Wed Oct 01, 2014 5:12 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Post the model.
Can you also post the output from using the DFDL Test Parser in Toolkit on the same message data? |
|
|
 |
Mut1ey |
Posted: Wed Oct 01, 2014 5:30 am Post subject: |
|
|
Acolyte
Joined: 07 Oct 2005 Posts: 74 Location: England
|
mqjeff wrote: |
Post the model.
Can you also post the output from using the DFDL Test Parser in Toolkit on the same message data? |
Code: |
<?xml version="1.0" encoding="utf-8"?>
<xsd:schema xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
    xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions"
    xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions"
    xmlns:recSepFieldsFmt="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordSeparatedFieldFormat"
      schemaLocation="IBMdefined/RecordSeparatedFieldFormat.xsd" />
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format byteOrder="{$dfdl:byteOrder}"
          documentFinalTerminatorCanBeMissing="yes"
          encoding="{$dfdl:encoding}"
          escapeSchemeRef="recSepFieldsFmt:RecordEscapeScheme"
          nilKind="logicalValue" nilValue="%SP;"
          occursCountKind="implicit"
          ref="recSepFieldsFmt:RecordSeparatedFieldsFormat"
          useNilForDefault="yes" />
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element ibmSchExtn:docRoot="true" name="Article">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:annotation>
          <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:sequence />
          </xsd:appinfo>
        </xsd:annotation>
        <xsd:element dfdl:occursCountKind="implicit"
            dfdl:outputNewLine="%CR;%LF;" maxOccurs="unbounded"
            minOccurs="1" name="body">
          <xsd:complexType>
            <xsd:sequence dfdl:separator="|"
                dfdl:separatorPolicy="suppressedAtEndLax"
                dfdl:terminator="%LF;">
              <xsd:annotation>
                <xsd:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:sequence />
                </xsd:appinfo>
              </xsd:annotation>
              <xsd:element dfdl:terminator=""
                  ibmDfdlExtn:sampleValue="body_value1" minOccurs="1"
                  name="supplier">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="8" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element ibmDfdlExtn:sampleValue="body_value2"
                  minOccurs="1" name="comm_code_i">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="14" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:nilValueDelimiterPolicy="none"
                  minOccurs="1" name="comm_desc" nillable="false">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="70" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:textNumberPattern="#0"
                  minOccurs="1" name="comm_code_s">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="70" />
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
|
Here is the section of the DFDL Test Trace:
Code: |
1 Oct 2014 14:26:31 info: Offset: 483. Starting to process element 'comm_desc'.
[dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::comm_desc), 61]
1 Oct 2014 14:26:31 error: CTDP3118E: No markup found after the escape block.
1 Oct 2014 14:26:31 info: Offset: 565. Parser was unable to resolve data on the current branch and will evaluate the next available branch beginning at offset '465' owned by the 'sequence' group contained within element 'Article'.
[dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence), 224]
1 Oct 2014 14:26:31 info: Offset: 465. Occurrence '10' of element 'body' was not found in the data. occursCountKind is 'implicit' so no more occurrences of this element will be expected.
[dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article/type::0/model::sequence/schemaElement::body), 169]
1 Oct 2014 14:26:31 info: Offset: 465. Finished processing element 'Article'.
[dfdl = /DFDL_Interfaces/Article.xsd, scd = #xscd(/schemaElement::Article), 71]
1 Oct 2014 14:26:31 fatal: CTDP3002E: Unexpected data found at offset '465' after parsing completed. Data: '0x53...'.
|
Thanks |
|
|
 |
kimbert |
Posted: Wed Oct 01, 2014 5:42 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Before I answer the question, I would like to point out that you are not following recommended best practice in developing/debugging your DFDL model.
You should:
- generate the model. I guess you used the wizard for that.
- test it using the DFDL Test perspective in the toolkit
- diagnose errors using the DFDL Trace
- deploy (and re-test) on the runtime when the model has been debugged
Diagnosis
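If your model was generated by the wizard then it is probably using an escape scheme with escapeKind="escapeBlock" and Escape Block Start set to " ( the quote character ). With those settings, the DFDL specification says that if a field starts with " then it must also end with " ( the entire field must be escaped ). In your record the third field starts with " but the closing " after B is followed by more text instead of a delimiter, which is why the trace reports CTDP3118E.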
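The rule can be illustrated with a minimal Python sketch. This is not how the IBM DFDL parser is implemented; `split_record` and `EscapeBlockError` are names invented for illustration, and the sketch assumes that " is both the escape block start and end and that | is the only separator.

```python
class EscapeBlockError(ValueError):
    """Raised when an escape block does not cover the whole field."""


def split_record(record, separator="|", block='"'):
    """Split one record; a field that starts with `block` must also end with it."""
    fields = []
    pos = 0
    while True:
        if record.startswith(block, pos):
            # Field opens with the escape block start: find the matching end.
            end = record.find(block, pos + 1)
            if end == -1:
                raise EscapeBlockError("escape block is never closed")
            after = end + 1
            # The block must cover the entire field, so the next thing in the
            # data has to be a separator or the end of the record.
            if after < len(record) and not record.startswith(separator, after):
                raise EscapeBlockError("no markup found after the escape block")
            fields.append(record[pos + 1:end])
            next_sep = after if after < len(record) else -1
        else:
            # Ordinary field: it runs up to the next separator.
            next_sep = record.find(separator, pos)
            fields.append(record[pos:] if next_sep == -1 else record[pos:next_sep])
        if next_sep == -1:
            return fields
        pos = next_sep + len(separator)
```

Calling `split_record` on the failing record raises `EscapeBlockError` at the third field, because the " that closes "B" is followed by a space rather than a |. If the quote is not the first character of the field, the same record splits into four fields, which matches the behaviour reported in the first post.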
You now need to fix your DFDL model so that it accurately describes your data format. Questions that you need to answer include:
- will your data ever include a pipe character ( 0x7c) as part of a field value?
- will your data ever include a line feed character as part of a field value?
- if the answer to either of the previous questions is 'yes' then how will the 0x7c be 'escaped' so that it does not get interpreted as a delimiter? _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
|
|
 |
Mut1ey |
Posted: Wed Oct 01, 2014 6:15 am Post subject: |
|
|
Acolyte
Joined: 07 Oct 2005 Posts: 74 Location: England
|
kimbert wrote: |
Before I answer the question, I would like to point out that you are not following recommended best practice in developing/debugging your DFDL model.
You should:
- generate the model. I guess you used the wizard for that.
- test it using the DFDL Test perspective in the toolkit
- diagnose errors using the DFDL Trace
|
Appreciated - I did a combination of both - used the wizard to give a skeleton, and tested the parsing against simple records. The tests were incomplete though, as they didn't include quotes. So the model was deployed and tested successfully until more realistic sample data was used.
What is the best practice for generating the model?
I will look for the escape schema, thanks.
In answer to the questions about content, yes, there is potential that pipe could be included in this description field, as it is relatively free form. I can't see how line feed characters could be introduced here though.
Is there a way to 'escape' certain byte/ character values? Or is there a way to not parse certain elements, in a similar way to BLOB messages? |
|
|
 |
kimbert |
Posted: Wed Oct 01, 2014 6:32 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
What is the best practice for generating the model? |
You found it already - use the wizard. But the wizard cannot do the whole job for you - you still need to test that the generated model is correct for your data format. Looks as if you are partway through that exercise already.
Quote: |
yes, there is potential that pipe could be included in this description field, as it is relatively free form |
So it will need to be escaped - otherwise the parser will treat the first pipe character it finds as the end of the field. Please post an example of a message that contains a 0x7C character as part of a field value.
Important question: Are you the author of the sending application? Or do you have any control over how it will escape pipe characters in the input data?
Quote: |
I will look for the escape schema, thanks. |
It's escape scheme ( not schema ). An escape scheme is a set of rules for how to 'hide' delimiters and other markup within a delimited field. DFDL supports a wide range of options - please see the DFDL specification for details:
http://www-01.ibm.com/support/knowledgecenter/SSKM8N_8.0.0/com.ibm.etools.mft.doc/df00110_.htm |
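As a rough illustration of what one kind of escape scheme does, here is a Python sketch of an escape-character style, where a single character in front of a delimiter hides it. The backslash is purely an assumption for the example (the thread never says what the sending application uses), and `split_escaped` is a made-up helper, not part of any DFDL API.

```python
def split_escaped(record, separator="|", escape="\\"):
    """Split on `separator`, except where it is preceded by `escape`."""
    fields, current = [], []
    chars = iter(record)
    for ch in chars:
        if ch == escape:
            # Take the next character literally, whatever it is.
            current.append(next(chars, ""))
        elif ch == separator:
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields
```

With this scheme a description such as `RED \| WHITE` stays in one field and comes out as `RED | WHITE`. It only works if the sender writes the escape character in the first place.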
|
|
 |
Mut1ey |
Posted: Wed Oct 01, 2014 8:34 am Post subject: |
|
|
Acolyte
Joined: 07 Oct 2005 Posts: 74 Location: England
|
Quote: |
Are you the author of the sending application? |
Unfortunately not; the process of creating an input message includes the addition of the field separators. So we would have to do something in broker before parsing into DFDL to try to detect additional pipes. For example, check that the number of pipe bytes (0x7c) matches the number expected per record multiplied by the number of records detected. I wonder if it would be more effective to allow an exception when this happens, rather than write code to defend against the possibility. These are not hard-real-time systems, so a failed message could be amended and re-submitted with the escape char added in these cases.
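A minimal sketch of that kind of pre-check, in Python rather than anything broker-specific. It counts separators per record instead of over the whole message, so it can say which record is suspect. `suspect_records` is a hypothetical helper, and the expected count of 3 assumes the four fields per record in the model posted above.

```python
EXPECTED_SEPARATORS = 3  # assumption: four fields per record, as in the posted model


def suspect_records(payload: bytes, separator: bytes = b"|"):
    """Return (line_number, separator_count) for records with an unexpected count."""
    suspects = []
    for number, line in enumerate(payload.splitlines(), start=1):
        count = line.count(separator)
        # Skip empty lines; flag any record with too many or too few separators.
        if line and count != EXPECTED_SEPARATORS:
            suspects.append((number, count))
    return suspects
```

This only detects a problem. It cannot tell which of the extra pipes belongs to the description, so the record would still have to be amended by hand.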
I have the Specification thanks, although I obviously haven't read and absorbed it as much as yourself...
Thanks for your help. |
|
|
 |
kimbert |
Posted: Wed Oct 01, 2014 3:50 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Unfortunately not; the process of creating an input message includes the addition of the field separators. |
Well, of course it does. But that is not relevant to this discussion. I am asking what the sending application does when a pipe or a line feed occurs within a field value.
Quote: |
So we would have to do something in broker before parsing into DFDL to try to detect additional pipes |
I disagree. If the sending application is not taking any steps to escape delimiters when they appear within a field then the sending application is broken and the data cannot be parsed reliably. Unless a pipe character can only appear in the final field on each line.
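The exception for the final field can be shown with a short Python sketch. `split_last_field_free` is an invented name and the field count of 4 is taken from the posted model. Note that in that model the free-form field (comm_desc) is the third of four, not the last, so this case does not apply to the data as it stands.

```python
FIELD_COUNT = 4  # supplier, comm_code_i, comm_desc, comm_code_s in the posted model


def split_last_field_free(record, separator="|", field_count=FIELD_COUNT):
    """Split so that only the final field may contain the separator."""
    # Stop splitting after field_count - 1 separators; the rest is the last field.
    fields = record.split(separator, field_count - 1)
    if len(fields) != field_count:
        raise ValueError(f"expected {field_count} fields, found {len(fields)}")
    return fields
```

Because splitting stops after the third separator, any further pipes are kept as part of the last field without ambiguity. A pipe inside any earlier field cannot be told apart from a real separator.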
You should talk to the authors of the sending application, or else locate a specification for the data format that it is writing. |
|
|
 |
|