Fixedwidth file processing error with DFDL parser
mauryamirzapuri
Posted: Wed Jun 01, 2022 3:16 am Post subject: Fixedwidth file processing error with DFDL parser
Newbie
Joined: 01 Jun 2022 Posts: 5
Dear Team,
I am trying to parse a fixed-width file with 5 fields, as below:
856000 FRIGPARTS 55752001 BB0227
The field lengths are:
First Field - 3
Second Field - 7
Third Field - 15
Fourth Field - 15
Fifth Field - 30
I have created a DFDL schema with 5 fields to accommodate the above. The problem is that the 5th field has a length of 30, but we are not getting all 30 bytes/characters in the input file because the file-generating application has a limitation. Whenever I try to parse the file, I get an error.
In addition, there could be more fields, but those values are not being populated by the application, which adds to the problem above.
I am not able to post the definition here, as the portal is not allowing me to. Please let me know how I can post the definition, as I am new to mqseries.
timber
Posted: Mon Jun 06, 2022 2:15 am Post subject:
Grand Master
Joined: 25 Aug 2015 Posts: 1292
Quote:
The problem is that the 5th field has a length of 30, but we are not getting all 30 bytes/characters in the input file because the file-generating application has a limitation.
DFDL can solve that problem.
Quote:
Whenever I try to parse the file, I get an error.
Please quote the error.
Quote:
In addition, there could be more fields, but those values are not being populated by the application, which adds to the problem above.
You need to be much more precise in your problem description.
Please supply example input messages for all scenarios.
bruce2359
Posted: Mon Jun 06, 2022 4:08 am Post subject: Re: Fixedwidth file processing error with DFDL parser
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
mauryamirzapuri wrote:
The problem is that the 5th field has a length of 30, but we are not getting all 30 bytes/characters in the input file because the file-generating application has a limitation. Whenever I try to parse the file, I get an error.
As timber suggests, please be precise in your error description.
Exactly how many bytes of the 5th field are you getting?
Exactly what error are you getting?
mauryamirzapuri
Posted: Mon Jun 06, 2022 10:41 pm Post subject:
Newbie
Joined: 01 Jun 2022 Posts: 5
Thanks, timber and bruce. I will post the DFDL and the error in my next post.
Last edited by mauryamirzapuri on Mon Jun 06, 2022 10:48 pm; edited 1 time in total
mauryamirzapuri
Posted: Mon Jun 06, 2022 10:45 pm Post subject:
Newbie
Joined: 01 Jun 2022 Posts: 5
The 5th field's length can be anything from 1 up to the full length of the field, which is 30.
In the supplied file, it is 6 characters long. As per the error, I believe the parser found the end-of-line character, but was not expecting it until the complete length of the 5th field as defined in the DFDL.
Below is the DFDL.
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format encoding="{$dfdl:encoding}" escapeSchemeRef="" occursCountKind="fixed" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element dfdl:lengthKind="delimited" ibmSchExtn:docRoot="true" name="sas">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="%CR;%LF;%WSP*;" dfdl:separatorSuppressionPolicy="anyEmpty">
        <xsd:element dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="body">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/"/>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:sequence dfdl:terminator="%CR;%LF;">
              <xsd:element dfdl:length="3" ibmDfdlExtn:sampleValue="body_valu1" name="body_elem1" type="xsd:string"/>
              <xsd:element dfdl:length="7" ibmDfdlExtn:sampleValue="body_valu2" name="body_elem2" type="xsd:string"/>
              <xsd:element dfdl:length="15" ibmDfdlExtn:sampleValue="body_valu3" name="body_elem3" type="xsd:string"/>
              <xsd:element dfdl:length="15" ibmDfdlExtn:sampleValue="body_value4" name="body_elem4" type="xsd:string"/>
              <xsd:element dfdl:length="30" dfdl:lengthKind="explicit" dfdl:lengthUnits="characters" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:nilValueDelimiterPolicy="none" dfdl:terminator="" dfdl:textPadKind="padChar" dfdl:useNilForDefault="no" ibmDfdlExtn:sampleValue="body_value5" name="body_elem5" nillable="true">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="0"/>
                    <xsd:minLength value="0"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Below is the error trace.
********************* DFDL Parser Setup Starting *********************
Schema = /asdsa/sas.xsd
**********************************************************************
********************* DFDL Parser Starting *********************
Data = C:\Users\maurysat\Desktop\new 8.txt
Message = sas (/asdsa/sas.xsd)
****************************************************************
Jun 7, 2022, 2:41:50 AM info: Offset: 0. Parsing will start from root element 'sas'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas), 62]
Jun 7, 2022, 2:41:50 AM info: The default value of '%LF;' was assigned to variable 'outputNewLine' in namespace 'http://www.ogf.org/dfdl/dfdl-1.0/'.
[dfdl = /asdsa/sas.xsd, scd = , 133]
Jun 7, 2022, 2:41:50 AM info: Offset: 0. Starting to process element 'sas'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas), 53]
Jun 7, 2022, 2:41:51 AM info: Offset: 0. Up to 'unbounded' occurrences of element 'body' will be expected because occursCountKind='implicit' and maxOccurs='unbounded'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body), 145]
Jun 7, 2022, 2:41:51 AM info: Offset: 0. Starting to process element 'body'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body), 54]
Jun 7, 2022, 2:41:51 AM info: Offset: 0. Starting to process element 'body_elem1'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 60]
Jun 7, 2022, 2:41:51 AM info: Offset: 0. Found specified length value '856' for element 'body_elem1'. The length was 3 bytes.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 104]
Jun 7, 2022, 2:41:51 AM info: Offset: 3. Finished processing element 'body_elem1'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem1), 60]
Jun 7, 2022, 2:41:51 AM info: Offset: 3. Starting to process element 'body_elem2'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 60]
Jun 7, 2022, 2:41:51 AM info: Offset: 3. Found specified length value '000' for element 'body_elem2'. The length was 7 bytes.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 104]
Jun 7, 2022, 2:41:51 AM info: Offset: 10. Finished processing element 'body_elem2'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem2), 61]
Jun 7, 2022, 2:41:51 AM info: Offset: 10. Starting to process element 'body_elem3'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]
Jun 7, 2022, 2:41:51 AM info: Offset: 10. Found specified length value 'FRIGPARTS' for element 'body_elem3'. The length was 15 bytes.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 112]
Jun 7, 2022, 2:41:51 AM info: Offset: 25. Finished processing element 'body_elem3'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem3), 61]
Jun 7, 2022, 2:41:51 AM info: Offset: 25. Starting to process element 'body_elem4'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem4), 61]
Jun 7, 2022, 2:41:51 AM info: Offset: 25. Found specified length value '55752001' for element 'body_elem4'. The length was 15 bytes.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem4), 111]
Jun 7, 2022, 2:41:51 AM info: Offset: 40. Finished processing element 'body_elem4'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem4), 61]
Jun 7, 2022, 2:41:51 AM info: Offset: 40. Starting to process element 'body_elem5'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem5), 61]
Jun 7, 2022, 2:41:51 AM error: CTDP3000E: Unexpected end of data at byte offset '48' while parsing element 'body_elem5'. The parser encountered the end of the data stream or the end of a parent element.
Jun 7, 2022, 2:41:51 AM fatal: CTDP3000E: Unexpected end of data at byte offset '48' while parsing element 'body_elem5'. The parser encountered the end of the data stream or the end of a parent element.
timber
Posted: Tue Jun 07, 2022 1:31 am Post subject:
Grand Master
Joined: 25 Aug 2015 Posts: 1292
Thanks - that confirms what I suspected about your initial question. The problem is here:
Code:
<xsd:element dfdl:length="30" dfdl:lengthKind="explicit" dfdl:lengthUnits="characters" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:nilValueDelimiterPolicy="none" dfdl:terminator="" dfdl:textPadKind="padChar" dfdl:useNilForDefault="no" ibmDfdlExtn:sampleValue="body_value5" name="body_elem5" nillable="true">
You are setting lengthKind to 'explicit', which means 'use the specified length'.
You need to set it to 'delimited', so that the DFDL parser uses the separator/terminator to find the end of the field.
But... before you make that change, you must ask yourself two important questions.
1. Is this element 'body_elem5' always the final element in the record?
2. If the answer to 1. is 'no', is there any mandatory element after 'body_elem5' in the same record?
In the DFDL schema that you posted, the answer to 1. is 'yes', so you can safely make the change. But you also said:
Quote:
In addition, there could be more fields, but those values are not being populated by the application, which adds to the problem above.
If these 'more fields' can come after 'body_elem5' then you may have a problem.
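For illustration, the change from 'explicit' to 'delimited' might look like the sketch below. It assumes 'body_elem5' remains the last element of the record. The reduced attribute set (the nil and padding properties of the posted element are left out for brevity) is an assumption, not the poster's final schema.

```xml
<!-- Sketch only: body_elem5 with lengthKind switched from 'explicit' to 'delimited'. -->
<!-- No dfdl:length is given; the parser scans for the enclosing sequence's -->
<!-- terminator (%CR;%LF;) to find where the field ends. -->
<!-- Assumption: the nil/pad properties of the posted element are omitted here. -->
<xsd:element name="body_elem5" type="xsd:string"
             dfdl:lengthKind="delimited"
             dfdl:terminator=""/>
```

With the sample record, where the fifth field is 6 characters long, this element would be expected to take the value 'BB0227', leaving the CR/LF to be consumed as the terminator of the enclosing sequence.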
mauryamirzapuri
Posted: Tue Jun 07, 2022 1:50 am Post subject:
Newbie
Joined: 01 Jun 2022 Posts: 5
Thanks, timber. That resolved the problem. Regarding your query:
I checked with the team that owns the file-generating application. They said that only the last field will have this limitation, though they are not very sure. But we are good as of now.
Many thanks again for your inputs.
mauryamirzapuri
Posted: Fri Jun 10, 2022 6:24 am Post subject:
Newbie
Joined: 01 Jun 2022 Posts: 5
I am back with an issue in addition to the one already resolved. In my earlier posts, I had five fields. Now I can have 10 fields, but values after the 5th field may or may not be present. If a value for a field is present, it will be of the defined length, unless it is the last field (which was the case in my first issue). To test this, I tried adding 2 more fields after the 5th field. When I test with the data and DFDL below, it works fine.
856000 FRIGPARTS 55752001 BB024567
DFDL
Code:
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:recFixLengthFieldsFmt="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat">
  <xsd:import namespace="http://www.ibm.com/dfdl/RecordFixLengthFieldFormat" schemaLocation="IBMdefined/RecordFixLengthFieldFormat.xsd"/>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format encoding="{$dfdl:encoding}" escapeSchemeRef="" occursCountKind="fixed" ref="recFixLengthFieldsFmt:RecordFixLengthFieldsFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element dfdl:lengthKind="delimited" ibmSchExtn:docRoot="true" name="sas">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="%CR;%LF;%WSP*;" dfdl:separatorSuppressionPolicy="anyEmpty">
        <xsd:element dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" dfdl:terminator="" maxOccurs="unbounded" name="body">
          <xsd:annotation>
            <xsd:appinfo source="http://www.ogf.org/dfdl/"/>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:sequence dfdl:terminator="%CR;%LF;">
              <xsd:element dfdl:length="3" ibmDfdlExtn:sampleValue="body_valu1" name="body_elem1" type="xsd:string"/>
              <xsd:element dfdl:length="7" ibmDfdlExtn:sampleValue="body_valu2" name="body_elem2" type="xsd:string"/>
              <xsd:element dfdl:length="15" ibmDfdlExtn:sampleValue="body_valu3" name="body_elem3" type="xsd:string"/>
              <xsd:element dfdl:length="15" ibmDfdlExtn:sampleValue="body_value4" name="body_elem4" type="xsd:string"/>
              <xsd:element dfdl:emptyValueDelimiterPolicy="none" dfdl:length="5" dfdl:lengthKind="explicit" dfdl:nilKind="literalValue" dfdl:nilValue="%ES;" dfdl:terminator="" dfdl:textPadKind="padChar" dfdl:textTrimKind="none" dfdl:useNilForDefault="yes" ibmDfdlExtn:sampleValue="body_value5" name="body_elem5" nillable="true">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:minLength value="1"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:length="3" dfdl:lengthKind="explicit" dfdl:occursCountKind="implicit" minOccurs="0" name="field1">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="3"/>
                    <xsd:minLength value="0"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
              <xsd:element dfdl:lengthKind="delimited" dfdl:occursCountKind="implicit" minOccurs="0" name="field2">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string">
                    <xsd:maxLength value="3"/>
                  </xsd:restriction>
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
But when I test with the data below, it fails. My requirement is to parse this kind of data.
856000 FRIGPARTS 55752001 BB0
See the 5th field's representation in the debug XML view below.
Code:
<?xml version="1.0" encoding="UTF-8" ?>
<sas xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <body>
    <body_elem1 xsi:type="xs:string">856</body_elem1>
    <body_elem2 xsi:type="xs:string">000</body_elem2>
    <body_elem3 xsi:type="xs:string">FRIGPARTS</body_elem3>
    <body_elem4 xsi:type="xs:string">55752001</body_elem4>
    <body_elem5 xsi:type="xs:string">BB0
</body_elem5>
  </body>
</sas>
The error is below.
Code:
Jun 10, 2022, 10:19:36 AM info: Offset: 40. Found specified length value 'BB0%CR;%LF;' for element 'body_elem5'. The length was 5 bytes.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem5), 113]
Jun 10, 2022, 10:19:36 AM info: Offset: 45. Finished processing element 'body_elem5'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence/schemaElement::body_elem5), 61]
Jun 10, 2022, 10:19:36 AM info: Offset: 45. Did not find terminator for 'sequence'. Expected terminator list is '%CR;%LF;'.
[dfdl = /asdsa/sas.xsd, scd = #xscd(/schemaElement::sas/type::0/model::sequence/schemaElement::body/type::0/model::sequence), 100]
Jun 10, 2022, 10:19:36 AM error: CTDP3061E: Terminator '%CR;%LF;' not found at offset '45' for sequence or choice within element '/sas[1]/body[1]'.
Jun 10, 2022, 10:19:36 AM fatal: CTDP3061E: Terminator '%CR;%LF;' not found at offset '45' for sequence or choice within element '/sas[1]/body[1]'.
Please help me here.
timber
Posted: Tue Jun 14, 2022 4:42 am Post subject:
Grand Master
Joined: 25 Aug 2015 Posts: 1292
Is this file really a fixed-width format? It looks as if there is a single space between each field.
* Is there always exactly one space between the fields?
* Can the fields ever be separated by a newline or carriage return instead of a space?
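If the answers turn out to be that the fields are genuinely fixed width, and that only the last field present in a record can be short, one option that might be worth trying is lengthKind 'pattern', with a regular expression that stops at the end of the line. The sketch below is untested. 'dfdl:lengthKind="pattern"' and 'dfdl:lengthPattern' are standard DFDL properties, but the specific patterns, and applying them to 'body_elem5', 'field1' and 'field2' with the widths from the posted schema, are assumptions.

```xml
<!-- Untested sketch: each field takes up to its declared width but never -->
<!-- reads past the end of the line, so CR/LF is left for the sequence terminator. -->
<!-- Assumption: widths 5, 3 and 3 are taken from the schema posted above. -->
<xsd:element name="body_elem5" type="xsd:string"
             dfdl:lengthKind="pattern"
             dfdl:lengthPattern="[^\r\n]{0,5}"/>
<xsd:element name="field1" type="xsd:string" minOccurs="0"
             dfdl:occursCountKind="implicit"
             dfdl:lengthKind="pattern"
             dfdl:lengthPattern="[^\r\n]{0,3}"/>
<xsd:element name="field2" type="xsd:string" minOccurs="0"
             dfdl:occursCountKind="implicit"
             dfdl:lengthKind="pattern"
             dfdl:lengthPattern="[^\r\n]{0,3}"/>
```

The idea is that, for the failing record, 'body_elem5' would match only 'BB0' instead of consuming the CR/LF as part of its 5 characters. How a zero-length match is treated for the optional elements should be checked in the DFDL trace before relying on this.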