MQSeries.net :: View topic - DFDL parser exception on Linux but not on Windows

DFDL parser exception on Linux but not on Windows

mattynorm

Posted: Fri Apr 25, 2014 12:57 am Post subject: DFDL parser exception on Linux but not on Windows

Acolyte

Joined: 06 Jun 2003
Posts: 52

A couple of people on here were very helpful in getting a flow to process a large file (5.6m rows) with sensible memory allocation. The thread can be found here:

http://www.mqseries.net/phpBB2/viewtopic.php?t=67032

On my Windows VM it works fine (it takes 41ish minutes, but that should be OK, as when live it will run at 3 in the morning).

I have now deployed the code to a Linux server, and the tests were working fine until I tried to run a full-size file, at which point the flow fails DFDL parsing (usually about 2.8m records in) and throws an error.

Quote:

BIP5807E: The DFDL parser signalled that a processing error occurred. The message from the DFDL parser is: CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33...'.

Have looked at the offset in Notepad++, can't see anything untoward about the line in question. Copied that line (and several around it) to a test file, ran that through and it worked fine.
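An editor can hide the exact bytes on either side of an offset, so it may help to dump them raw. The following is a minimal Python sketch, not anything Broker provides. The function names are hypothetical, and it assumes the reported offset is a zero-based byte offset, which the thread does not confirm.

```python
def dump_around(path, offset, context=32):
    """Return the raw bytes just before and just after a byte offset in a file."""
    start = max(0, offset - context)
    with open(path, "rb") as f:  # binary mode, so no newline translation happens
        f.seek(start)
        chunk = f.read(offset - start + context)
    split = offset - start
    return chunk[:split], chunk[split:]


def show_terminators(data):
    """Render bytes as text with CR and LF made visible as \\r and \\n."""
    text = data.decode("ascii", errors="backslashreplace")
    return text.replace("\r", "\\r").replace("\n", "\\n")
```

Calling `dump_around(path, 49581665)` on the Linux copy of the file and passing each half to `show_terminators` should show whether the record just before the offset really ends in CR LF.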

If it helps, the Parser Exception being generated is

Code:

<ParserException>
<File>/build/slot1/S900_P/src/MTI/MTIforBroker/DfdlParser/ImbDFDLErrorHandler.cpp</File>
<Line>174</Line>
<Function>ImbDFDLErrorHandler::handleParserErrors</Function>
<Type>ComIbmFileInputNode</Type>
<Name>AvailableStockToWCS#FCMComposite_1_5</Name>
<Label>AvailableStockToWCS.File Input1</Label>
<Catalog>BIPmsgs</Catalog>
<Severity>3</Severity>
<Number>5807</Number>
<Text>An error occurred whilst parsing with DFDL</Text>
<Insert>
<Type>5</Type>
<Text>CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33...'.</Text>
</Insert>
</ParserException>

Sample file

Code:

Header1,Header2,Header3
30040611006,0770,0
30040611006,0775,1
30040611006,0780,1
30040611006,0785,2
30040699999,0790,0
30040611006,0795,1
30040611006,0800,1
30040611006,0805,1
30040611006,0810,1

And the DFDL message model it is parsing against is here

Code:

<?xml version="1.0" encoding="UTF-8"?><xsd:schema xmlns:csv="http://www.ibm.com/dfdl/CommaSeparatedFormat" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<xsd:import namespace="http://www.ibm.com/dfdl/CommaSeparatedFormat" schemaLocation="../../../../../IBMdefined/CommaSeparatedFormat.xsd"/>
<xsd:annotation>
   <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
   </xsd:appinfo>
</xsd:annotation>

<xsd:element ibmSchExtn:docRoot="true" name="StockDB_Webstock_Stock">
   <xsd:complexType>
      <xsd:sequence dfdl:separator="">
         <xsd:element dfdl:terminator="%CR;%LF;%WSP*;" name="header">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element ibmDfdlExtn:sampleValue="head_value1" name="head_ArticleID" type="xsd:string"/>
                  <xsd:element ibmDfdlExtn:sampleValue="head_value2" name="head_SAPStoreID" type="xsd:string"/>
                  <xsd:element ibmDfdlExtn:sampleValue="head_value3" name="head_AvailableStock" type="xsd:string"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
         <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;%WSP*;" maxOccurs="unbounded" minOccurs="0" name="record">
            <xsd:complexType>
               <xsd:sequence>
                  <xsd:element dfdl:textNumberPattern="#0" name="ArticleID" type="xsd:string"/>
                  <xsd:element dfdl:textNumberPattern="#0" name="SAPStoreID" type="xsd:string"/>
                  <xsd:element dfdl:textNumberPattern="#0" name="AvailableStock" type="xsd:integer"/>
               </xsd:sequence>
            </xsd:complexType>
         </xsd:element>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

</xsd:schema>

The only thing I could think of is a difference between how Linux and Windows interpret the terminator, but that doesn't explain why it will happily process 2m records before failing.

Any ideas?

kimbert

Posted: Fri Apr 25, 2014 2:01 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

This error:

Code:

CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33...'.

is issued when the DFDL parser finishes parsing the entire message (gets to the end of the message model) and it has not consumed the entire input bitstream.

Quote:

The only thing I could think of is a difference between how Linux and Windows interpret the terminator, but that doesn't explain why it will happily process 2m records before failing.

Well, it might explain it. After all, the DFDL parser will not know that there is data 'left over' until it gets to the end of the model.

I notice that you are setting the terminator thus:

Code:

dfdl:terminator="%CR;%LF;%WSP*;"

In English, that means 'a CR/LF pair followed by optional white space'. Is that what you are expecting in your Linux file?
If you need to tolerate any type of line terminator then you could use this instead:

Code:

dfdl:terminator="%NL;%WSP*;"

or, if you don't need to skip blank lines, this:

Code:

dfdl:terminator="%NL;"

But make sure that you repeat your performance tests after any such changes.
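One way to check which terminators the Linux copy of the file actually contains is to count them. The Python sketch below does that and is independent of Broker. The name `terminator_census` is hypothetical, and it assumes the whole file fits in memory, which seems reasonable for a file of roughly 95 MB.

```python
import re
from collections import Counter

# CRLF is listed first so that a CR LF pair is counted once, not as CR plus LF.
_TERMINATOR = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def terminator_census(data):
    """Count each line-terminator style and note the byte offset where it first appears."""
    counts = Counter()
    first_seen = {}
    for match in _TERMINATOR.finditer(data):
        kind = _NAMES[match.group()]
        counts[kind] += 1
        first_seen.setdefault(kind, match.start())
    return counts, first_seen
```

For example, `terminator_census(open(path, "rb").read())` on the file that fails. If anything other than CRLF shows up, its first offset can be compared with the offset in the CTDP3002E message.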
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.

mattynorm

Posted: Fri Apr 25, 2014 2:18 am Post subject:


Thanks kimbert, I was a bit confused by the '0x33' as that's a 51 in ASCII, and there wasn't one around where it was failing.
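On the '0x33': it is a hexadecimal byte value. It equals 51 in decimal, and that byte is the ASCII character '3', not the text "51". A short Python check (the helper name `ascii_char` is only for illustration):

```python
def ascii_char(byte_value):
    """Return the ASCII character encoded by a single byte value."""
    return bytes([byte_value]).decode("ascii")


print(0x33)              # 51, the decimal value of the byte
print(ascii_char(0x33))  # 3, the character that byte represents
```

So the leftover data starts with the character '3', which would be consistent with the start of a record such as 30040611006 in the sample file.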

That offset happens roughly 2.8m records into a 5.6m record file, is there any reason why the parser might think it has encountered the end of the data there (when the max occurs count is set to unbounded)? And could the offset figure be different in Notepad++ than the Broker (apologies for ignorance if this is a stupid question).

If it makes a difference, I read the whole file (roughly 95 MB) in from the FileInput node, as I am trying to get the best performance out of the flow without using huge amounts of memory (it currently peaks at about 230 MB for the EG), but I write it out to the FileOutput node line by line (Propagate inside a While loop).

Thanks for the suggestions though, I will give them a try.

kimbert

Posted: Fri Apr 25, 2014 3:57 am Post subject:


Quote:

Yes. I assume that occursCountKind is 'implicit' for the array? If you get a processing error while parsing any occurrence of the array then the DFDL parser will assume that the array has finished and will move on to the 'next' item in the model. In this case, I guess there is no 'next item' so the DFDL parser issues the error.

So why would DFDL issue an error? Hard to say - but maybe there is something wrong with that record and it is causing DFDL to throw a processing error. That error would not be fatal, because the array occurrence was optional (its index is above minOccurs). As far as DFDL is concerned, the error simply indicates that the array has ended and the data was supposed to belong to the 'next' item in the model.
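The behaviour described above can be imitated with a toy model in Python. This is a sketch only, not the DFDL runtime. The regular expression `RECORD` is a hypothetical, simplified stand-in for the 'record' element (three comma-separated fields ending in CR LF plus optional spaces or tabs), the header row is ignored, and the function names are made up for illustration.

```python
import re

# Simplified stand-in for one 'record': two text fields, one integer, CR LF, optional blanks.
RECORD = re.compile(rb"([^,\r\n]*),([^,\r\n]*),(\d+)\r\n[ \t]*")


def parse_records(data, start=0):
    """Parse consecutive records; the first failure simply ends the array."""
    records = []
    pos = start
    while pos < len(data):
        match = RECORD.match(data, pos)
        if match is None:
            break  # an optional occurrence failed, so the array is treated as finished
        records.append(match.groups())
        pos = match.end()
    return records, pos


def parse_message(data):
    """Fail only at the end of the model, if some input is still unconsumed."""
    records, pos = parse_records(data)
    if pos != len(data):
        raise ValueError(
            f"Unexpected data found at offset {pos} after parsing completed. "
            f"Data: 0x{data[pos]:02x}"
        )
    return records
```

In this model a single record that ends in a bare LF does not fail on its own. It ends the array quietly, and the error only appears afterwards as leftover data that begins at the start of that record.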

However...

Quote:

That offset happens roughly 2.8m records into a 5.6m record file, is there any reason why the parser might think it has encountered the end of the data there (when the max occurs count is set to unbounded)? And could the offset figure be different in Notepad++ than the Broker (apologies for ignorance if this is a stupid question).

If the data in Notepad++ is in hex-ascii format (two characters per byte) then the byte offset from broker will need to be doubled in order to get the equivalent character offset when viewed in Notepad++.
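Both conversions can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes a zero-based byte offset and single-byte (ASCII) data, neither of which is confirmed in this thread.

```python
def hex_text_offset(byte_offset):
    """Character offset in a hex-ascii view that shows two characters per byte."""
    return byte_offset * 2


def line_and_column(data, offset):
    """Convert a zero-based byte offset into a 1-based (line, column) pair."""
    line = data.count(b"\n", 0, offset) + 1
    line_start = data.rfind(b"\n", 0, offset) + 1  # 0 when offset is on the first line
    return line, offset - line_start + 1
```

With the file read in binary mode, `line_and_column(data, 49581665)` gives a line number to jump to in a normal text view, and `hex_text_offset(49581665)` gives 99163330 for a two-characters-per-byte hex view.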
