DFDL parser exception on Linux but not on Windows |
mattynorm |
Posted: Fri Apr 25, 2014 12:57 am Post subject: DFDL parser exception on Linux but not on Windows |
|
|
Acolyte
Joined: 06 Jun 2003 Posts: 52
|
A couple of people on here were very helpful in getting a flow to process a large file (5.6m rows) with sensible memory allocation. The thread can be found here
http://www.mqseries.net/phpBB2/viewtopic.php?t=67032
and on my Windows VM it works fine (takes 41ish minutes, but that should be ok as when live it should be running at 3 in the morning).
I have now deployed the code to a Linux server. The tests were working fine until I tried to run a full-size file, at which point the flow fails DFDL parsing (usually about 2.8m records in) and throws an error.
Quote: |
BIP5807E: The DFDL parser signalled that a processing error occurred. The message from the DFDL parser is: CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33...'. |
Have looked at the offset in Notepad++, can't see anything untoward about the line in question. Copied that line (and several around it) to a test file, ran that through and it worked fine.
If it helps, the Parser Exception being generated is
Code: |
<ParserException>
  <File>/build/slot1/S900_P/src/MTI/MTIforBroker/DfdlParser/ImbDFDLErrorHandler.cpp</File>
  <Line>174</Line>
  <Function>ImbDFDLErrorHandler::handleParserErrors</Function>
  <Type>ComIbmFileInputNode</Type>
  <Name>AvailableStockToWCS#FCMComposite_1_5</Name>
  <Label>AvailableStockToWCS.File Input1</Label>
  <Catalog>BIPmsgs</Catalog>
  <Severity>3</Severity>
  <Number>5807</Number>
  <Text>An error occurred whilst parsing with DFDL</Text>
  <Insert>
    <Type>5</Type>
    <Text>CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33...'.</Text>
  </Insert>
</ParserException> |
Sample file
Code: |
Header1,Header2,Header3
30040611006,0770,0
30040611006,0775,1
30040611006,0780,1
30040611006,0785,2
30040699999,0790,0
30040611006,0795,1
30040611006,0800,1
30040611006,0805,1
30040611006,0810,1
|
And the dfdl message it is parsing against is here
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:csv="http://www.ibm.com/dfdl/CommaSeparatedFormat" xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/" xmlns:ibmDfdlExtn="http://www.ibm.com/dfdl/extensions" xmlns:ibmSchExtn="http://www.ibm.com/schema/extensions" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:import namespace="http://www.ibm.com/dfdl/CommaSeparatedFormat" schemaLocation="../../../../../IBMdefined/CommaSeparatedFormat.xsd"/>
  <xsd:annotation>
    <xsd:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:format documentFinalTerminatorCanBeMissing="yes" encoding="{$dfdl:encoding}" escapeSchemeRef="csv:CSVEscapeScheme" ref="csv:CommaSeparatedFormat"/>
    </xsd:appinfo>
  </xsd:annotation>
  <xsd:element ibmSchExtn:docRoot="true" name="StockDB_Webstock_Stock">
    <xsd:complexType>
      <xsd:sequence dfdl:separator="">
        <xsd:element dfdl:terminator="%CR;%LF;%WSP*;" name="header">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element ibmDfdlExtn:sampleValue="head_value1" name="head_ArticleID" type="xsd:string"/>
              <xsd:element ibmDfdlExtn:sampleValue="head_value2" name="head_SAPStoreID" type="xsd:string"/>
              <xsd:element ibmDfdlExtn:sampleValue="head_value3" name="head_AvailableStock" type="xsd:string"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element dfdl:occursCountKind="implicit" dfdl:terminator="%CR;%LF;%WSP*;" maxOccurs="unbounded" minOccurs="0" name="record">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element dfdl:textNumberPattern="#0" name="ArticleID" type="xsd:string"/>
              <xsd:element dfdl:textNumberPattern="#0" name="SAPStoreID" type="xsd:string"/>
              <xsd:element dfdl:textNumberPattern="#0" name="AvailableStock" type="xsd:integer"/>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
|
The only thing I could think of is a difference between how Linux and Windows interpret the terminator, but that doesn't explain why it will happily process 2m records before failing.
Any ideas? |
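One way to look at the exact bytes at the reported offset, independent of how an editor counts columns or renders line endings, is to dump a small hex window around it. The following is a minimal Python sketch under stated assumptions: `hex_window` and `read_around` are hypothetical helper names, and the sample bytes are invented for illustration (the second record deliberately ends in a bare LF).

```python
def hex_window(data: bytes, offset: int, radius: int = 8) -> str:
    """Return the bytes around `offset` as hex, with the byte at `offset` in brackets."""
    start = max(0, offset - radius)
    end = min(len(data), offset + radius + 1)
    parts = []
    for i in range(start, end):
        cell = f"{data[i]:02x}"
        parts.append(f"[{cell}]" if i == offset else cell)
    return " ".join(parts)


def read_around(path: str, offset: int, radius: int = 8) -> str:
    """Read only the bytes near `offset` from a file, so a large file is not loaded whole."""
    start = max(0, offset - radius)
    with open(path, "rb") as f:
        f.seek(start)
        chunk = f.read(2 * radius + 1)
    return hex_window(chunk, offset - start, radius)


# Invented in-memory stand-in for the real file (assumption, not the poster's data).
sample = b"30040611006,0770,0\r\n30040611006,0775,1\n30040611006,0780,1\r\n"
print(hex_window(sample, 39, radius=3))  # prints: 2c 31 0a [33] 30 30 34
```

For the real file, something like `read_around("stock.csv", 49581665)` (the file name is a placeholder) would show whether a `0d 0a` pair or a lone `0a` sits just before the offset.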
|
|
 |
kimbert |
Posted: Fri Apr 25, 2014 2:01 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
This error:
Code: |
CTDP3002E: Unexpected data found at offset '49581665' after parsing completed. Data: '0x33 |
is issued when the DFDL parser finishes parsing the entire message (gets to the end of the message model) but has not consumed the entire input bitstream.
Quote: |
The only thing I could think of is a difference between how Linux and Windows interpret the terminator, but that doesn't explain why it will happily process 2m records before failing. |
Well, it might explain it. After all, the DFDL parser will not know that there is data 'left over' until it gets to the end of the model.
I notice that you are setting the terminator thus:
Code: |
dfdl:terminator="%CR;%LF;%WSP*;" |
In English, that means 'a CR/LF pair followed by optional white space'. Is that what you are expecting in your Linux file?
If you need to tolerate any type of line terminator then you could use this instead:
Code: |
dfdl:terminator="%NL;%WSP*;" |
or, if you don't need to skip blank lines, this:
Code: |
dfdl:terminator="%NL;" |
But make sure that you repeat your performance tests after any such changes.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too. |
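To answer the question of which line terminators the file really contains, it may help to count them directly. This is a rough Python sketch, not part of the broker tooling: `count_line_endings` and `first_bare_lf` are hypothetical names, and the sample bytes are made up to show a file that is mostly CRLF with one bare LF.

```python
import re
from collections import Counter


def count_line_endings(data: bytes) -> Counter:
    """Count CRLF pairs, bare LFs and bare CRs in a byte buffer."""
    kinds = {b"\r\n": "CRLF", b"\n": "LF", b"\r": "CR"}
    # Alternation order matters: the CRLF pair is tried before either single byte.
    return Counter(kinds[m.group()] for m in re.finditer(rb"\r\n|\n|\r", data))


def first_bare_lf(data: bytes) -> int:
    """Return the byte offset of the first LF not preceded by CR, or -1 if none."""
    m = re.search(rb"(?<!\r)\n", data)
    return m.start() if m else -1


# Invented sample (assumption): header and first record end in CRLF, second in bare LF.
sample = b"Header1,Header2,Header3\r\n30040611006,0770,0\r\n30040611006,0775,1\n"
print(count_line_endings(sample))  # Counter({'CRLF': 2, 'LF': 1})
print(first_bare_lf(sample))       # 63
```

If the counts show a mix of styles, a `%CR;%LF;` terminator would match some lines and not others, which is the situation the `%NL;` suggestion above is meant to tolerate.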
|
|
 |
mattynorm |
Posted: Fri Apr 25, 2014 2:18 am Post subject: |
|
|
Acolyte
Joined: 06 Jun 2003 Posts: 52
|
Thanks kimbert, I was a bit confused by the '0x33' as that's a 51 in ASCII, and there wasn't one around where it was failing.
That offset happens roughly 2.8m records into a 5.6m record file, is there any reason why the parser might think it has encountered the end of the data there (when the max occurs count is set to unbounded)? And could the offset figure be different in Notepad++ than the Broker (apologies for ignorance if this is a stupid question).
If it makes a difference, I get the Whole File (roughly 95 MB) in from the FileInput node, as I am trying to get the best performance out of the flow without using huge amounts of memory (the EG currently peaks at about 230 MB), but I put it out to the FileOutput node line by line (Propagate inside a While loop).
Thanks for the suggestions though, I will give them a try. |
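On the '0x33' point: the value in the error text is a single byte shown in hex. It equals decimal 51, which is the ASCII code for the character '3', not the two-character text "51". A short Python check follows; `describe_byte` is a hypothetical helper and the record string is copied from the sample file above.

```python
def describe_byte(value: int) -> tuple[int, str]:
    """Return the decimal value of a byte and the ASCII character it encodes."""
    return value, chr(value)


print(describe_byte(0x33))  # (51, '3')

# Every record in the posted sample starts with the character '3'.
first_record = "30040611006,0770,0"
print(first_record.encode("ascii")[0] == 0x33)  # True
```

So data beginning with 0x33 is consistent with the leftover bytes starting at the beginning of an ordinary record, rather than at a stray "51" somewhere in the line.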
|
|
 |
kimbert |
Posted: Fri Apr 25, 2014 3:57 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
That offset happens roughly 2.8m records into a 5.6m record file, is there any reason why the parser might think it has encountered the end of the data there (when the max occurs count is set to unbounded)? |
Yes. I assume that occursCountKind is 'implicit' for the array? If you get a processing error while parsing any occurrence of the array then the DFDL parser will assume that the array has finished and will move on to the 'next' item in the model. In this case, I guess there is no 'next item' so the DFDL parser issues the error.
So why would DFDL issue an error? Hard to say, but maybe there is something wrong with that record and it is causing DFDL to throw a processing error. That error would not be fatal, because the array occurrence was optional (beyond minOccurs). As far as DFDL is concerned, the error simply indicates that the array has ended and the data was supposed to belong to the 'next' item in the model.
However...
Quote: |
That offset happens roughly 2.8m records into a 5.6m record file, is there any reason why the parser might think it has encountered the end of the data there (when the max occurs count is set to unbounded)? And could the offset figure be different in Notepad++ than the Broker (apologies for ignorance if this is a stupid question). |
If the data in Notepad++ is in hex-ASCII format (two characters per byte) then the byte offset from the broker will need to be doubled in order to get the equivalent character offset when viewed in Notepad++. |
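The array behaviour described above for occursCountKind 'implicit' can be imitated with a toy parser. This is only a sketch of the idea, written in Python and not a model of the real DFDL parser: `parse_records` and `parse_document` are hypothetical names, the record rule (three comma-separated fields ending in CRLF) is a simplification, and the bare LF in the sample is just one invented way for an occurrence to fail.

```python
CRLF = b"\r\n"


def parse_records(data: bytes, start: int = 0):
    """Parse occurrences until one fails; a failure ends the array without raising.

    Returns (records, offset_where_the_array_ended).
    """
    records, pos = [], start
    while pos < len(data):
        end = data.find(CRLF, pos)
        if end == -1:
            break  # no terminator found, so this occurrence fails
        fields = data[pos:end].split(b",")
        if len(fields) != 3 or not fields[2].isdigit():
            break  # malformed occurrence: treat the array as finished
        records.append(tuple(f.decode("ascii") for f in fields))
        pos = end + len(CRLF)
    return records, pos


def parse_document(data: bytes):
    """The array is the last item in the model, so leftover bytes are an error."""
    records, pos = parse_records(data)
    if pos != len(data):
        raise ValueError(
            f"Unexpected data found at offset '{pos}' after parsing completed. "
            f"Data: '0x{data[pos]:02x}...'"
        )
    return records


good = b"30040611006,0770,0\r\n30040611006,0775,1\r\n"
bad = b"30040611006,0770,0\r\n30040611006,0775,1\n30040611006,0780,1\r\n"

print(len(parse_document(good)))  # 2
try:
    parse_document(bad)
except ValueError as exc:
    print(exc)  # offset '20', data '0x33...'
```

In the toy run the failure is reported at the start of the record that could not be parsed, and the first leftover byte is 0x33, even though everything before it was accepted.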
|