Parsing Problem in double-byte char data using Data Pattern
james.h
Posted: Mon Mar 15, 2010 12:55 am Post subject: Parsing Problem in double-byte char data using Data Pattern
Newbie
Joined: 14 Mar 2010 Posts: 5
I have a flat file in the following format and use an MRM TDS Data Pattern to parse it. There is no delimiter between the three record structures.
Header[1...1]
Detail[1...n]
Trailer[1...1]
The data pattern for type "Detail" is D[\w\W]{13}; every Detail record should be a fixed length of 14 bytes.
When I send data without double-byte characters, parsing is OK; when I send data containing double-byte characters, an error occurs.
For example, there are 2 double-byte characters in a detail record, but the trace shows that '16' bytes were matched using the data pattern:
Quote:
2010-03-14 14:58:45.569366 3176 >> } CPUnicodeConverter::toUnicode
2010-03-14 14:58:45.569393 3176 >> NXDBranchResolver::doesBitstreamMatchDataPattern file:F:\build\S610_P\src\cpi\pwf\nxd\nxdbranchresolver.cpp line:1178 message:5616.BIPv610 'Matched Data Pattern' , 16, 8, 'D[\w\W]{13}', 'Detail',
2010-03-14 14:58:45.569393 3176 >> UserTrace BIP5616I: '16' bytes from byte '8' were matched using data pattern ''D[\w\W]{13}'' for ''Detail''.
2010-03-14 14:58:45.569404 3176 >> } NXDBranchResolver::doesBitstreamMatchDataPattern , true, 'D[\w\W]{13}',
This caused an error when parsing the next detail record:
Quote:
2010-03-14 14:58:45.653903 3176 >> ImbTraceNode::evaluate file:F:\build\S610_P\src\DataFlowEngine\ImbTraceNode.cpp line:341 message:2230.BIPv610 MyApplicationFlow#FCMComposite_1_3 ComIbmTraceNode 'Caught exception and rethrowing' , MyApplicationFlow.Trace
2010-03-14 14:58:45.653903 3176 >> RecoverableException BIP2230E: Error detected whilst processing a message in node 'MyApplicationFlow.Trace'.
The message broker detected an error whilst processing a message in node 'MyApplicationFlow.Trace'. An exception has been thrown to cut short the processing of the message.
See the following messages for details of the error.
2010-03-14 14:58:45.653919 3176 >> MtiImbParser::parseRightSibling file:F:\build\S610_P\src\MTI\MTIforBroker\MtiImbParser2\MtiImbParser.cpp line:731 message:5285.BIPv610 MyApplicationFlow#FCMComposite_1_4 ComIbmFileInputNode 'ImbRecoverableException caught from worker->parseNext.' , 'TestDataPattern', 1, 'Text1', '/BuyerDataWrapper/Detail/element1', MyApplicationFlow.FileInput
2010-03-14 14:58:45.653919 3176 >> ParserException BIP5285E: Parsing errors have occurred.
Message set name: 'TestDataPattern'
Message format: 'Text1'
Message type path: '/BuyerDataWrapper/Detail/element1'
Review other error messages to find the cause of the errors.
2010-03-14 14:58:45.653930 3176 >> NXDWorker::parseNext file:F:\build\S610_P\src\cpi\pwf\nxd\nxdworker.cpp line:462 message:5421.BIPv610 'TDS General Error' , 'BuyerDataWrapper', '/BuyerDataWrapper/Detail(1 of unbounded)', 23,
2010-03-14 14:58:45.653930 3176 >> ParserException BIP5421S: Tagged/Delimited String Format (TDS) parsing error
Current message : ''BuyerDataWrapper''
Path to current element : ''/BuyerDataWrapper/Detail(1 of unbounded)''
Offset from start of message : 23
See following errors for more details.
2010-03-14 14:58:45.653942 3176 >> MtiImbDictionaryIterator::validateTypeContent file:F:\build\S610_P\src\MTI\MTIforBroker\MtiImbParser2\MtiImbDictionaryIterator.cpp line:486 message:5371.BIPv610 'Invalid Member for typeContent closed' , '', '/', 'SEQUENCE',
2010-03-14 14:58:45.653942 3176 >> ParserException BIP5371E: There was a message validation error. Element or attribute '''' failed to validate. The path to the element is ''/''. The element is defined as a child of a complex type or group that has a content validation setting of 'Closed' and composition ''SEQUENCE''.
Possible causes could be:
1. The child has not been defined as a member of its parent complex type or group.
2. The child has been created out of order in the logical tree, for the ordered compositions ('Ordered Set' or 'Sequence').
3. The child is a duplicate, which is not allowed for a composition of 'Ordered Set' or 'Unordered Set'.
As appropriate:
Modify the message set and redeploy it to the broker.
Modify the message flow and redeploy it to the broker.
Modify the input message and resubmit it to the broker.
Can anybody help me address this problem?
kimbert
Posted: Mon Mar 15, 2010 2:20 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote:
The data pattern for type "Detail" is D[\w\W]{13}; every Detail record should be a fixed length of 14 bytes.
When I send data without double-byte characters, parsing is OK; when I send data containing double-byte characters, an error occurs.
For example, there are 2 double-byte characters in a detail record, but the trace shows that '16' bytes were matched using the data pattern:
a) Did you mean 'every Detail record should be 14 *characters*'? Bytes are not the same as characters: 12 single-byte characters plus two double-byte characters (14 characters in all) would give 16 bytes.
b) What is the code page of the incoming data?
c) What code page is the message flow using?
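To illustrate point a), here is a small Python sketch. It assumes Python's `gbk` codec as a stand-in for an ASCII-based mixed single-byte/double-byte Chinese code page such as CCSID 1381 (close, but not guaranteed identical), and uses made-up 14-byte Detail records. A regular expression applied to decoded text counts characters, so `D[\w\W]{13}` consumes 14 characters, which here is 16 bytes: 2 bytes past the end of the first record.

```python
import re

# Hypothetical 14-byte Detail records; 'gbk' is assumed to approximate CCSID 1381.
record1 = "DABC\u4e2d\u6587EFGHIJ"   # 10 single-byte + 2 double-byte characters
record2 = "D1234567890123"           # 14 single-byte characters
stream = (record1 + record2).encode("gbk")

# Each record is 14 bytes in the input code page...
assert len(record1.encode("gbk")) == 14

# ...but the pattern is matched against text, so {13} counts characters.
text = stream.decode("gbk")
match = re.match(r"D[\w\W]{13}", text)
matched = match.group(0)

print(len(matched))                # 14 characters matched
print(len(matched.encode("gbk")))  # 16 bytes consumed from the bitstream

# The next attempt starts 2 characters into record2 and fails,
# because the remaining text does not begin with 'D'.
print(re.match(r"D[\w\W]{13}", text[match.end():]))  # None
```

The 16-byte match and the failure on the following record mirror the BIP5616I and BIP5421S messages in the trace above.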
james.h
Posted: Mon Mar 15, 2010 5:36 am Post subject:
Newbie
Joined: 14 Mar 2010 Posts: 5
Quote:
a) Did you mean 'every Detail record should be 14 *characters*'? Bytes are not the same as characters: 12 single-byte characters plus two double-byte characters (14 characters in all) would give 16 bytes.
No, I mean 14 bytes. If the message is encoded in DBCS, the maximum is 7 double-byte characters; if it is SBCS, the maximum is 14 single-byte characters. I set Length Unit to 'Bytes' in the physical properties of the fixed-length element under 'Detail', and the element was parsed correctly, but it seems that the Data Pattern parses the message based on characters, not bytes. I don't know whether it is possible to also change the Length Unit used by the Data Pattern to 'Bytes'.
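For contrast, a minimal sketch of what counting in bytes amounts to: cut the undecoded bitstream into 14-byte chunks first, and only then decode each chunk. It reuses the same hypothetical records and the same assumption that Python's `gbk` codec approximates CCSID 1381. It only works cleanly if no double-byte character straddles a 14-byte boundary; the sketch does not handle that case.

```python
# Hypothetical records, encoded with 'gbk' as an approximation of CCSID 1381.
RECORD_LENGTH = 14  # fixed length in bytes, as in the Detail definition
stream = ("DABC\u4e2d\u6587EFGHIJ" + "D1234567890123").encode("gbk")


def split_fixed_bytes(data: bytes, length: int) -> list[str]:
    """Cut the raw bytes into fixed-length chunks, then decode each chunk."""
    if len(data) % length:
        raise ValueError("data is not a whole number of records")
    return [data[i:i + length].decode("gbk") for i in range(0, len(data), length)]


records = split_fixed_bytes(stream, RECORD_LENGTH)
for rec in records:
    print(len(rec))  # 12, then 14: same byte length, different character counts
```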
Quote:
b) What is the code page of the incoming data?
It should be a Chinese DBCS code page, CCSID 1381 or 935, which allows mixed 1-byte and 2-byte character text.
Quote:
c) What code page is the message flow using?
It should be UTF-8, CCSID 1208.
kimbert
Posted: Mon Mar 15, 2010 5:49 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote:
it seems that the Data Pattern parses the message based on characters, not bytes. I don't know whether it is possible to also change the Length Unit used by the Data Pattern to 'Bytes'.
No, I'm afraid it's not possible. I can see what you are trying to do, but you will never get a regular expression language to work on bytes. Regular expressions were invented to describe text; any attempt to make them describe binary data is likely to fail.
Quote:
I set Length Unit to 'Bytes' in the physical properties of the fixed-length element under 'Detail'
...and presumably you set the 'Length' property to '14' as well. You expected that the TDS parser would limit the length of the data available to the data pattern to 14 bytes.
It was a reasonable assumption, but not correct. TDS is rigidly guided by the 'Data Element Separation' property. If you say the length is determined by the data pattern, then the 'Length' property is ignored.
Your answers to b) and c) were surprising. If the data is in code page 1381, then the TDS parser should be using code page 1381.
Maybe you interpreted my question as 'What is the code page for fields of type CHARACTER stored in the message tree?' If so, the correct answer is 'UTF-16, code page 1200'.
I'll wait for your response before trying to guess at a solution.
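The code page questions matter because the same characters occupy a different number of bytes in each encoding. A short sketch, again assuming Python's `gbk` codec as a stand-in for CCSID 1381, with `utf-8` for CCSID 1208 and `utf-16-be` (no byte-order mark) for CCSID 1200: the hypothetical 14-byte Detail record becomes 16 bytes in UTF-8 and 24 bytes in UTF-16, so a byte length is only meaningful in the code page of the incoming data.

```python
record = "DABC\u4e2d\u6587EFGHIJ"  # hypothetical Detail record, 12 characters

# Codec names are Python's; mapping them to CCSIDs is an approximation.
encodings = {
    "CCSID 1381 (approx. gbk)": "gbk",
    "CCSID 1208 (utf-8)": "utf-8",
    "CCSID 1200 (utf-16-be)": "utf-16-be",
}
lengths = {label: len(record.encode(codec)) for label, codec in encodings.items()}
for label, n in lengths.items():
    print(f"{label}: {n} bytes")
```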
james.h
Posted: Mon Mar 15, 2010 6:30 am Post subject:
Newbie
Joined: 14 Mar 2010 Posts: 5
Thank you for your quick feedback.
Quote:
If the data is in code page 1381, then the TDS parser should be using code page 1381.
Yes, the TDS parser uses 1381, but the data needs to be converted to 1208.
The message that I want to parse has three types of record; the layout of the message is as follows:
Code:
Message
  Header
    indicator char ( 2 ) value is '00'
    data char ( 8 )
  Detail
    indicator char ( 4 ) value is 'CIS1' or 'CIS2'
    data char ( 10 )
  Trailer
    indicator char ( 2 ) value is '99'
    count int ( 4 )
And the sample input message is as follows:
Code:
00ABCDEFGHCIS1AAaaAAaaAACIS2BBbbBBbbBB990002
Because the number of Detail occurrences is unbounded, I use a Data Pattern to identify the Detail records.
kimbert
Posted: Mon Mar 15, 2010 7:50 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Have you considered using Tagged Fixed Length for this?
james.h
Posted: Mon Mar 15, 2010 5:19 pm Post subject:
Newbie
Joined: 14 Mar 2010 Posts: 5
Quote:
Have you considered using Tagged Fixed Length for this?
Yes, I have, but I cannot use tags, for these reasons:
1) the length of the tag is not fixed
2) there is no delimiter between the tag and the data in the message
kimbert
Posted: Tue Mar 16, 2010 1:23 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote:
I cannot use tags, for these reasons:
1) the length of the tag is not fixed
2) there is no delimiter between the tag and the data in the message
If that were the whole story, then you would certainly need data patterns. But it's not.
Earlier in this thread you described the message format as:
Code:
Message
  Header
    indicator char ( 2 ) value is '00'
    data char ( 8 )
  Detail
    indicator char ( 4 ) value is 'CIS1' or 'CIS2'
    data char ( 10 )
  Trailer
    indicator char ( 2 ) value is '99'
    count int ( 4 )
You need to define three groups in your message definition: one for the Header, with Tag Length set to 2; a second for the repeating Detail record, with Tag Length set to 4; and a third for the Trailer record, with Tag Length set to 2.
It's your choice whether to use complex elements (and set Tag Length on the complex types) or groups (and set Tag Length on the groups). The answer depends on whether you want Header, Detail and Trailer to be under the same parent node in the message tree.
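A rough Python model of the tag-based approach described above, outside the broker: each record type has its own fixed tag length (2, 4 and 2), and once a tag is recognised the data is read using a fixed length in bytes, so no regular expression is involved. This only illustrates the idea; it is not how the TDS parser is implemented. The function name `parse_message`, the dictionary output, treating the Trailer count as four text digits (as in the sample), and using Python's `gbk` codec for CCSID 1381 are all assumptions.

```python
def parse_message(data: bytes, codec: str = "gbk") -> dict:
    """Parse Header, repeating Detail and Trailer records by tag; lengths in bytes."""
    pos = 0

    def take(n: int) -> bytes:
        # Consume exactly n bytes from the bitstream.
        nonlocal pos
        if pos + n > len(data):
            raise ValueError(f"unexpected end of data at byte {pos}")
        chunk = data[pos:pos + n]
        pos += n
        return chunk

    # Header: 2-byte tag '00' followed by 8 bytes of data.
    if take(2) != b"00":
        raise ValueError("missing Header tag '00'")
    result = {"Header": take(8).decode(codec), "Detail": []}

    # Detail: 4-byte tag 'CIS1' or 'CIS2' plus 10 bytes, repeated while the tag matches.
    while data[pos:pos + 4] in (b"CIS1", b"CIS2"):
        tag = take(4).decode(codec)
        result["Detail"].append((tag, take(10).decode(codec)))

    # Trailer: 2-byte tag '99' followed by a 4-byte count.
    if take(2) != b"99":
        raise ValueError(f"expected Detail or Trailer tag at byte {pos - 2}")
    result["Trailer"] = int(take(4).decode(codec))
    if pos != len(data):
        raise ValueError("unexpected bytes after Trailer")
    return result


sample = b"00ABCDEFGHCIS1AAaaAAaaAACIS2BBbbBBbbBB990002"
print(parse_message(sample))

# A hypothetical Detail with two double-byte characters still fits in 10 bytes.
dbcs_sample = ("00ABCDEFGH" + "CIS1AB\u4e2d\u6587EFGH" + "990001").encode("gbk")
print(parse_message(dbcs_sample)["Detail"])
```

Because every length is counted in bytes of the input code page, the double-byte characters no longer shift the record boundaries.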
james.h
Posted: Tue Mar 16, 2010 7:13 am Post subject:
Newbie
Joined: 14 Mar 2010 Posts: 5
Thank you for your solution; I think it makes sense.
|
Back to top |
|
 |
|
|
 |
|
Page 1 of 1 |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|
|
|