MQSeries.net :: View topic - Complicated message parsing

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Complicated message parsing

Complicated message parsing

jsp_ejb

Posted: Sun Jun 27, 2004 8:17 pm Post subject: Complicated message parsing

Novice

Joined: 27 Jun 2004
Posts: 21
Location: Chicago

The message I am dealing with has the following format (WBIMB 5.0.3, Win 2K):
A - complex type, type id 0006, followed by fixed-length data (13),
B - complex type, type id 0007, followed by fixed-length data (20),
C - complex type, type id 0008, followed by fixed-length data (39).

Before each A, B or C type there is a header with the following format:
header id (fixed, 0004), type id, number of repetitions of A, B or C.

So, for type A, the sub-message could look like:
0004000600020006A0006A (0004 is the header id, 0006 is the id for type A, 0002 means A repeats twice, each repeat starts with the type A id 0006, and A stands for any alphanumeric data)

For type B:
0004000700030007B0007B0007B (Type B repeating 3 times)

Example messages (note: A, B and C are actually complex types; the data can be any alphanumeric value, but of fixed length):
Message 1: 0004000600020006A0006A0004000700030007B0007B0007B0004000800010008C (A repeated twice, B three times and C once).

Message 2: 0004000600020006A0006A0004000800010008C (A twice, B - 0 times, C once)
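For reference, the layout described above can be decoded in a few lines of ordinary code. This is a minimal Python sketch, not a TDS message model: it assumes single-byte characters, the fixed body lengths given above (13, 20, 39), and that every repeat begins with its own type id. The names `BODY_LEN`, `HEADER_ID` and `parse_message` are hypothetical.

```python
# Hypothetical reference decoder for the layout described in the post.
# Assumes single-byte characters and the fixed body lengths stated there.
BODY_LEN = {"0006": 13, "0007": 20, "0008": 39}  # type id -> body length
HEADER_ID = "0004"


def parse_message(msg, body_len=BODY_LEN):
    """Return [(type_id, [body, ...]), ...] using the repeat count to drive parsing."""
    groups = []
    pos = 0
    while pos < len(msg):
        # Header: 4-char header id, 4-char type id, 4-digit repeat count.
        if msg[pos:pos + 4] != HEADER_ID:
            raise ValueError(f"expected header id {HEADER_ID} at offset {pos}")
        type_id = msg[pos + 4:pos + 8]
        if type_id not in body_len:
            raise ValueError(f"unknown type id {type_id!r} at offset {pos + 4}")
        count = int(msg[pos + 8:pos + 12])
        pos += 12
        size = body_len[type_id]
        bodies = []
        for _ in range(count):
            # Each repeat starts with its own type id, then the fixed-length body.
            if msg[pos:pos + 4] != type_id:
                raise ValueError(f"expected repeat tag {type_id} at offset {pos}")
            body = msg[pos + 4:pos + 4 + size]
            if len(body) != size:
                raise ValueError(f"truncated {type_id} body at offset {pos + 4}")
            bodies.append(body)
            pos += 4 + size
        groups.append((type_id, bodies))
    return groups
```

Called with `body_len={"0006": 1, "0007": 1, "0008": 1}` to match the one-letter placeholders, Message 1 decodes to `[("0006", ["A", "A"]), ("0007", ["B", "B", "B"]), ("0008", ["C"])]`.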

If A, B and C always appeared in the message, I could easily use CWF with repeating elements. But any type can be missing from the message, and there is no header information about which ones are present. So it seems that TDS is the way to go.

I tried to use a repeating choice of tagged fixed length. For messages 1 and 2, parsing stopped after 0004000600020006A0006A. The first 0006 is the tag that chooses A, B or C. Once choice A is selected, the sub-message (00020006A0006A) is itself modeled as tagged fixed length (0006 is the tag). Parsing succeeds if this is the end of the message. In messages 1 and 2, however, another type follows (0004000...), so the parser gets confused and treats 0004 as an invalid tag for the sub-message.
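To make the failure concrete, here is a toy Python model of a tagged repeat loop started just after the count field. It is hypothetical and says nothing about how the broker's TDS parser is actually implemented. With `strict=True` it behaves as described above and rejects the following 0004; with `strict=False` it stops at the unfamiliar tag and returns the offset so that an enclosing choice could carry on.

```python
def read_tagged_repeats(msg, pos, tag, body_len, strict):
    """Toy model of a tagged fixed-length repeat (hypothetical, not the TDS parser).

    Reads items of the form tag + fixed-length body starting at `pos`
    and returns (bodies, offset after the last item read).
    """
    bodies = []
    while pos < len(msg):
        found = msg[pos:pos + len(tag)]
        if found != tag:
            if strict:
                # The behaviour reported in the post: the next header's
                # 0004 is rejected as an invalid tag for this sub-message.
                raise ValueError(f"invalid tag {found!r} at offset {pos}")
            # The behaviour that was wanted: stop here and let the
            # enclosing repeating choice try to match the next header.
            break
        start = pos + len(tag)
        bodies.append(msg[start:start + body_len])
        pos = start + body_len
    return bodies, pos
```

On Message 1 with one-character placeholder bodies, `read_tagged_repeats(msg1, 12, "0006", 1, strict=False)` returns `(["A", "A"], 22)`, leaving offset 22 at the 0004 of the B header.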

I am really confused. Is this a WBIMB (5.0.3) defect similar to http://www-1.ibm.com/support/docview.wss?rs=849&context=SSKM8N&uid=swg1IY53576&loc=en_US&cs=utf-8&lang=en? Or am I completely off track?

shanson

Posted: Mon Jun 28, 2004 4:49 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

Try using TDS, but with a Data Element Separation of 'Use Data Pattern' for both the headers and the bodies. 'Use Data Pattern' matches the actual data against the pattern you supply. Given that each of your structures uniquely identifies itself, and each header identifies the following body, I think this should work OK.
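The idea behind 'Use Data Pattern' can be illustrated with ordinary regular expressions. This Python sketch is only an analogy, not TDS model syntax: it assumes a header pattern of `0004`, one of the three type ids and a 4-digit count, plus a per-type item pattern of the type id followed by exactly 13, 20 or 39 characters. `HEADER` and `parse_with_patterns` are hypothetical names.

```python
import re

# Analogy only: one regex for the header, one per-type regex for each repeat.
HEADER = re.compile(r"0004(?P<type>0006|0007|0008)(?P<count>\d{4})")
BODY_LEN = {"0006": 13, "0007": 20, "0008": 39}  # type id -> body length


def parse_with_patterns(msg, body_len=BODY_LEN):
    """Return [(type_id, [body, ...], declared_count), ...] by pattern matching."""
    groups = []
    pos = 0
    while pos < len(msg):
        head = HEADER.match(msg, pos)
        if head is None:
            raise ValueError(f"no header pattern matches at offset {pos}")
        type_id = head.group("type")
        # A repeat is the type id followed by exactly body_len[type_id] characters.
        item = re.compile(re.escape(type_id) + "(.{%d})" % body_len[type_id], re.DOTALL)
        pos = head.end()
        bodies = []
        # Keep matching repeats until the data no longer fits the item pattern.
        while (m := item.match(msg, pos)) is not None:
            bodies.append(m.group(1))
            pos = m.end()
        groups.append((type_id, bodies, int(head.group("count"))))
    return groups
```

Here the repeat boundaries come from the patterns alone; the declared count is only carried along, so a wrong count would not stop the parse.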

jsp_ejb

Posted: Tue Jun 29, 2004 8:44 pm Post subject:

Novice

Joined: 27 Jun 2004
Posts: 21
Location: Chicago

Shanson,

Thanks for your reply. I had not tried 'Use Data Pattern' for TDS before, since the number of repeats is given in the header and all types (A, B, C) are fixed length. 'Use Data Pattern' figures this out dynamically; are there any performance implications?

Nevertheless, I first tried 'Use Data Pattern' as the data element separation for the individual A, B and C types, and still kept a repeating choice of tagged fixed length (with the buffer id as the tag that selects A, B or C) to separate the different types. And you know what, it worked. I can parse messages 1 and 2 without any problem.

Has anybody used 'Use Data Pattern'? Can you share your experience with its performance?

Thanks,

shanson

Posted: Wed Jun 30, 2004 12:53 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

All I can say is that Use Data Pattern is usually the slowest of the TDS Data Element Separations. It is difficult to provide exact figures, as all sorts of factors influence the parsing of TDS messages. The rule to go by is that the more the TDS parser has to scan the message bit stream for characters in order to match the data to the model, the slower it will be. So we can say with confidence that Fixed Length is the quickest (no scanning), Use Data Pattern is usually the slowest (lots of scanning), and the use of delimiters and/or tag data separators slows the parser too, but I can't say anything more concrete.

jefflowrey

Posted: Wed Jun 30, 2004 4:40 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

On a different standard of consideration, however, Use Data Pattern may be more "intuitive" for someone who has worked with regular expressions before.

TDS tagging and fixed-length processing is just complicated to get right the first time you do it.
_________________
I am *not* the model of the modern major general.

jsp_ejb

Posted: Wed Jun 30, 2004 9:09 pm Post subject:

Novice

Joined: 27 Jun 2004
Posts: 21
Location: Chicago

As performance is the key factor, do you think 'TDS tagging and fixed length' can handle the cases listed above?

shanson

Posted: Thu Jul 01, 2004 1:12 am Post subject:

Partisan

Joined: 17 Oct 2003
Posts: 344
Location: IBM Hursley

jsp_ejb wrote:

Nevertheless, I first tried 'Use Data Pattern' as the data element separation for the individual A, B and C types, and still kept a repeating choice of tagged fixed length (with the buffer id as the tag that selects A, B or C) to separate the different types. And you know what, it worked. I can parse messages 1 and 2 without any problem.

I need to understand more clearly what you did. Can you show me the model you have created, in simple terms, so that I can see the complex types you are using and their Data Element Separation? Thanks.

Last edited by shanson on Thu Jul 01, 2004 3:29 am; edited 1 time in total

wooda

Posted: Thu Jul 01, 2004 3:13 am Post subject:

Master

Joined: 21 Nov 2003
Posts: 265
Location: UK

jsp_ejb, your original example is confusing.

You state that A, B and C contain an id followed by fixed-length data of 13, 20 and 39 bytes respectively.

However, your example data appears to have only 1 byte of data for each.

Please clarify.

wooda

Posted: Thu Jul 01, 2004 4:38 am Post subject:

Master

Joined: 21 Nov 2003
Posts: 265
Location: UK

This can be modelled without Data Pattern.
There are at least two (and probably more) ways of doing it.

I'd suggest a few things.

1. Combine the 0004 and the type id to make one tag. That gives you a sequence of 3 optional top-level elements, 00040006, 00040007 and 00040008, inside a Tagged Fixed Length type with length of tag = 8.

2. Inside each element, use Tagged Fixed Length again, this time with length of tag = 4. Model the 4-byte count field as an embedded fixed-length group with a single child (because it's not tagged). Model the repeating part as an element with a tag (e.g. 0006) and a defined length (e.g. 13 characters for 0006). Set the max occurs of the repeating field to 9999 to indicate the maximum number of repeats you could have (the largest value a 4-digit count can hold).
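The suggested layout can be mirrored in a short Python sketch of the parsing order. It is an illustration, not a broker message model: it assumes the three outer elements always appear in the order A, B, C, treats the count as an untagged 4-character field, and does not use the count to decide how many repeats to read. `OUTER`, `MAX_REPEATS` and `parse_sequence` are hypothetical names.

```python
# Hypothetical mirror of the suggestion: outer tag = header id + type id (8 chars).
OUTER = [("00040006", "0006", 13), ("00040007", "0007", 20), ("00040008", "0008", 39)]
MAX_REPEATS = 9999  # largest value a 4-digit count can hold


def parse_sequence(msg, outer=OUTER):
    """Return {outer_tag: (count_field, [body, ...])} for the elements present."""
    result = {}
    pos = 0
    for outer_tag, inner_tag, length in outer:  # fixed order, each element optional
        if msg[pos:pos + 8] != outer_tag:
            continue  # this optional element is absent
        pos += 8
        count_field = msg[pos:pos + 4]  # untagged fixed-length field, not used to parse
        pos += 4
        bodies = []
        # Repeat while the 4-char inner tag matches, up to the max occurs.
        while msg[pos:pos + 4] == inner_tag and len(bodies) < MAX_REPEATS:
            body = msg[pos + 4:pos + 4 + length]
            if len(body) != length:
                raise ValueError(f"truncated {inner_tag} body at offset {pos + 4}")
            bodies.append(body)
            pos += 4 + length
        result[outer_tag] = (count_field, bodies)
    if pos != len(msg):
        raise ValueError(f"unparsed data at offset {pos}")
    return result
```

With one-character placeholder lengths, Message 2 yields entries only for 00040006 and 00040008; the missing B group is simply skipped.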

The only issue is that you cannot validate that the repeat count matches the actual number of repeats. TDS doesn't need the count to parse, so if it is wrong no error will be thrown. You could check this in ESQL.

This works.
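The count check mentioned above amounts to comparing each declared count with the number of repeats actually parsed. The sketch below is Python rather than ESQL, and its input shape (a dict mapping each 8-character outer tag to a pair of the raw count string and the list of parsed bodies) is an assumption made for illustration; `check_counts` is a hypothetical name.

```python
def check_counts(groups):
    """Hypothetical post-parse check (Python stand-in for the ESQL check).

    `groups` maps an outer tag to (count_field, bodies). Returns a list of
    (tag, declared, actual) for every group whose count does not match.
    """
    mismatches = []
    for tag, (count_field, bodies) in groups.items():
        declared = int(count_field)
        if declared != len(bodies):
            mismatches.append((tag, declared, len(bodies)))
    return mismatches
```

An empty list means every declared count agreed with the parsed data.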

However as I say there are other ways to do it. The choice is yours.

Regards,

Alex

jsp_ejb

Posted: Thu Jul 01, 2004 9:00 pm Post subject:

Novice

Joined: 27 Jun 2004
Posts: 21
Location: Chicago

<wooda>

Quote:

you state that A,B and C contain an Id followed by a fixed length of data of 13,20 and 39 bytes respectively.
However your example data appears to have only 1 byte of data for each.

</wooda>
A, B and C are indeed fixed length (13, 20 and 39 respectively). In the example data I use a single A to stand for the 13-character data so that the message patterns are easy to identify.
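Because the examples use one-letter placeholders, a small generator for full-length test messages can help when trying out a model. This is a hypothetical Python helper (`build_message` is not from the thread), assuming the 0004 header id, a zero-padded 4-digit count, and the fixed lengths 13, 20 and 39.

```python
# Hypothetical test-data helper; lengths are the ones stated in the thread.
BODY_LEN = {"0006": 13, "0007": 20, "0008": 39}  # type id -> body length


def build_message(groups):
    """Build a message from [(type_id, [body, ...]), ...] in the order given."""
    parts = []
    for type_id, bodies in groups:
        if len(bodies) > 9999:
            raise ValueError("repeat count does not fit in 4 digits")
        # Header: 0004 + type id + zero-padded 4-digit repeat count.
        parts.append("0004" + type_id + f"{len(bodies):04d}")
        for body in bodies:
            if len(body) != BODY_LEN[type_id]:
                raise ValueError(f"{type_id} body must be {BODY_LEN[type_id]} characters")
            parts.append(type_id + body)
    return "".join(parts)
```

For example, `build_message([("0006", ["A" * 13, "A" * 13]), ("0008", ["C" * 39])])` produces a 101-character, full-length counterpart of Message 2.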

<shanson>

Quote:

I need to understand more clearly what you did.

</shanson>

Message 1: 0004000600020006A0006A0004000700030007B0007B0007B0004000800010008C

<response>
(Data element separation -- tagged fixed length)
(Composition -- Choice)
(Max Occ -- 3, min occ -- 1)
(Data element separation -- Fixed length)
(length = 4)
<data>
(Data element separation -- use data pattern)
(regexp: 0006)
(regexp: .{13})
</A>
<A>
<header>0006</header>
<body>A</body>
</A>
</data>
</Type A>
...

I will try wooda's excellent suggestion next. Point 2 is actually where I started when modelling the repeats, but it failed for me the first time.

Thanks
