MQSeries.net :: View topic - Solved: TDS Data Pattern Problem


Solved: TDS Data Pattern Problem


Decky

Posted: Wed Nov 23, 2005 7:52 am Post subject: Solved: TDS Data Pattern Problem

Novice

Joined: 16 May 2005
Posts: 16
Location: London UK

Hi,

I'm trying to create a message set in WBIMB to replace a NEON format. I think I'm on the right track but can't quite get it. The data comes in from a file that has been split into separate messages for each line with return characters removed. Basically there are two types of record:

HEADER - can consist of data such as 'START-OF-FILE', 'PROGRAMNAME=getdata', 'DATEFORMAT=yyyymmdd', '# Security Description', 'ANY_FIELD_NAME', 'FIELDNAME' - to sum up it's a non-fixed length, non-delimited field

DATA RECORD (2 types) - Pipe-delimited data record, i.e. Field1|Field2|...lastField|

I am using my main compound type with Composition set to 'Choice' and in the TDS layer section Data Element Separation is set to 'Use Data Pattern'. Underneath this I have two elements - a simple string element for the header and a data element with a repeating child delimited by '|'.

I have made this work by using several different elements for each type of header, each with a different data pattern, i.e. 'START_OF.*', '.*=.*', '#.*'. But when I try to generalise these into one element and regex it all starts going wrong. After one change the data records will parse, after another the header records will, or else the data records appear in the header element. I can't get the two to live in harmony. The main difference between the header and data records is that data records all contain the word 'Equity' and have pipes '|' as delimiters, including a closing pipe at the end of the record.

At the moment I have ([A-Za-z_ -#=]+[^\|]$) as my data pattern for the header and (.*Equity.*) for the records - I have also tried '.*\|$' and various other combinations. I'm guessing that as the parser has to choose between the elements it tries to match them in the order they appear? And then if they don't match will just try and parse with the last choice regardless? Correct me if I'm wrong, I'm not 100% sure how it works. Hopefully one of you can spot something as regexs aren't my strength

Cheers

Last edited by Decky on Wed Nov 23, 2005 9:04 am; edited 1 time in total

jefflowrey

Posted: Wed Nov 23, 2005 7:54 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Is your header only going to contain one piece of data ("START-OF-FILE","PROGRAMNAME=getdata", etc)? Or more than one?

Well, thinking about it, regardless you should model it as a group. Then either have it as a choice - if it can only contain one, or an unordered set if it can contain many. Then have fields for each type of header, each with their own data pattern.
_________________
I am *not* the model of the modern major general.

Decky

Posted: Wed Nov 23, 2005 8:03 am Post subject:

Novice

Joined: 16 May 2005
Posts: 16
Location: London UK

Thanks for your reply, I have made it work in a similar way to your suggestion but the client would prefer only 2 elements unfortunately.

A stripped example of the data would be

START-OF-FILE
PROGRAMNAME=getdata
DATEFORMAT=yyyymmdd

START-OF-FIELDS
# Security Description
TICKER
EXCH_CODE
NAME
COUNTRY
CRNCY
SECURITY_TYP
PAR_AMT
EQY_PRIM_EXCH
EQY_PRIM_EXCH_SHRT

# Industry Classification
EQY_SIC_CODE
EQY_SIC_NAME
INDUSTRY_GROUP
INDUSTRY_SUBGROUP
INDUSTRY_SECTOR

END-OF-FIELDS

TIMESTARTED=Tue Mar 1 19:17:10 EST 2005
START-OF-DATA
XXXXX YY Equity|0|field|field|......field|
END-OF-DATA
DATARECORDS=1
TIMEFINISHED=Tue Mar 1 19:51:18 EST 2005
END-OF-FILE

Note: each line is coming in as a separate message, unfortunately that is the architecture the client uses

Decky

Posted: Wed Nov 23, 2005 8:05 am Post subject:

Novice

Joined: 16 May 2005
Posts: 16
Location: London UK

I think my main problem is finding distinguishing data patterns that don't overlap

wooda

Posted: Wed Nov 23, 2005 8:25 am Post subject:

Master

Joined: 21 Nov 2003
Posts: 265
Location: UK

Hi Decky,

Quote:

At the moment I have ([A-Za-z_ -#=]+[^\|]$) as my data pattern for the header and (.*Equity.*) for the records - I have also tried '.*\|$' and various other combinations. I'm guessing that as the parser has to choose between the elements it tries to match them in the order they appear? And then if they don't match will just try and parse with the last choice regardless? Correct me if I'm wrong, I'm not 100% sure how it works. Hopefully one of you can spot something as regexs aren't my strength

In a choice the parser will attempt to match each choice option in turn until it finds a match. So if your HEADER occurs before your DATA RECORD in your choice definition and the header pattern matches the DATA RECORD, then it will be parsed as a HEADER.

As you appear to want to match anything as a header unless it specifically matches the DATA RECORD pattern, put the DATA RECORD first in your choice with a suitably unique pattern.
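
To illustrate the ordering point, here is a rough Python sketch of "first matching option wins". It is only an analogy, not how the broker is implemented: `resolve_choice` and the option names are made up for this example, and `re.fullmatch` is used as a stand-in for a data pattern that has to match the whole message.

```python
import re

def resolve_choice(options, line):
    """Return the name of the first option whose pattern matches the whole line."""
    for name, pattern in options:
        if re.fullmatch(pattern, line):
            return name
    return None  # no option matched

record = "XXXXX YY Equity|0|field|field|"

# Same two patterns, different order in the choice (hypothetical option names).
header_first = [("HEADER", r".*"), ("DATA_RECORD", r".*Equity.*")]
data_first = [("DATA_RECORD", r".*Equity.*"), ("HEADER", r".*")]

print(resolve_choice(header_first, record))     # catch-all header is tried first
print(resolve_choice(data_first, record))       # specific pattern is tried first
print(resolve_choice(data_first, "START-OF-FILE"))
```

With the catch-all first, the data record is claimed by HEADER; with the specific pattern first, it is claimed by DATA_RECORD and everything else falls through to HEADER.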

As an aside, it would appear that you are not fully utilizing the metadata in your message. Were you to build a more specific model using all the metadata you have available, you may be able to avoid using data patterns altogether. For example, START-OF-DATA<CR><LF> could be used as a Tag or Group Indicator for the start of your DATA RECORD group. Use of choice and data patterns implies no order in how the records occur.

Decky

Posted: Wed Nov 23, 2005 8:30 am Post subject:

Novice

Joined: 16 May 2005
Posts: 16
Location: London UK

Thanks for the reply wooda, I'll try swapping the elements around. As for using the metadata/tags, this isn't possible as each message is a single record and there is no <CR><LF> delimiter, i.e. as far as the flow is concerned a data record does not exist when it has found a header.

kimbert

Posted: Wed Nov 23, 2005 8:32 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

wooda:
Note that an individual message contains only one line from the example 'message'. Bizarre, but I think that's what Decky said. So his problem is distinguishing between headers and records.
Decky:
As wooda said in his last post, a data pattern which matches a data record but not a header record should do the trick. Make sure it matches *all* of the data record - otherwise you'll end up with bitstream left over, and that will produce a parsing exception.

wooda

Posted: Wed Nov 23, 2005 8:36 am Post subject:

Master

Joined: 21 Nov 2003
Posts: 265
Location: UK

Also your patterns

Quote:

([A-Za-z_ -#=]+[^\|]$)

and

Quote:

.*\|$

appear to be attempting to use $ to anchor the pattern to the end of the sequence.
Message set data patterns follow the XML Schema definition for regular expressions, in which the use of '^' and '$' to anchor an expression to the start/end of the string is not supported.
All patterns are implicitly anchored to the start/end of the bitstream they are parsing against.
So ^ and $ in this context are treated as literal characters.
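
A small Python sketch of what that means in practice. Python's `re` is not an XML Schema regex engine, so this is only an emulation under two assumptions: `re.fullmatch` stands in for the implicit anchoring, and the literal '$' is written as `\$` because Python would otherwise treat it as an anchor. The helper name `matches_whole` is made up for the example.

```python
import re

def matches_whole(pattern, text):
    """True if the pattern matches the entire text (implicit anchoring)."""
    return re.fullmatch(pattern, text) is not None

record = "XXXXX YY Equity|0|field|"

# No '$' needed: the pattern must already cover the whole string.
implicit = r".*\|"

# Emulates '.*\|$' as XML Schema regex reads it: '$' is a literal character.
literal_dollar = r".*\|\$"

print(matches_whole(implicit, record))              # record ends with '|'
print(matches_whole(literal_dollar, record))        # record has no trailing '$'
print(matches_whole(literal_dollar, record + "$"))  # only matches with a real '$'
```

So a pattern ending in '$' would only match data that actually ends with a dollar sign, which is why the original patterns never matched the records.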

Decky

Posted: Wed Nov 23, 2005 9:02 am Post subject:

Novice

Joined: 16 May 2005
Posts: 16
Location: London UK

Thanks for all your help guys, I've got it nailed. Basically I swapped around the order and put the data record first with a pattern of (.*\|.*\|)
and then it was as simple as using .* for the header pattern.

Thanks Again,

Dec
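
For anyone finding this later, the final arrangement can be checked against the sample lines from earlier in the thread with a quick Python sketch. Assumptions: `re.fullmatch` approximates the implicitly anchored data patterns, and `classify` and the constant names are invented for illustration; the two patterns themselves are the ones from the post above.

```python
import re

DATA_RECORD = r".*\|.*\|"  # first in the choice: at least two pipes, ends with a pipe
HEADER = r".*"             # second in the choice: catches everything else

def classify(line):
    """Try the data record pattern first, then fall back to the header pattern."""
    if re.fullmatch(DATA_RECORD, line):
        return "DATA_RECORD"
    if re.fullmatch(HEADER, line):
        return "HEADER"
    return None

samples = [
    "START-OF-FILE",
    "PROGRAMNAME=getdata",
    "# Security Description",
    "TICKER",
    "TIMESTARTED=Tue Mar 1 19:17:10 EST 2005",
    "XXXXX YY Equity|0|field|field|",
    "END-OF-FILE",
]

for line in samples:
    print(f"{classify(line):12} {line}")
```

Note that the data record pattern relies on the closing pipe and on there being at least two pipes; a line with a single pipe, or one that does not end in a pipe, would fall through to the header.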
