Author |
Message
|
er_pankajgupta84 |
Posted: Tue Nov 17, 2009 2:12 pm Post subject: Data pattern problem in message set |
|
|
 Master
Joined: 14 Nov 2008 Posts: 203 Location: charlotte,NC, USA
|
What would be the equivalent of the following pattern for a Message Set data pattern?
0001.*?<>
If the input string is 0001jjkfjw<>0001nhwfeff<>0001kfnkwenf<> then after parsing the records should be
0001jjkfjw<>
0001nhwfeff<>
0001kfnkwenf<>
I tried the above regular expression as the data pattern in the message set, but it is rejected as an invalid pattern. If I try 0001.*<>
instead, parsing returns just one record, i.e. the entire string.
The parser matches "*" greedily, so it takes the entire string as one record.
Any pointers are appreciated. |
|
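A quick way to see the greedy behaviour described above is to run both patterns against the sample string. The sketch below uses Python's `re` module, which is an assumption made for illustration only: it accepts the lazy `*?` quantifier, whereas the message set data pattern reportedly rejects it.

```python
import re

# Sample input from the post: three records, each terminated by "<>".
data = "0001jjkfjw<>0001nhwfeff<>0001kfnkwenf<>"

# Greedy: ".*" runs to the LAST "<>", so the whole string is one match.
greedy = re.findall(r"0001.*<>", data)

# Lazy: ".*?" stops at the FIRST "<>", giving one match per record.
lazy = re.findall(r"0001.*?<>", data)

print(greedy)
print(lazy)
```

This only shows what the lazy form would do in an engine that supports it; it does not make `*?` valid in a data pattern.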
kimbert |
Posted: Tue Nov 17, 2009 2:26 pm Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Two important points here:
1. It's not just message broker that interprets regular expressions in this way. Any reg ex parser will do the same. Message broker's is actually based on the Xerces reg exp. engine.
2. I think you might be asking the wrong question anyway. You are asking us to make your solution work. You might get a better result if you describe the message format and how you want it to be parsed. It is quite possible that 'Use Data Pattern' is a really roundabout way to solve your problem. |
|
er_pankajgupta84 |
Posted: Tue Nov 17, 2009 3:25 pm Post subject: |
|
Thanks for your reply.
My actual problem is quite complex and I doubt I can explain it fully in words.
I agree that how a regular expression is parsed depends on the regex engine, but it is not working as expected in a message set.
I need a regular expression that accepts any character up to a given delimiter, in this case "<>". I was able to create one that works in Java, but it does not work in the message set.
I will try to explain my problem briefly:
I have a message with 2 complex-type records, each having 2 fields. These records can occur any number of times.
1. The first field is fixed length, say 5 characters, and
2. The second field is of variable length.
So I have to use data patterns to decide the record type, and "variable length delimited" data element separation at the record level to get the fields.
input is: 00001nlkvkjehnvjhnejvnnnsdk<>00001vbdhcv<>00002bccjkdcj<>00002jkjd<>
It has 2 occurrences of each record type (00001 and 00002).
I have set data element separation to "Use data pattern" at message level and "variable length delimited" at record level.
I cannot use tagged delimited because the tag (00001 or 00002) is not fixed; it could just as well be 11111 and 11112. It always ends with 1 or 2 but can have any character in the first 4 places.
So the regular expression I mentioned earlier will actually look like:
[0-9]{4}1.*?<>
Unfortunately it works in Java but not in the message set. |
|
er_pankajgupta84 |
Posted: Tue Nov 17, 2009 4:13 pm Post subject: |
|
I think the problem is the "?" that we add to make an expression non-greedy. The message set does not recognize this character and raises an "Invalid pattern" exception. |
|
kimbert |
Posted: Tue Nov 17, 2009 6:02 pm Post subject: |
|
Thanks - that's clear enough now.
The . (period) is too greedy because it matches *any* character, including the ones which are supposed to terminate the match.
You need to say
'not <, or < not followed by >'*
The reg ex that you need is
Code: |
[0-9]{4}1([^<]|(<[^>]))*<> |
or, if you are certain that '<' cannot occur in the data, you could shorten it to
|
|
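The suggested pattern can be checked outside the broker before it goes into the message set. A minimal sketch in Python follows. Python's `re` is not the broker's engine, and the `split_records` helper, the type-2 pattern (the same expression with `2` in place of `1`) and the "try each pattern at the current offset" loop are all assumptions made for illustration, not a description of how the parser is implemented.

```python
import re

# Record body: "not <" or "< not followed by >", repeated, then the "<>" delimiter.
BODY = r"([^<]|(<[^>]))*<>"

# One pattern per record type (the type-2 pattern is assumed by analogy).
PATTERNS = {
    "type1": re.compile(r"[0-9]{4}1" + BODY),
    "type2": re.compile(r"[0-9]{4}2" + BODY),
}

def split_records(data):
    """Hypothetical helper: consume the input one record at a time."""
    records = []
    pos = 0
    while pos < len(data):
        for name, pattern in PATTERNS.items():
            m = pattern.match(data, pos)  # anchored at the current offset
            if m:
                records.append((name, m.group(0)))
                pos = m.end()
                break
        else:
            raise ValueError(f"no record pattern matches at offset {pos}")
    return records

sample = "00001nlkvkjehnvjhnejvnnnsdk<>00001vbdhcv<>00002bccjkdcj<>00002jkjd<>"
records = split_records(sample)
for name, text in records:
    print(name, text)
```

No lazy quantifier is needed here because the repeated group can never consume the `<>` that ends the record.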
er_pankajgupta84 |
Posted: Tue Nov 17, 2009 6:33 pm Post subject: |
|
Thanks.
You got the problem right. The data may contain parts of the delimiter, so I cannot use the second pattern you suggested.
[0-9]{4}1([^<]|(<[^>]))*<> might work well, but what if the delimiter is <||>?
I specified only part of my problem earlier.
The actual delimiter is <||>, and strings like <|, ||, |>, |||, <||, ||> may come in the data.
I have tried the following expression:
[0-9]{4}1.[^(<\|\|>)]*<\|\|>
but it didn't work.
I would appreciate any kind of pointers. |
|
kimbert |
Posted: Wed Nov 18, 2009 12:52 am Post subject: |
|
Welcome to the wonderful world of regular expressions! You need an even more complex regex which looks like this:
Code: |
[0-9]{4}1[^<]|(<[^|])|(<|[^|])|(<||[^>])*<||> |
The important bit is this
Code: |
[^<]|(<[^|])|(<|[^|])|(<||[^>]) |
which matches:
- not < OR
- < not followed by | OR
- <| not followed by | OR
- <|| not followed by > |
|
er_pankajgupta84 |
Posted: Wed Nov 18, 2009 6:18 am Post subject: |
|
I tried this, but it does not work for all inputs.
For example, if the input contains partial delimiters in random order, it does not identify the records correctly.
input:
00001hcjdhfnsd<|| > <<< \> || ||>nhjkh <||>00002hcjdhfnsd<|| > <<< \> || ||>nhjkh <||>
its failing for this kind of input. |
|
kimbert |
Posted: Wed Nov 18, 2009 7:40 am Post subject: |
|
Quote: |
its failing for this kind of input. |
Not surprising really - I haven't tested this. I was assuming that you would get the idea and work out a solution.
I suspect that you need to escape the literal | ( pipe ) characters in the regex:
Code: |
[0-9]{4}1[^<]|(<[^\|])|(<\|[^\|])|(<\|\|[^>])*<\|\|> |
If that doesn't work, you'll have to do some testing / research to work out what's going wrong. |
|
er_pankajgupta84 |
Posted: Wed Nov 18, 2009 7:47 am Post subject: |
|
I have already taken care of escaping.
While testing in Java I used \\ to escape |, and in the message set I used \ to escape |.
<0x3E>||<<0x3F> is the delimiter I have used in my message set. |
|
er_pankajgupta84 |
Posted: Wed Nov 18, 2009 8:34 am Post subject: |
|
We need to find something like this:
read until you encounter a <||>, i.e. [^(<||>)], but unfortunately this does not behave as a group. Characters inside () normally behave as a group, but inside a negated character class [^...] the parentheses and everything between them are treated as individual characters.
I am not able to form any regular expression that reads up to <||> for a message set. |
|
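The behaviour described above follows from how character classes work: inside `[...]` the parentheses are ordinary characters, so `[^(<\|\|>)]` means "any single character other than `(`, `<`, `|`, `>` or `)`", not "anything other than the string `<||>`". A small sketch in Python's `re` (used here only as a stand-in for the message set engine) shows the effect with the expression tried earlier in the thread. The two sample records are made up for illustration.

```python
import re

# Expression tried earlier in the thread: the class excludes single characters.
attempt = re.compile(r"[0-9]{4}1.[^(<\|\|>)]*<\|\|>")

plain = "00001abcd<||>"        # no delimiter characters inside the data
with_pipe = "00001ab|cd<||>"   # a lone "|" inside the data

plain_match = attempt.fullmatch(plain)
pipe_match = attempt.fullmatch(with_pipe)

print(plain_match is not None)   # the class never meets an excluded character
print(pipe_match is not None)    # the lone "|" is excluded, so the match fails
```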
er_pankajgupta84 |
Posted: Wed Nov 18, 2009 2:04 pm Post subject: Solved |
|
Finally, after some R&D, it is solved.
This expression works for me:
[0-9]{4}1([^<]|(<[^\|])|(<\|[^\|])|(<\|\|[^>]))*<\|\|>
It just needed an outer () around all the negated alternatives, so that the * applies to the whole group.
Thanks kimbert for all of your pointers. I really appreciate your help. |
|
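The effect of the outer parentheses can be checked against the awkward input posted earlier. The sketch below again uses Python's `re` as a stand-in for the message set engine (an assumption; the engines differ in other respects). Without the outer group, `|` has the lowest precedence, so the expression falls apart into four top-level alternatives and `*` binds only to the last one.

```python
import re

# First record of the problem input (raw string because it contains a backslash).
record = r"00001hcjdhfnsd<|| > <<< \> || ||>nhjkh <||>"

# Escaped pipes but no outer group: four independent alternatives.
ungrouped = re.compile(r"[0-9]{4}1[^<]|(<[^\|])|(<\|[^\|])|(<\|\|[^>])*<\|\|>")

# Outer group added: the "*" now repeats the whole set of alternatives.
grouped = re.compile(r"[0-9]{4}1([^<]|(<[^\|])|(<\|[^\|])|(<\|\|[^>]))*<\|\|>")

ungrouped_hit = ungrouped.match(record).group(0)
grouped_hit = grouped.match(record).group(0)

print(repr(ungrouped_hit))  # only the first alternative matched
print(grouped_hit == record)
```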
er_pankajgupta84 |
Posted: Wed Mar 03, 2010 3:44 pm Post subject: |
|
This regular expression is blowing up the execution group sometimes.
Quote: |
[0-9]{4}1([^<]|(<[^\|])|(<\|[^\|])|(<\|\|[^>]))*<\|\|> |
This is the regular expression I used to parse records in the message set.
Now,
records like 12341nf;jkhe;jkrgf;rejkg;jekrgj;kerhfvhfbvlhbv<||>
get through, but when I increase the size of the record it abends the execution group. For example 12341nf;jkhe;jkrgf;rejkg;jekrgj;kerhfvhfbvlhbvhrjherbghbhljer.....2000 more bytes here<||>
No reason is given in the user trace or the service trace.
I replicated the scenario in a Java node as well. The pattern blows up the JVM in Java itself if I try to parse a longer input string.
Can anyone validate this regular expression (data pattern)?
Basically I am trying to read as many characters as possible until I encounter <||> |
|
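For comparison, the stated goal ("read until <||>") does not need backtracking at all. The sketch below is plain Python rather than a message set feature, and `split_on_delimiter` is a hypothetical helper. It only illustrates that a linear scan for the delimiter copes with long records, for instance when checking test data outside the broker.

```python
DELIM = "<||>"

def split_on_delimiter(data, delim=DELIM):
    """Hypothetical helper: cut the input after each occurrence of the delimiter."""
    records = []
    pos = 0
    while pos < len(data):
        end = data.find(delim, pos)  # linear search, no regex involved
        if end == -1:
            raise ValueError(f"unterminated record at offset {pos}")
        end += len(delim)
        records.append(data[pos:end])
        pos = end
    return records

short = "12341nf;jkhe;jkrgf<||>"
long_record = "12341" + "x" * 5000 + "<||>"  # made-up long record
records = split_on_delimiter(short + long_record)
print([len(r) for r in records])
```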
kimbert |
Posted: Thu Mar 04, 2010 12:22 am Post subject: |
|
You have hit a known problem with regular expressions and long messages. It's possible that there is a fix for this available, so you *could* open a PMR and see what IBM says.
However...that does not solve today's problem. I'm fairly sure that you can model this format without using data patterns. The first 4 characters can be modelled as a fixed-length field. The rest of the line up to the <> can be modelled as a tagged delimited group.
If you want to try that, let me know and I'll post the details. |
|
er_pankajgupta84 |
Posted: Thu Mar 04, 2010 6:49 am Post subject: |
|
Yes, if some other way is possible then I will definitely try it. Here are some more details that might help you guide me.
Message looks like this:
00083773(||)00000(||)jjkjdks(||)8737384(||)jfufuifuief[||]
00083774773(||)00100(||)jjjfjkfkjdks(||)873847384(||)ncnjfufuifuief[||]
00084574773(||)00200(||)jjjfjkfkjdks(||)873847384(||)ncnjfufuifuief[||]
...so on..
Here [||] is the delimiter between records and (||) is the delimiter between fields. Every field except the 2nd field of each record is of variable length. The 2nd field identifies the record type, for example "00000", "00100", "00200" etc.
Please note that there is no carriage return within the message. I have added line breaks for readability.
Once we retrieve a record, we can retrieve its fields by using "All elements delimited" as data element separation at record level. The problem is in retrieving the records.
So I used "Use data pattern" as the DES (data element separation) at message level.
I used the following data patterns:
First record:
[ 0-9]+\(\|\|\)00000([^\[]|(\[[^\|])|(\[\|[^\|])|(\[\|\|[^\]]))*\[\|\|\]
Second record:
[ 0-9]+\(\|\|\)00100([^\[]|(\[[^\|])|(\[\|[^\|])|(\[\|\|[^\]]))*\[\|\|\]
and so on..
"\" is the escape character used for escaping [, ( and | in the regular expression.
Let me know if you think that this message can be modeled without using data patterns. |
|
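The layout described above can be prototyped outside the broker to confirm the intended result before modelling it. A sketch in Python, under these assumptions: the record delimiter `[||]` and the field delimiter `(||)` never occur inside field data, and the `parse_message` helper and the dictionary keys are names invented for illustration. It says nothing about how the message set itself should be configured.

```python
RECORD_DELIM = "[||]"
FIELD_DELIM = "(||)"

def parse_message(message):
    """Hypothetical helper: split into records, then fields, keyed by record type."""
    records = []
    # Every record ends with RECORD_DELIM, so the final split piece is empty.
    for raw in message.split(RECORD_DELIM):
        if not raw:
            continue
        fields = raw.split(FIELD_DELIM)
        # The 2nd field identifies the record type.
        records.append({"type": fields[1], "fields": fields})
    return records

# Sample from the post, joined without line breaks as in the real message.
message = (
    "00083773(||)00000(||)jjkjdks(||)8737384(||)jfufuifuief[||]"
    "00083774773(||)00100(||)jjjfjkfkjdks(||)873847384(||)ncnjfufuifuief[||]"
    "00084574773(||)00200(||)jjjfjkfkjdks(||)873847384(||)ncnjfufuifuief[||]"
)
parsed = parse_message(message)
for rec in parsed:
    print(rec["type"], len(rec["fields"]))
```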