balaji83it (Acolyte; joined 20 Jul 2007; posts: 72)
Posted: Thu Jun 13, 2013 5:57 am  Post subject: Message Parsing using CWF and TDS
Hello,
I have a message that needs to be parsed. The format is as specified below:
Block_1
======
Field1
Field2
...
Block_2
=========
Field1
Field2
Field3
..
Field n
Both blocks are binary data described by a COBOL copybook.
The message that needs to be parsed is a group of these blocks, in a sequence like this:
1
2
2
2
1
2
2
1
2
2
Each number represents a block (1 = Block_1, 2 = Block_2).
That is, a single Block_1 is followed by multiple Block_2s.
As a separator, Field3 of the last Block_2 in each group has the value 99.
We have a CWF format, created from the copybook, that parses the individual blocks. What we need is to parse the repeating sequence of blocks shown above.
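The grouping rule can be sketched in plain Python, outside the broker, to make the structure concrete. This is a minimal illustration, not a CWF or TDS model, and it rests on assumed values: Block_1 is 10 bytes, Block_2 is 20 bytes, and Field3 occupies the last 2 bytes of Block_2 and holds the display digits "99". These lengths and offsets are hypothetical (they match the example lengths kimbert uses later in the thread); the real ones come from the copybook.

```python
from __future__ import annotations

# Hypothetical layout, NOT taken from the real copybook:
BLOCK1_LEN = 10                 # Block_1 assumed to be 10 bytes
BLOCK2_LEN = 20                 # Block_2 assumed to be 20 bytes
FIELD3 = slice(18, 20)          # Field3 assumed to be the last 2 bytes of Block_2
TERMINATOR = b"99"              # Field3 == 99 (display digits) ends a group


def split_groups(data: bytes) -> list[tuple[bytes, list[bytes]]]:
    """Split data into (block_1, [block_2, ...]) groups.

    Each group is one Block_1 followed by Block_2 records, up to and
    including the first Block_2 whose Field3 equals the terminator.
    """
    groups = []
    pos = 0
    while pos < len(data):
        block1 = data[pos:pos + BLOCK1_LEN]
        if len(block1) != BLOCK1_LEN:
            raise ValueError(f"truncated Block_1 at offset {pos}")
        pos += BLOCK1_LEN

        block2s = []
        while True:
            block2 = data[pos:pos + BLOCK2_LEN]
            if len(block2) != BLOCK2_LEN:
                raise ValueError(f"truncated Block_2 at offset {pos}")
            pos += BLOCK2_LEN
            block2s.append(block2)
            if block2[FIELD3] == TERMINATOR:
                break           # last Block_2 of this group
        groups.append((block1, block2s))
    return groups


# The 1 2 2 2 1 2 2 1 2 2 example from the question, with dummy contents.
b1 = b"H" * 10
mid = b"D" * 18 + b"00"         # Field3 != 99: more Block_2s follow
last = b"D" * 18 + b"99"        # Field3 == 99: end of the group
msg = b1 + mid + mid + last + b1 + mid + last + b1 + mid + last
print([len(b2s) for _, b2s in split_groups(msg)])  # [3, 2, 2]
```

The point of the sketch is that the number of Block_2s is never known in advance; it is only discovered by looking inside each Block_2 at Field3.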
Any help is greatly appreciated.
Thanks,
Konijeti Balaji.
mqjeff (Grand Master; joined 25 Jun 2008; posts: 17447)
Posted: Thu Jun 13, 2013 6:00 am
So you have a message model. It describes the structure of your data.
It includes information on how many times each piece of that data can repeat.
balaji83it (Acolyte; joined 20 Jul 2007; posts: 72)
Posted: Thu Jun 13, 2013 6:15 am
Which TDS or CWF property lets us specify the value 99 as a delimiter?
Thanks
mqjeff (Grand Master; joined 25 Jun 2008; posts: 17447)
Posted: Thu Jun 13, 2013 6:18 am
The one that indicates that a specific structure in your model has finished repeating.
balaji83it (Acolyte; joined 20 Jul 2007; posts: 72)
Posted: Thu Jun 13, 2013 10:28 pm
I understood the message model and structure, but I want to know the implementation details.
In TDS, we can specify a delimiter if it is separate from the data. But here the content of the data itself acts like a delimiter. So how can we achieve this?
We can't use CWF because we do not know how many times Block_2 will be repeated.
So do we need to use data patterns or something similar? If so, please give us some details.
Thanks
kimbert (Jedi Council; joined 29 Jul 2003; posts: 5542; location: Southampton)
Posted: Fri Jun 14, 2013 1:09 am
Quote: "But here the content of the data itself acts like a delimiter."
That is an important piece of information. You will need to use data patterns to parse this format, which means converting your CWF format into a TDS format (CWF cannot use data patterns).
Something like this should work.
Assume that Block1 is 10 bytes long and Block2 is 20 bytes long.
Code:
element name='message'
    complexType DataElementSeparation='Use Data Pattern'
        element name='Block1' minOccurs=1 maxOccurs=1 dataPattern='.*[10]'
            ....
        sequence dataPattern='((.*[20])*)(.*[18])99' DataElementSeparation='Fixed length'
            element name='Block2' minOccurs=0 maxOccurs=unbounded
                ....
The sequence group around Block2 defines a 'box' for all of the Block2 occurrences. This is not tested, and the data pattern may well be incorrect, but hopefully you get the idea. I have used .* (any character, any number of times), but I suggest that you refine those patterns to match the expected contents of Block1 and Block2.
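To see what the data-pattern idea does, here is a rough equivalent using Python's `re` module on bytes. It is a sketch under assumptions, not the broker's TDS data-pattern engine: it uses the same hypothetical lengths (Block1 10 bytes, Block2 20 bytes, the 99 in the last 2 bytes of Block2), and it reads `.*[10]`, `.*[20]` and `.*[18]` in the model above as "exactly 10, 20 and 18 bytes" (written `.{10}`, `.{20}` and `.{18}` in Python), which is an interpretation, not confirmed syntax. The Block2 repetition is made non-greedy so that each match ends at the first Block2 ending in 99.

```python
import re

# One Block1 followed by its Block2s, with hypothetical fixed lengths.
GROUP = re.compile(
    rb"(?P<block1>.{10})"        # exactly one 10-byte Block1
    rb"(?P<block2s>(?:.{20})*?"  # as few 20-byte Block2s as possible...
    rb".{18}99)",                # ...then a Block2 whose last 2 bytes are 99
    re.DOTALL,                   # '.' must match every byte, including b"\n"
)


def parse_groups(data: bytes) -> list:
    """Return (block1, [block2, ...]) tuples, one per matched group."""
    groups = []
    pos = 0
    while pos < len(data):
        m = GROUP.match(data, pos)   # the match must start exactly at pos
        if m is None:
            raise ValueError(f"no Block1/Block2 group matches at offset {pos}")
        body = m.group("block2s")    # the 'box' holding every Block2
        # Fixed-length split of the box into individual 20-byte Block2s.
        block2s = [body[i:i + 20] for i in range(0, len(body), 20)]
        groups.append((m.group("block1"), block2s))
        pos = m.end()
    return groups


b1 = b"H" * 10
mid = b"D" * 18 + b"00"
last = b"D" * 18 + b"99"
msg = b1 + mid + mid + last + b1 + mid + last
print([len(b2s) for _, b2s in parse_groups(msg)])  # [3, 2]
```

The two-step shape mirrors the model: a pattern finds the extent of the whole Block2 'box', then fixed-length separation cuts it into individual Block2 records.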
mqjeff (Grand Master; joined 25 Jun 2008; posts: 17447)
Posted: Fri Jun 14, 2013 2:56 am
I would tend to make Field3 optional and set the group terminator to 99, but I'm sure there's some case I haven't considered where this doesn't work.
kimbert (Jedi Council; joined 29 Jul 2003; posts: 5542; location: Southampton)
Posted: Fri Jun 14, 2013 3:33 am
That would work too. I did consider that approach, but Field3 is not, in fact, an optional field in the user's model, so I decided to go for the data pattern approach.
As always, there's more than one way to do it. DFDL would probably make this easier, but it requires WMB v8 or IIB v9.
balaji83it (Acolyte; joined 20 Jul 2007; posts: 72)
Posted: Mon Jun 17, 2013 3:02 am
Good idea, Kimbert.
But I have Unicode characters in my data. CWF parses the individual blocks nicely, but since I changed the parser to TDS, it does not parse the data properly.
What kind of regex should we use?
I tried a few regex variations for Unicode characters, but parsing fails and the threads run in infinite loops.
I have to kill the DataFlow engine to redeploy.
Thanks,
Konijeti Balaji.
kimbert (Jedi Council; joined 29 Jul 2003; posts: 5542; location: Southampton)
Posted: Mon Jun 17, 2013 4:31 am
Quote: "But I have Unicode characters in my data. CWF parses the individual blocks nicely."
Presumably because you have set lengthUnits to 'characters' in your CWF physical format?
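Why lengths in characters matter can be shown with a small Python check. This is only an illustration using Python's `re` module, not the TDS parser, and it does not show how TDS itself counts lengths. The thread does not say which encoding the data uses, so UTF-8 is assumed here, and the 10-character Block1 is hypothetical. A field that is 10 characters long can be more than 10 bytes once a character needs a multi-byte encoding, so a pattern that counts 10 bytes ends before the real field boundary and every following block is misaligned.

```python
import re

# Hypothetical data: a 10-character Block1 containing one non-ASCII
# character (u-umlaut, U+00FC), followed by a 20-character Block2
# whose last two characters are "99".
block1 = "Z\u00fcrich" + " " * 4             # 10 characters
block2 = "D" * 18 + "99"                     # 20 characters, ASCII only
data = (block1 + block2).encode("utf-8")     # assumed encoding: UTF-8

print(len(block1), len(block1.encode("utf-8")))   # 10 11 (U+00FC is 2 bytes)

# Counting 10 *bytes* stops one byte short of the real Block1 boundary,
# so Block2 is read one byte early and the "99" no longer lines up.
by_bytes = re.match(rb"(.{10})(.{18}99)", data, re.DOTALL)
print(by_bytes)                              # None

# Counting 10 *characters* on the decoded text keeps the blocks aligned.
by_chars = re.match(r"(.{10})(.{18}99)", data.decode("utf-8"), re.DOTALL)
print(by_chars.group(1) == block1)           # True
```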
Quote: "But since I changed the parser to TDS, it does not parse the data properly."
Please provide full information on exactly what is going wrong.
You could always try mqjeff's suggestion, which might perform better than using data patterns.