Author |
Message
|
NiceGuy |
Posted: Tue Aug 24, 2010 8:11 pm Post subject: Parse a line containing only <CR><LF> into a mes |
|
|
Apprentice
Joined: 11 Jun 2009 Posts: 37
|
Hi community,
Does anyone have an idea how I can enable my message set to parse/accept lines containing only the newline (<CR><LF>) into a member of my message set?
For Example:
Code: |
<CR><LF>
<CR><LF>
<CR><LF>
|
You'll note the newline characters are by themselves and have no other characters in their respective lines. This occurs at various places in my message input.
Currently I have created an element in my message set called NEWLINE
that has its "Data Element Separation" set to "Use Data Pattern"
The Data Pattern for the element I put simply as: [\n]
Of course the element repeats, that is, Min Occurs=1 .. Max Occurs=-1
Unfortunately this does not appear to work .. any ideas?
I'm a junior .. be gentle
Thanks community |
|
fjb_saper |
Posted: Tue Aug 24, 2010 8:16 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Check out the TDS parser in the infocenter.
You might have to use "<CR><LF>" as a pattern instead of "\n"
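To see why a pattern of [\n] alone can fall short on this data, a small illustration with Python's `re` module may help. This is only an analogy: the TDS parser has its own pattern syntax and mnemonics such as <CR><LF>, so take the behaviour below as a general regular-expression observation, not as a statement about how Broker evaluates Data Patterns.

```python
import re

# Three blank lines as they arrive on the wire: each is CR followed by LF.
blank_lines = "\r\n\r\n\r\n"

# A character class containing only LF cannot consume the CR,
# so it fails to match at the very start of the data.
lf_only = re.compile(r"[\n]+")
print(lf_only.match(blank_lines))

# Matching the full CR LF pair consumes all three blank lines.
crlf = re.compile(r"(?:\r\n)+")
matched = crlf.match(blank_lines)
print(matched.end() == len(blank_lines))
```

The first print shows no match, the second confirms that the CR LF pattern reaches the end of the data.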
Have fun
_________________ MQ & Broker admin |
|
mqjeff |
Posted: Wed Aug 25, 2010 1:42 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
You should only use Data Patterns if you can't use other features of TDS to match the data; they are significantly slower.
Can you explain further what you are trying to see in your logical tree from your model? Are you trying to see empty but existing elements for each of these blank lines? Are you trying to see elements that contain the value "<CR><LF>"? Are you trying to NOT have any elements added to your tree for these blank lines? |
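The three outcomes asked about above can be sketched outside Broker. The snippet below is plain Python, not TDS; the function names, the NEWLINE/LINE element names and the list-of-tuples "tree" are all invented for illustration.

```python
CRLF = "\r\n"

def split_lines(data):
    """Split on CR LF, dropping the empty piece after the final terminator."""
    parts = data.split(CRLF)
    if parts and parts[-1] == "":
        parts.pop()
    return parts

def empty_elements(data):
    # Option 1: every blank line becomes an element with an empty value.
    return [("NEWLINE", "") for line in split_lines(data) if line == ""]

def crlf_valued_elements(data):
    # Option 2: every blank line becomes an element whose value is the CR LF itself.
    return [("NEWLINE", CRLF) for line in split_lines(data) if line == ""]

def blank_lines_as_markup(data):
    # Option 3: blank lines are treated as markup; only non-blank lines reach the tree.
    return [("LINE", line) for line in split_lines(data) if line != ""]

sample = CRLF * 3
print(empty_elements(sample))
print(crlf_valued_elements(sample))
print(blank_lines_as_markup(sample))
```

Which of the three shapes is wanted decides how the blank lines should be modelled, which is why the question matters before choosing TDS settings.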
|
kimbert |
Posted: Wed Aug 25, 2010 3:48 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
Currently I have created an element in my message set called NEWLINE
that has its "Data Element Separation" set to "Use Data Pattern"
The Data Pattern for the element I put simply as: [\n] |
The Data Element Separation property applies to the children of the complex type/group. So if you have set the Data Pattern on the NEWLINE element it will be ignored unless Data Element Separation='Use Data Pattern' on its parent group/type.
Other than that, mqjeff is correct - we need to understand what your input looks like, what message tree structure you want to obtain, and why. |
|
NiceGuy |
Posted: Wed Aug 25, 2010 6:52 am Post subject: |
|
|
Apprentice
Joined: 11 Jun 2009 Posts: 37
|
Thanks everyone so far for helping out.
OK, allow me to clarify further. Let me start by presenting a larger segment of my input message. Hopefully this helps explain further.
Code: |
DETAIL_LINE 1(min occurs) -1(max occurs)
CRLF 1(min occurs) -1(max occurs)
|
Code: |
DETAIL_LINE:
  -detailLine1
    -productNumber
    -open
    -shipped
    -order
    -tax
    -price
    -uom
    -extended
  -detailLine2
    -descriptionline1
  -detailLine3
    -descriptionLine2
  -detailLine4
    -emptyline
CRLF
|
Input Message:
Code: |
05710610155 1 1 0 Y 35.74 CS 35.74<CR><LF>
FORK PLASTIC SILVER (600) <CR><LF>
REFLECTIONS <CR><LF>
<CR><LF>
<CR><LF>
<CR><LF>
<CR><LF>
|
*Please note that the number of trailing <CR><LF> lines can vary; in theory 3, 4 or 5 could appear. I show only three here for brevity.
First Line (detailLine1):
The first line, starting with 05710610155, represents the first line in the invoice detail. This generally parses fine. I've configured this segment as tagged delimited with a group terminator of <CR><LF>.
Second Line (detailLine2):
The second line also parses fine .. again I've configured this segment as tagged delimited with a group terminator of <CR><LF>.
Third Line (detailLine3):
Same as first two.
Fourth Line (detailLine4):
The fourth line is a long empty line of spaces followed by a <CR><LF>.
This line is set to Data Element Separation: Use Data Pattern. The only element inside EmptyLine has its Data Pattern: [ ]+. The parent detailLine4 has its Group Terminator: <CR><LF>
CRLF:
This is where the majority of my problems reside. In theory it could be the parsing transition from detailLine4 to CRLF, so the problem could lie in either of the two. I've tried two variations.
The first added an element to CRLF that essentially tries to swallow the newline. The element (call it "newline") uses the Data Pattern: [\n]+. The parent CRLF has its Group Terminator left blank.
The second removed the "newline" element from the CRLF parent and simply set the CRLF Group Terminator to <CR><LF>. CRLF was then set to repeat in the message, that is, 1 (min occurs) and -1 (max occurs).
I apologize for the lengthy post .. never realized how difficult it is to explain something IT-related in writing.
Regardless, I hope I did a better job this time explaining my problem.
Thanks again for helping out. |
|
lancelotlinc |
Posted: Wed Aug 25, 2010 7:22 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
If nothing else works well, you could drop into a Java Compute Node and parse the data in the JCN. On the MQInput node, if you choose this method, choose "BLOB" (i.e. no parsing from one of Broker's default parsers).
_________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
kimbert |
Posted: Wed Aug 25, 2010 7:57 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Thanks - much clearer now. I've got more questions, though.
a) will detailLine4 always consist entirely of spaces, with at least one space being present always?
b) Why do you get these trailing <CR><LF>s - do they represent empty records? If so, are you 100% certain that those records will always be empty?
Is 5 the absolute maximum number that you will ever get?
Once I know the answers to those questions, I'll have some suggestions. |
|
kimbert |
Posted: Wed Aug 25, 2010 7:58 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
If nothing else works well, you could drop into a Java Compute Node and parse the data in the JCN |
Sometimes that's the correct approach, but not this time. This is a simple enough format. |
|
NiceGuy |
Posted: Wed Aug 25, 2010 8:22 am Post subject: |
|
|
Apprentice
Joined: 11 Jun 2009 Posts: 37
|
Thanks once again for your assistance,
Let's see if I can clarify ....
Quote: |
1) will detailLine4 always consist entirely of spaces, with at least one space being present always?
2) Why do you get these trailing <CR><LF>s - do they represent empty records? If so, are you 100% certain that those records will always be empty?
3)Is 5 the absolute maximum number that you will ever get? |
Answers:
A1) Yes, detailLine4 is a long line consisting entirely of spaces followed by a <CR><LF>. Though the number of spaces is not fixed, in theory, yes, at least one space is assumed/expected before a <CR><LF> is reached.
A2) To be honest .. the lines containing the <CR><LF> alone, like those following detailLine4, are meaningless/garbage. If I had to give my opinion, they are put there to act as a transition into the second part of the invoice message input (I have purposely left out the bottom portion in my posting since it's irrelevant at this point). They do not represent anything meaningful I suppose, except to separate Invoice Page 1 from Invoice Page 2.
For example:
Code: |
Invoice Top Portion
<CR><LF>
<CR><LF>
<CR><LF>
Invoice Bottom Portion
|
A3) No, in fact there is no definitive value for the number of <CR><LF> that could follow after all the INVOICE_DETAILS have been parsed. Expect the unexpected, so to speak.
I guess my goal is to simply allow my message set to expect/absorb these <CR><LF> before processing the bottom portion of the invoice.
I realized that someone might suggest a Java Compute Node but, even as a junior with relatively little experience, I already feel pretty comfortable saying that approach seems like more logic than is necessary.
Hoping that I clarified better. Please forward any other questions you desire .. at this point I am pretty much at a standstill.
Thanks again |
|
kimbert |
Posted: Thu Aug 26, 2010 1:34 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Good answers. Here are my suggestions:
1. Remove all data patterns from your model. This is a fairly basic line-oriented format, so All Elements Delimited or Tagged Delimited can do the job, and will be faster.
2. Treat the <CR><LF>s between the top portion and bottom portion as markup, so that they don't appear as elements in the message tree. I've sketched out a possible model below. I haven't tested it, but I think something like this will work:
Code: |
element name='Invoice'
  complexType DataElementSeparation='All Elements Delimited' Delimiter='<CR><LF>'
    element name='TopPortion' minOccurs='1' maxOccurs='1'
      complexType DataElementSeparation='All Elements Delimited' Delimiter='<CR><LF>' GroupTerminator='<CR><LF>'
        element name='detailLine1'
        element name='detailLine2'
        element name='detailLine3'
        element name='detailLine4'
    sequence GroupIndicator='<CR><LF>' minOccurs='0' maxOccurs='unbounded'
      <no members for this sequence group>
    element name='BottomPortion' minOccurs='1' maxOccurs='1'
      <content of bottom portion>
|
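As a rough way to picture what the model above should produce, here is a Python sketch of the same idea: read the four detail lines, treat any further <CR><LF>-only lines as markup, then hand the rest to the bottom portion. It only loosely mirrors the TDS settings and has not been run against Broker; `parse_invoice`, the dictionary layout and the sample bottom-portion text are all hypothetical.

```python
CRLF = "\r\n"
DETAIL_NAMES = ["detailLine1", "detailLine2", "detailLine3", "detailLine4"]

def parse_invoice(data):
    """Return a dict with the top-portion lines and the raw bottom portion."""
    lines = data.split(CRLF)
    if len(lines) < len(DETAIL_NAMES) + 1:
        raise ValueError("expected four CR LF-terminated detail lines")
    # The top portion is four CR LF-delimited lines.
    top = dict(zip(DETAIL_NAMES, lines[:4]))
    rest = lines[4:]
    # Blank lines between the portions are markup: skip any number of them.
    while rest and rest[0] == "":
        rest.pop(0)
    return {"TopPortion": top, "BottomPortion": CRLF.join(rest)}

message = (
    "05710610155 1 1 0 Y 35.74 CS 35.74" + CRLF
    + "FORK PLASTIC SILVER (600) " + CRLF
    + "REFLECTIONS " + CRLF
    + "   " + CRLF          # detailLine4: spaces only
    + CRLF * 3              # any number of blank lines may follow
    + "Invoice Bottom Portion" + CRLF
)
tree = parse_invoice(message)
print(tree["TopPortion"]["detailLine1"])
print(repr(tree["BottomPortion"]))
```

The blank lines never show up in the resulting structure, which is the effect the empty repeating sequence is meant to have in the message tree.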
|
|