ASG
IBM
Zystems
Cressida
Icon
Netflexity
 
  MQSeries.net
Search  Search       Tech Exchange      Education      Certifications      Library      Info Center      SupportPacs      LinkedIn  Search  Search                                                                   FAQ  FAQ   Usergroups  Usergroups
 
Register  ::  Log in Log in to check your private messages
 
RSS Feed - WebSphere MQ Support RSS Feed - Message Broker Support

MQSeries.net Forum IndexWebSphere Message Broker (ACE) SupportParse data at runtime,the delimeter only known at runtime

Post new topicReply to topic
Parse data at runtime,the delimeter only known at runtime View previous topic :: View next topic
Author Message
matuwe
PostPosted: Tue Oct 25, 2011 1:44 pm Post subject: Parse data at runtime,the delimeter only known at runtime Reply with quote

Master

Joined: 05 Dec 2007
Posts: 296

Hi

Please advice, I am not sure how to handle this flow. I am retriving data from a DB, which is a BLOB. I need to Convert to CHAR and the data then looks like
Quote:

Name|Surname|age
help|Test|23


When retrieving the data part of the data will give me the record delimiter and the row delimiter. So I need to take all this information and do an insert into another database.

The delimiter can be anything, a comma, pipe

I wrote a JAVA compute and used Stringtoken to find rows and columns, but it just takes for ever to do 50 000 records. Is there anyway that I can do this parsing?
Back to top
View user's profile Send private message
lancelotlinc
PostPosted: Tue Oct 25, 2011 1:54 pm Post subject: Reply with quote

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

If you are on V7, the DatabaseInput node may work for you. By handling the db rows as they come in rather than in batch mode, you may be able to tolerate the longer latency for parsing and conversion.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Back to top
View user's profile Send private message Send e-mail
matuwe
PostPosted: Tue Oct 25, 2011 2:06 pm Post subject: Reply with quote

Master

Joined: 05 Dec 2007
Posts: 296

Owww thanks a million for the quick response.

I am on V7. Can the DBInput accept delimeters at runtime . My other issue is the database that I writing to only accepts JDBC and want me to call their Stored Proc and insert record at the time.

I am not sure I understand correctly, can the the db insert accept SQL statement at runtime too?

Thanking you in advance
Back to top
View user's profile Send private message
lancelotlinc
PostPosted: Tue Oct 25, 2011 2:18 pm Post subject: Reply with quote

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

mqjeff, kimbert or Vitor may be better resources for this situation. You can do all those things you mentioned, but the parsing part your still needing to be done in a compute node, either ESQL or Java. Probably Java would be more flexible, but the three people aforementioned could offer you some more accurate advice to parse out the blob the way you want in the fastest way possible.

DatabaseInput node starts processing when it senses a new record in the table it is monitoring.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Back to top
View user's profile Send private message Send e-mail
matuwe
PostPosted: Tue Oct 25, 2011 2:24 pm Post subject: Reply with quote

Master

Joined: 05 Dec 2007
Posts: 296

Owww cool thanks a million. I shall wait for more feedback hopefully some one will give me a more robust solution to handle this..
Back to top
View user's profile Send private message
kimbert
PostPosted: Tue Oct 25, 2011 2:40 pm Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Seems to be "parser week" on this forum...
Quote:
When retrieving the data part of the data will give me the record delimiter and the row delimiter
The MRM parser *can* cope with varying delimiters, but it is not easy, and there are limits to what it can handle.
However, if you answer the following questions about the data format I can give some specific advice:
Q1: Does your data format have a well-known name. Like 'HL7' or 'EDIFACT' or 'ACORD'?
Q2: How *exactly* are the delimiters specified in the data? Is there some kind of header records with a special field that describes the delimiters?
Q3: You said 'The delimiter can be anything, a comma, pipe'. I guess that in the real world, one or two of those are used in 99% of messages. If you know ( or can find out ) the most common cases then you can optimize your flow to check for the common cases. But that requires a proper understanding of your data format, so I can't tell you how to do that until you answer Q1 and/or Q2.
Back to top
View user's profile Send private message
kimbert
PostPosted: Tue Oct 25, 2011 2:44 pm Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Oh...and before somebody else jumps in and says it, the other answer is:

This would be easy if you could use DFDL. Unfortunately WMB v8 isn't available just yet...
Back to top
View user's profile Send private message
matuwe
PostPosted: Tue Oct 25, 2011 2:55 pm Post subject: Reply with quote

Master

Joined: 05 Dec 2007
Posts: 296

Oww thanks for the response..

Currently all my 50 interfaces uses | pipe delimiter and the rows uses \n colon deliminator. All the data will be handles a CHAR datatype.

The first row on my payload tells me what delimiter to use, the second row shows me the insert statement for the destination and the last row is what contains the BLOB with the rows and columns.


I only know at runtime what the delimiter is plus how many columns. This is why I could not work out how I can create a message set to accommodate dynamic number of columns depending on the data.

And Lastly, no I do not have a header row.
Back to top
View user's profile Send private message
matuwe
PostPosted: Tue Oct 25, 2011 3:18 pm Post subject: Reply with quote

Master

Joined: 05 Dec 2007
Posts: 296

Just for more design question..

Is it better to have one java compte node to strip the records from the payload and propagate the record, then have another Java compute to do the actual insert into the database?

Or is it safer to strip all messages and build them into the message tree then propagate to the note that will then insert this data into the DB.

Is there a way of making builk insert??? or is this not recommended? Currently I am building a JDBC component, and I will need to build the ODBC version. So that is why I am trying to split the insert node so I can reuse the parsing on both.

Code looks like this

Code:


String columnseparator = "|" ; //datasourcename.getNextSibling();
String rowsaperator = "\n" ;
String commitrows = "0" ;//rowsaperator.getNextSibling();
                  
String tokens[] = DBJStringManupilator.tokenizer(data,rowsaperator);
                  
         ///TODO: - use the row separator from the database.
         // Build the output area for the row data
         MbElement rowData = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "ROW", "");
         
         int numTokens = tokens.length;
         MbElement recordCount = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "RECORDCOUNT", "");
         recordCount.createElementAsLastChild(MbElement.TYPE_VALUE, "recordNum", numTokens);
         int count = 1;   
                     
            for (int i = 0; i < numTokens; i++) {
               
               // TODO:  Remove the following hardcoded column delimiter - Replace with the
               // correct one from the configuration
               
               // Split into separate columns
               //String columns[] =  StringManupilator.tokenizer(tokens[i],"\\"+columnseparator.getValueAsString());
               String columns[] = tokens[i].split("\\"+columnseparator);
                 int numCols = columns.length;
                 String batch_id;
                 // Copy each field into the output area
                 count=1;
                 for (int j = 0; j < numCols; j++) {
                    
//                    if (j==0){
//                       
//                       batch_id = String.valueOf(Math.round((float)Math.random()*100000));
//                       
//                    }else{
                       
                       if (columns[j] == null || columns[j].equals("")){
                          batch_id= "";
                       }else{
                          batch_id=columns[j];
                       }
//                    }
                    MbElement field = rowData.createElementAsLastChild(MbElement.TYPE_VALUE, "field"+count, batch_id);              

                    count = count + 1;
                    
               }
                 
                 out.propagate(outAssembly);
                 
                 rowData.detach();
                 rowData = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "ROW", "");
               }               
                  

      } finally {
         // clear the outMessage
         outMessage.clearMessage();
      }
Back to top
View user's profile Send private message
MrSmith
PostPosted: Wed Oct 26, 2011 12:43 am Post subject: Reply with quote

Master

Joined: 20 Mar 2008
Posts: 215

Although it would mean a message set with multiple Formats if you know that your delimiter is only ever going to be a handful of those regularly used you could do the parsing in the ESQL and apply the Format part in the next node

i.e Input -------> (Node determines delimiter from 1st row then does inline parse in ESQL defining format at this point but same message set as the elements are still relevant)

It does however mean that you may for the main part have a few redundant wire formats or TDS formats but you could cover most bases with the main ones
_________________
-------- *
“Outside of a dog, a book is man's best friend. Inside of a dog it's too dark to read.”
Back to top
View user's profile Send private message
mqjeff
PostPosted: Wed Oct 26, 2011 1:41 am Post subject: Reply with quote

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you know that the delimiter is always going to be unique within the message, i.e. that you don't have issues like having quotes within fields that are delimited by quotes... Then you can create a single message set with a single delimiter value, and then use ESQL to read the first row with the delimiter and REPLACE that value with the value used by the message set and then parse with the message set.

But you appear to have a message structure that requires an unknown number of repeating elements that are not distinguished very well from each other. So it may be a complicated message set to produce.
Back to top
View user's profile Send private message
kimbert
PostPosted: Wed Oct 26, 2011 7:00 am Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

You've had a lot of advice there. Here's a summary of the options:
A. Create a single message set with two or more TDS physical formats. Use one physical format per 'common' delimiter style. The message definition will consist of a message containing a single repeating record containing a single repeating field. Use ESQL or Java to select the correct physical format for each set of incoming records.
B. Do it all in Java. Write some highly-efficient Java code ( yours doesn't look highly efficient, btw ) to do the parsing of each record. You could use specialized methods for the common delimiting styles - but the Java JIT might make that unnecessary.

I'm not an expert on the database stuff - but it sounds as if there may be some gains available there. As always, you should *measure* to find the bottlenecks before you carry out any performance tuning.
Back to top
View user's profile Send private message
mattfarney
PostPosted: Wed Oct 26, 2011 8:12 am Post subject: Reply with quote

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

While I've never done Broker into a DB, the basic insert command looks like vanilla SQL to me.

As another suggestion, instead of truly parsing the input, you could simply convert the input records directly into the insert statement.

If you store off the two delimiters, your insert statement would look like

insert into <sometable> (replace(row[2],delimiter,",") VALUES
(replace(row[x],delimiter,",");

You would need to loop through the input (at least the data section) based on delimiter #2.

Just a suggestion
-mf
Back to top
View user's profile Send private message
Display posts from previous:
Post new topicReply to topic Page 1 of 1

MQSeries.net Forum IndexWebSphere Message Broker (ACE) SupportParse data at runtime,the delimeter only known at runtime
Jump to:



You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Protected by Anti-Spam ACP


Theme by Dustin Baccetti
Powered by phpBB 2001, 2002 phpBB Group

Copyright MQSeries.net. All rights reserved.