MQSeries.net :: View topic - Parse data at runtime,the delimeter only known at runtime

MQSeries.net

Tech Exchange

Education

Certifications

Library

Info Center

SupportPacs

FAQÂ Â

Usergroups

RSS Feed - WebSphere MQ Support

RSS Feed - Message Broker Support

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parse data at runtime,the delimeter only known at runtime

Parse data at runtime,the delimeter only known at runtime

« View previous topic :: View next topic »

Author

Message

matuwe

Posted: Tue Oct 25, 2011 1:44 pm Post subject: Parse data at runtime,the delimeter only known at runtime

Master

Joined: 05 Dec 2007
Posts: 296

Hi

Please advice, I am not sure how to handle this flow. I am retriving data from a DB, which is a BLOB. I need to Convert to CHAR and the data then looks like

Quote:

Name|Surname|age
help|Test|23

When retrieving the data part of the data will give me the record delimiter and the row delimiter. So I need to take all this information and do an insert into another database.

The delimiter can be anything, a comma, pipe

I wrote a JAVA compute and used Stringtoken to find rows and columns, but it just takes for ever to do 50 000 records. Is there anyway that I can do this parsing?

lancelotlinc

Posted: Tue Oct 25, 2011 1:54 pm Post subject:

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

If you are on V7, the DatabaseInput node may work for you. By handling the db rows as they come in rather than in batch mode, you may be able to tolerate the longer latency for parsing and conversion.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER

matuwe

Posted: Tue Oct 25, 2011 2:06 pm Post subject:

Master

Joined: 05 Dec 2007
Posts: 296

Owww thanks a million for the quick response.

I am on V7. Can the DBInput accept delimeters at runtime

. My other issue is the database that I writing to only accepts JDBC and want me to call their Stored Proc and insert record at the time.

I am not sure I understand correctly, can the the db insert accept SQL statement at runtime too?

Thanking you in advance

lancelotlinc

Posted: Tue Oct 25, 2011 2:18 pm Post subject:

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

mqjeff, kimbert or Vitor may be better resources for this situation. You can do all those things you mentioned, but the parsing part your still needing to be done in a compute node, either ESQL or Java. Probably Java would be more flexible, but the three people aforementioned could offer you some more accurate advice to parse out the blob the way you want in the fastest way possible.

DatabaseInput node starts processing when it senses a new record in the table it is monitoring.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER

matuwe

Posted: Tue Oct 25, 2011 2:24 pm Post subject:

Master

Joined: 05 Dec 2007
Posts: 296

Owww cool thanks a million. I shall wait for more feedback hopefully some one will give me a more robust solution to handle this..

kimbert

Posted: Tue Oct 25, 2011 2:40 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Seems to be "parser week" on this forum...

Quote:

When retrieving the data part of the data will give me the record delimiter and the row delimiter

The MRM parser *can* cope with varying delimiters, but it is not easy, and there are limits to what it can handle.
However, if you answer the following questions about the data format I can give some specific advice:
Q1: Does your data format have a well-known name. Like 'HL7' or 'EDIFACT' or 'ACORD'?
Q2: How *exactly* are the delimiters specified in the data? Is there some kind of header records with a special field that describes the delimiters?
Q3: You said 'The delimiter can be anything, a comma, pipe'. I guess that in the real world, one or two of those are used in 99% of messages. If you know ( or can find out ) the most common cases then you can optimize your flow to check for the common cases. But that requires a proper understanding of your data format, so I can't tell you how to do that until you answer Q1 and/or Q2.

kimbert

Posted: Tue Oct 25, 2011 2:44 pm Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Oh...and before somebody else jumps in and says it, the other answer is:

This would be easy if you could use DFDL. Unfortunately WMB v8 isn't available just yet...

matuwe

Posted: Tue Oct 25, 2011 2:55 pm Post subject:

Master

Joined: 05 Dec 2007
Posts: 296

Oww thanks for the response..

Currently all my 50 interfaces uses | pipe delimiter and the rows uses \n colon deliminator. All the data will be handles a CHAR datatype.

The first row on my payload tells me what delimiter to use, the second row shows me the insert statement for the destination and the last row is what contains the BLOB with the rows and columns.

I only know at runtime what the delimiter is plus how many columns. This is why I could not work out how I can create a message set to accommodate dynamic number of columns depending on the data.

And Lastly, no I do not have a header row.

matuwe

Posted: Tue Oct 25, 2011 3:18 pm Post subject:

Master

Joined: 05 Dec 2007
Posts: 296

Just for more design question..

Is it better to have one java compte node to strip the records from the payload and propagate the record, then have another Java compute to do the actual insert into the database?

Or is it safer to strip all messages and build them into the message tree then propagate to the note that will then insert this data into the DB.

Is there a way of making builk insert??? or is this not recommended? Currently I am building a JDBC component, and I will need to build the ODBC version. So that is why I am trying to split the insert node so I can reuse the parsing on both.

Code looks like this

Code:

String columnseparator = "|" ; //datasourcename.getNextSibling();
String rowsaperator = "\n" ;
String commitrows = "0" ;//rowsaperator.getNextSibling();

String tokens[] = DBJStringManupilator.tokenizer(data,rowsaperator);

      ///TODO: - use the row separator from the database.
      // Build the output area for the row data
      MbElement rowData = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "ROW", "");

      int numTokens = tokens.length;
      MbElement recordCount = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "RECORDCOUNT", "");
      recordCount.createElementAsLastChild(MbElement.TYPE_VALUE, "recordNum", numTokens);
      int count = 1;

         for (int i = 0; i < numTokens; i++) {

            // TODO: Remove the following hardcoded column delimiter - Replace with the
            // correct one from the configuration

            // Split into separate columns
            //String columns[] = StringManupilator.tokenizer(tokens[i],"\\"+columnseparator.getValueAsString());
            String columns[] = tokens[i].split("\\"+columnseparator);
      int numCols = columns.length;
      String batch_id;
      // Copy each field into the output area
      count=1;
      for (int j = 0; j < numCols; j++) {

//          if (j==0){
//
//             batch_id = String.valueOf(Math.round((float)Math.random()*100000));
//
//          }else{

            if (columns[j] == null || columns[j].equals("")){
               batch_id= "";
            }else{
               batch_id=columns[j];
            }
//          }
         MbElement field = rowData.createElementAsLastChild(MbElement.TYPE_VALUE, "field"+count, batch_id);

         count = count + 1;

            }

      out.propagate(outAssembly);

      rowData.detach();
      rowData = outAssembly.getLocalEnvironment().getRootElement().createElementAsLastChild(MbElement.TYPE_NAME, "ROW", "");
         }


   } finally {
      // clear the outMessage
      outMessage.clearMessage();
   }

MrSmith

Posted: Wed Oct 26, 2011 12:43 am Post subject:

Master

Joined: 20 Mar 2008
Posts: 215

Although it would mean a message set with multiple Formats if you know that your delimiter is only ever going to be a handful of those regularly used you could do the parsing in the ESQL and apply the Format part in the next node

i.e Input -------> (Node determines delimiter from 1st row then does inline parse in ESQL defining format at this point but same message set as the elements are still relevant)

It does however mean that you may for the main part have a few redundant wire formats or TDS formats but you could cover most bases with the main ones
_________________
-------- *
â€œOutside of a dog, a book is man's best friend. Inside of a dog it's too dark to read.â€

mqjeff

Posted: Wed Oct 26, 2011 1:41 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you know that the delimiter is always going to be unique within the message, i.e. that you don't have issues like having quotes within fields that are delimited by quotes... Then you can create a single message set with a single delimiter value, and then use ESQL to read the first row with the delimiter and REPLACE that value with the value used by the message set and then parse with the message set.

But you appear to have a message structure that requires an unknown number of repeating elements that are not distinguished very well from each other. So it may be a complicated message set to produce.

kimbert

Posted: Wed Oct 26, 2011 7:00 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

You've had a lot of advice there. Here's a summary of the options:
A. Create a single message set with two or more TDS physical formats. Use one physical format per 'common' delimiter style. The message definition will consist of a message containing a single repeating record containing a single repeating field. Use ESQL or Java to select the correct physical format for each set of incoming records.
B. Do it all in Java. Write some highly-efficient Java code ( yours doesn't look highly efficient, btw ) to do the parsing of each record. You could use specialized methods for the common delimiting styles - but the Java JIT might make that unnecessary.

I'm not an expert on the database stuff - but it sounds as if there may be some gains available there. As always, you should *measure* to find the bottlenecks before you carry out any performance tuning.

mattfarney

Posted: Wed Oct 26, 2011 8:12 am Post subject:

Disciple

Joined: 17 Jan 2006
Posts: 167
Location: Ohio

While I've never done Broker into a DB, the basic insert command looks like vanilla SQL to me.

As another suggestion, instead of truly parsing the input, you could simply convert the input records directly into the insert statement.

If you store off the two delimiters, your insert statement would look like

insert into <sometable> (replace(row[2],delimiter,",") VALUES
(replace(row[x],delimiter,",");

You would need to loop through the input (at least the data section) based on delimiter #2.

Just a suggestion
-mf

Display posts from previous:

Page 1 of 1

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Parse data at runtime,the delimeter only known at runtime

Jump to:

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum

Protected by Anti-Spam ACP