nk
Posted: Thu Aug 30, 2012 5:57 am Post subject: processing csv file
Novice
Joined: 05 Jul 2012 Posts: 19
I'm processing a CSV file and converting it into TXT format. To eliminate duplicate entries by comparing certain fields, I need to create a record after mapping all the fields, and then check the records for duplicate entries.
I've done the mapping part but don't have any idea how to proceed further.
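A minimal sketch of the check described above, in Python purely for illustration (in the broker the same logic would be written in ESQL or Java). Each mapped record is reduced to a key built from the fields being compared, and a record is dropped if that key has already been seen. The record layout and the field names `id`, `code` and `qty` are hypothetical, not nk's actual mapping.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def drop_duplicates(records: Iterable[Dict[str, str]],
                    key_fields: Sequence[str]) -> List[Dict[str, str]]:
    """Keep the first record for each key, preserving input order.

    key_fields names the fields compared for duplication (hypothetical
    names here; use whatever fields your mapping actually produces).
    """
    seen = set()
    kept = []
    for record in records:
        key: Tuple[str, ...] = tuple(record[name] for name in key_fields)
        if key in seen:
            continue  # same key as an earlier record: skip it
        seen.add(key)
        kept.append(record)
    return kept


# Hypothetical mapped records; "id", "code" and "qty" are illustrative names.
records = [
    {"id": "10", "code": "TSI", "qty": "2"},
    {"id": "20", "code": "API", "qty": "3"},
    {"id": "10", "code": "TSI", "qty": "9"},
]
print(drop_duplicates(records, ["id", "code"]))
```

This keeps the original record order, at the cost of holding every key seen so far in memory.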
lancelotlinc
Posted: Thu Aug 30, 2012 6:00 am
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
mqjeff
Posted: Thu Aug 30, 2012 6:07 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447
Clearly you need to write code that deletes duplicates.
This means you need to know how to identify a given record, and then determine if that record is a duplicate of another record or not.
If your records are sorted, you can tell that the current record is a duplicate because it is the same as the previous record.
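For records that are already sorted on the fields that define a duplicate, the rule above needs only one comparison per record against the previous one. A minimal Python sketch, assuming each record has already been reduced to a tuple of its key fields (the sample values are made up):

```python
from typing import List, Optional, Sequence, Tuple

Record = Tuple[str, ...]


def dedupe_sorted(records: Sequence[Record]) -> List[Record]:
    """Remove duplicates from records that are already sorted.

    Because equal records are adjacent after sorting, a record is a
    duplicate exactly when it equals the record immediately before it.
    """
    result: List[Record] = []
    previous: Optional[Record] = None
    for record in records:
        if record != previous:
            result.append(record)
        previous = record
    return result


sorted_records = [("10", "TSI"), ("10", "TSI"), ("20", "API"), ("30", "TST")]
print(dedupe_sorted(sorted_records))
```

This only catches duplicates that sit next to each other, so it depends on the input really being sorted on the compared fields.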
kimbert
Posted: Thu Aug 30, 2012 6:10 am
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
@nk: The normal way to process CSV in Message Broker depends on the version that you are using:
- v6/v7: Use the MRM parser.
- v8: Use the DFDL parser. There is a CSV wizard that will generate the correct DFDL schema for you.
Either way, once you have a message tree you can do whatever you like with it.
I did not understand your description of the de-duplication logic. If you want help with that part, you will need to explain the requirements in more detail.
@lancelotlinc: Why is a Java Compute node the correct answer?
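Outside the broker, the "parse first, then work on the tree" idea can be shown with Python's standard `csv` module: each line becomes a record with named fields, and any later logic works on those records rather than on raw text. This is only an analogy to what the MRM or DFDL parser produces; the column names and sample data are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_csv(text: str) -> List[Dict[str, str]]:
    """Parse CSV text with a header row into field-name -> value records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


# Hypothetical input; in the broker the message model defines the real fields.
sample = "id,code,status\n10,TSI,SRQ\n20,API,SPQ\n"

for record in parse_csv(sample):
    print(record["id"], record["code"], record["status"])
```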
lancelotlinc
Posted: Thu Aug 30, 2012 6:17 am
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
kimbert wrote:
  @lancelotlinc: Why is a Java Compute node the correct answer?
It's not the only answer.
In response to:
nk wrote:
  [I] don't [have] any idea how to proceed further.
It's a suggested way to move forward.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
mqjeff
Posted: Thu Aug 30, 2012 6:19 am
Grand Master
Joined: 25 Jun 2008 Posts: 17447
lancelotlinc wrote:
  kimbert wrote:
    @lancelotlinc: Why is a Java Compute node the correct answer?
  It's not the only answer.
  In response to:
  nk wrote:
    [I] don't [have] any idea how to proceed further.
  It's a suggested way to move forward.
I agree, there's no *correct* answer to "How do I move forward?".
There's a process to follow, not an answer to be provided.
lancelotlinc
Posted: Thu Aug 30, 2012 6:24 am
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
mqjeff wrote:
  There's a process to follow, not an answer to be provided.
Which is really at the heart of the OP's dilemma. Postulate a possible software routine that would de-duplicate, code that possible solution, test the possible solution, modify the code based on the test results.
nathanw
Posted: Thu Aug 30, 2012 6:28 am
Knight
Joined: 14 Jul 2004 Posts: 550
A million monkeys + a million typewriters = the works of Shakespeare.
The way forward mainly depends on the developer's area of expertise.
_________________
Who is General Failure and why is he reading my hard drive?
Artificial Intelligence stands no chance against Natural Stupidity.
Only the User Trace Speaks The Truth
nk
Posted: Thu Aug 30, 2012 11:01 pm
Novice
Joined: 05 Jul 2012 Posts: 19
@lancelotlinc: I'll have to use only a Compute node.
@kimbert: I'm using V7 with the MRM parser.
For example, the TXT records I get after mapping are:
10 TSI SRQ 10 A 2012-09-122012-01-02 2 SHZ SPQ
20 API SPQ 12 B 2012-12-122012-01-20 3 SHZ SLQ
30 TST LTQ 21 L 2012-02-122012-01-31 4 SHZ SHZ
40 TLI RNQ 55 D 2012-08-122012-08-05 5 SHZ TPM
In the above, the 3rd row has its last two columns equal (SHZ SHZ), so I need to skip that row. The subsequent row (row 4) should then take over the skipped row's value in that counter column, i.e. 4 instead of 5.
@mqjeff: How do I sort the MRM record?
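The rule in nk's example can be expressed as a single pass: drop a row when its last two columns match, and reduce the 7th column of each later row by the number of rows dropped so far. A Python sketch, only to illustrate the logic (nk needs ESQL in a Compute node); splitting on whitespace and reading the 7th column as a running counter are assumptions based on the sample rows, and `skip_and_renumber` is a hypothetical name.

```python
from typing import List, Sequence


def skip_and_renumber(rows: Sequence[str], counter_index: int = 6) -> List[str]:
    """Skip rows whose last two fields are equal and renumber the counter.

    Assumptions (not confirmed in the thread): fields are whitespace
    separated, and the field at counter_index is a running number that
    must be reduced by the number of rows skipped before it.
    """
    output: List[str] = []
    skipped = 0
    for row in rows:
        fields = row.split()
        if fields[-2] == fields[-1]:
            skipped += 1  # e.g. "... SHZ SHZ": drop this row
            continue
        fields[counter_index] = str(int(fields[counter_index]) - skipped)
        output.append(" ".join(fields))
    return output


rows = [
    "10 TSI SRQ 10 A 2012-09-122012-01-02 2 SHZ SPQ",
    "20 API SPQ 12 B 2012-12-122012-01-20 3 SHZ SLQ",
    "30 TST LTQ 21 L 2012-02-122012-01-31 4 SHZ SHZ",
    "40 TLI RNQ 55 D 2012-08-122012-08-05 5 SHZ TPM",
]

for line in skip_and_renumber(rows):
    print(line)
```

With the sample data the third row is dropped and the last row's counter becomes 4. Rejoining with single spaces only matches the sample as shown; real fixed-width output would need the original column widths.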
mqsiuser
Posted: Fri Aug 31, 2012 12:05 am
Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
nk wrote:
  @mqjeff: How to sort the MRM record?
You parse the records (as kimbert explained).
Then you work on the resulting logical (type) tree; from here on it's relatively independent of the parser (e.g. MRM or DFDL).
I am providing a quicksort; you can use that to pre-process... and then (easily) remove duplicates. But that will result in sorted output (which probably doesn't matter, but also isn't what you really want).
There are at least 3 ways (that I see) to remove duplicates.
You should probably try the "move ref where" function(s) first.
... Of course, you might need a different/custom solution.
_________________
Just use REFERENCEs
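The sort-then-remove approach can be sketched in Python as below. This is not mqsiuser's ESQL quicksort, just a plain quicksort over records reduced to tuples of their key fields (the sample values are made up), followed by dropping each record that equals its predecessor.

```python
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def quicksort(records: Sequence[Record]) -> List[Record]:
    """Return a new sorted list using a simple (not in-place) quicksort."""
    if len(records) <= 1:
        return list(records)
    pivot = records[len(records) // 2]
    smaller = [r for r in records if r < pivot]
    equal = [r for r in records if r == pivot]
    larger = [r for r in records if r > pivot]
    return quicksort(smaller) + equal + quicksort(larger)


def sort_then_dedupe(records: Sequence[Record]) -> List[Record]:
    """Sort, then keep one record from each run of equal neighbours."""
    result: List[Record] = []
    for record in quicksort(records):
        if not result or result[-1] != record:
            result.append(record)
    return result


records = [("30", "TST"), ("10", "TSI"), ("20", "API"), ("10", "TSI")]
print(sort_then_dedupe(records))
```

As noted above, the result comes back in sorted order rather than input order; if input order matters, remembering which keys have already been seen avoids sorting altogether.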