Author |
Message
|
Glass |
Posted: Tue Jan 19, 2010 8:40 am Post subject: Eliminate Duplicate Input Message |
|
|
Acolyte
Joined: 02 Mar 2006 Posts: 56
|
Hi,
I am trying to find a way to avoid processing duplicate messages, whether they come in via a queue or a file input node. We usually get messages from various sources with various message structures, and I was wondering if there is a generic way to catch and stop processing any duplicate message (not necessarily a duplicate file name, since the source may have changed the content but kept the same file name and resent it).
We are using Toolkit v6.1.0.5.
Cheers! |
|
Vitor |
Posted: Tue Jan 19, 2010 9:05 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Glass wrote: |
the source may have changed the content but kept the file name same and resent). |
If it's the content that determines whether a message is a duplicate (as opposed to the message id or file name), presumably the "original" data is stored someplace you could check.
IMHO if the source has changed the content but kept the file name (i.e. it's not duplicate content even though it's a duplicate file name) most of the automated techniques are doomed to fail...  _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Glass |
Posted: Tue Jan 19, 2010 10:15 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2006 Posts: 56
|
Thanks for the response.
Well, I was trying to avoid inserting the original data (all the data) somewhere just for the check. A lot of times the data is passed and we just 'transfer' it to some other system.
Is there something that can be done using javanode or hash keys? I am not too familiar with this.
Cheers! |
|
Vitor |
Posted: Tue Jan 19, 2010 10:46 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
Glass wrote: |
Is there something that can be done using javanode or hash keys? I am not too familiar with this. |
Well that's 2 of us - Java is not my thing at all. |
|
mlafleur |
Posted: Tue Jan 19, 2010 2:57 pm Post subject: |
|
|
Acolyte
Joined: 19 Feb 2004 Posts: 73
|
Quote: |
Is there something that can be done using javanode or hash keys? I am not too familiar with this. |
Yes, you could use a hash algorithm in Java (e.g. SHA or MD5) to do this. |
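A minimal sketch of that idea in plain Java, not tied to the broker's Java node API. It assumes SHA-256 as the SHA variant and an in-memory set of digests; the class name `DigestDuplicateCheck` and its methods are hypothetical, and a real flow would need to keep the digests somewhere that survives a restart.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of any broker API.
final class DigestDuplicateCheck {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Digests seen so far. In-memory only (an assumption of this sketch).
    private final Set<String> seen = new HashSet<String>();

    // Returns the SHA-256 digest of the payload as a lowercase hex string.
    static String sha256Hex(byte[] payload) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(payload);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present in every Java platform.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // True if a payload with the same bytes has been offered before.
    boolean isDuplicate(byte[] payload) {
        // Set.add returns false when the digest is already present.
        return !seen.add(sha256Hex(payload));
    }
}
```

Calling `isDuplicate` twice with the same bytes returns `false` and then `true`. A resend with changed content under the same file name produces a different digest, so it is not flagged.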
|
Glass |
Posted: Wed Jan 20, 2010 6:50 am Post subject: |
|
|
Acolyte
Joined: 02 Mar 2006 Posts: 56
|
Thanks mlafleur.
I did an initial lookup on SHA1 and MD5 and was wondering whether either of these methods is good for getting the signature of large files, 10MB or even bigger. Can this process handle large data, or does it become slow and inefficient beyond a certain file/message size? The examples I have seen so far tend to use this to get the signature of a small string as opposed to a large file.
Just curious, has anybody out there used this (SHA1 or MD5) in broker for validating duplicate messages or for any other purpose?
Cheers! |
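On the size question: `java.security.MessageDigest` can be fed incrementally with `update`, so the data does not have to be held in memory as one string. Below is a minimal sketch, assuming SHA-1 and an 8 KB read buffer; the class name `StreamingDigest` is hypothetical. Memory use stays constant and the run time grows linearly with the input size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of any broker API.
final class StreamingDigest {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // Reads the stream in fixed-size chunks and returns its SHA-1 digest in hex.
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192]; // buffer size is an arbitrary choice
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read); // feed only the bytes actually read
            }
            byte[] digest = md.digest();
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(HEX[(b >> 4) & 0x0f]).append(HEX[b & 0x0f]);
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        } catch (IOException e) {
            // Rethrown unchecked to keep the sketch's signature simple.
            throw new UncheckedIOException(e);
        }
    }
}
```

For a file, the stream could come from `new java.io.FileInputStream(path)`, closed by the caller once the digest has been computed.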
|
francoisvdm |
Posted: Wed Sep 07, 2011 8:46 pm Post subject: |
|
|
Partisan
Joined: 09 Aug 2001 Posts: 332
|
I know this is an old thread, but does anybody have new comments? I'm in the same situation now and would like some advice on what works well in MB V7 on AIX with an Oracle DB available. _________________ If you do not know the answer or you get the urge to answer with "RTFM" or "Search better in this forum", please refrain from doing so, just move on to the next question. Much appreciated.
Francois van der Merwe |
|
smdavies99 |
Posted: Wed Sep 07, 2011 10:37 pm Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
One problem I see with using a hash is that, if you calculate it from the raw input message, you might get different results even though the data inside the message is the same.
Eh? I hear you saying.
If you take an XML formatted message the values between the tags are what is important. The whitespace between tags is irrelevant.
Consider this example
Code: |
test$ printf "<a><b>Data></b></a>" > m1.xml
test$ printf "<a><b>Data></b> </a>" > m2.xml
test$ cksum m1.xml
1508444018 19 m1.xml
test$ cksum m2.xml
3010512380 20 m2.xml
test:~ sdavi$
|
Case proved I think.
If you are going to use a hash then you have to serialise the parsed message before creating the hash.
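A rough sketch of that "parse, re-serialise, then hash" step in plain Java, using the JDK's DOM parser rather than the broker's own parsers. The class name `NormalisedXmlDigest` is hypothetical. It assumes whitespace-only text nodes are never significant, and it does not deal with attribute order or namespace prefixes, which full XML canonicalisation would cover.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of any broker API.
final class NormalisedXmlDigest {

    // Parses the XML, drops whitespace-only text nodes, and serialises it again.
    static String normalise(byte[] xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations, since the input comes from outside.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
            stripBlankText(doc.getDocumentElement());
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not normalise XML", e);
        }
    }

    // Removes text nodes that contain nothing but whitespace, recursively.
    private static void stripBlankText(Node node) {
        NodeList children = node.getChildNodes();
        // Walk backwards because the NodeList is live and shrinks on removal.
        for (int i = children.getLength() - 1; i >= 0; i--) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getTextContent().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripBlankText(child);
            }
        }
    }

    // SHA-256 over the normalised form, as lowercase hex.
    static String digestHex(byte[] xml) {
        try {
            byte[] normalised = normalise(xml).getBytes(StandardCharsets.UTF_8);
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(normalised);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

With this, the contents of `m1.xml` and `m2.xml` from the example above give the same `digestHex` value, even though their raw checksums differ.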
I'd be more inclined to look for some unique Identifier in the message (perhaps a combination of fields) and store the data in a table using the unique ID fields of the message as a primary key.
Then if the insert fails with a PK violation you know that you have a dupe
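A minimal JDBC sketch of the insert-and-catch approach, assuming a hypothetical table `seen_messages` whose `msg_key` column is the primary key. The table, column, and class names are made up for illustration, and the key would be built from whatever fields identify the message.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

// Hypothetical helper for illustration; not part of any broker API.
final class PrimaryKeyDuplicateCheck {
    // Assumed table (hypothetical):
    //   CREATE TABLE seen_messages (msg_key VARCHAR(128) PRIMARY KEY,
    //                               received_at TIMESTAMP)
    static final String INSERT_SQL =
        "INSERT INTO seen_messages (msg_key, received_at) VALUES (?, CURRENT_TIMESTAMP)";

    // Returns true if the key was new and has been recorded,
    // false if the database rejected it as a duplicate.
    static boolean recordIfNew(Connection conn, String key) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, key);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            // SQLState class 23 means "integrity constraint violation".
            // It covers more than primary keys (e.g. NOT NULL), which this
            // sketch does not try to tell apart.
            String state = e.getSQLState();
            if (e instanceof SQLIntegrityConstraintViolationException
                    || (state != null && state.startsWith("23"))) {
                return false;
            }
            throw e; // anything else is a real failure
        }
    }
}
```

Letting the database reject the second insert avoids a separate "select, then insert" check, which could let two copies through if they arrive at the same time.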
I'd also make the table do its own admin (clean-up) of old data via a trigger.
This is my off-the-cuff reply. Without knowing the exact details, including data formats, volumes etc., I can't give you a more certain answer. Besides, that would cost you $$$ £££ €€€ etc.  _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
kash3338 |
Posted: Wed Sep 07, 2011 11:14 pm Post subject: |
|
|
Shaman
Joined: 08 Feb 2009 Posts: 709 Location: Chennai, India
|
Another solution is to load the data into a DB table and make the identifying column the primary key. Duplicates will then be caught. |
|
smdavies99 |
Posted: Thu Sep 08, 2011 1:20 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
kash3338 wrote: |
Another solution can be to upload the data in DB and set the column to Primary Key. Duplicates will be caught. |
Errr????
Isn't that what I said in my previous post?
To Quote:
Quote: |
I'd be more inclined to look for some unique Identifier in the message (perhaps a combination of fields) and store the data in a table using the unique ID fields of the message as a primary key.
Then if the insert fails with a PK violation you know that you have a dupe
I'd also make the table do its own admin (clean-up) of old data via a trigger.
|
|
|