ASG
IBM
Zystems
Cressida
Icon
Netflexity
 
  MQSeries.net
Search  Search       Tech Exchange      Education      Certifications      Library      Info Center      SupportPacs      LinkedIn  Search  Search                                                                   FAQ  FAQ   Usergroups  Usergroups
 
Register  ::  Log in Log in to check your private messages
 
RSS Feed - WebSphere MQ Support RSS Feed - Message Broker Support

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Reg: EDIFACT messages

Post new topic  Reply to topic Goto page Previous  1, 2
 Reg: EDIFACT messages « View previous topic :: View next topic » 
Author Message
kimbert
PostPosted: Tue Nov 27, 2012 5:09 am    Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Yes, if this lookup table is reference data then it should be loaded rarely. But the OP is claiming that it is the lookups that are taking the time - not the loading of the data into the table.
Back to top
View user's profile Send private message
mqjeff
PostPosted: Tue Nov 27, 2012 5:18 am    Post subject: Reply with quote

Grand Master

Joined: 25 Jun 2008
Posts: 17447

kimbert wrote:
Why would querying a large lookup table stored in a SHARED ROW be faster than querying a large lookup table stored in the Environment tree? Or am I missing the point?




The performance impact seems to come more from making repeated selects, and less from how the data being selected from is stored...

The general way to optimize this kind of scenario is to cache the results of the select in a faster-to-lookup manner. There's no indication here if this can actually be done in this specific case. That is, there's nothing that says that a given select is actually executed more than once, or that there's an easy way to assemble an indicator of the select to be performed to use as a cache lookup key.

Other things to consider are :
  • significantly reducing the size of the lookup table by precalculating portions of it based on known factors
  • reorganizing the structure of the lookup to optimize it based on the search criteria (alter the normalization or alter the lookup key hierarchy)
  • reorganizing the structure of the message processing such that more operations are performed in parallel
  • offload the lookup processing to a real database using a stored procedure
  • as kimbert says, move the lookup table into a java ArrayList and use optimized searching on that
  • spread the workload over more instances of the message flow and more EGs and more brokers
  • independently develop and implement a quantum computer to perform non-deterministic processing of the transformation in O(1)...
  • other less feasible ideas...

But I don't think that any of these ideas can be reasonably examined through this forum without ravilegolas providing a *lot* more information that is probably confidential to his current site.

But I will say that ravilegolas appears to be making all of the right steps in determining that performance is a concern, where it is a concern, and what kinds of steps can be taken to address it.
Back to top
View user's profile Send private message
Esa
PostPosted: Tue Nov 27, 2012 6:14 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

mqjeff wrote:
[*]as kimbert says, move the lookup table into a java ArrayList and use optimized searching on that


If the flow is issuing several lookups with different sets of criteria, I would suggest creating several java hashtables with different sets of keys.
Back to top
View user's profile Send private message
ravilegolas
PostPosted: Tue Nov 27, 2012 11:36 am    Post subject: Reply with quote

Novice

Joined: 04 Sep 2012
Posts: 15

The look up table that my flow is refering to contains around 800 rows. It has 4 columns with 2 columns having codes and 2 having their expansion. For now what I have proposed is to move the look up table in target environment which is a data warehouse. Whenever they fetch values from the table, they can refer the look up tables and populate the expansion of the codes accordingly. Is there any difference in using select query on environment variable as compared to row variable. My flow was selecting values from the look up table more than 20000 times for each message which made the flow very slow. BTW thanks everyone for your suggestions, I will try to implement one of them in my code.
Back to top
View user's profile Send private message
lancelotlinc
PostPosted: Tue Nov 27, 2012 11:50 am    Post subject: Reply with quote

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

ravilegolas wrote:
Is there any difference in using select query on environment variable as compared to row variable.


SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation as well as performance.

Using a SINGLETON is the cheapest, fastest alternative as kimbert suggested, using Java ArrayList or HashTable.

http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html

If you have budget, the best solution is solidDb or eXtremeScale.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Back to top
View user's profile Send private message Send e-mail
mqjeff
PostPosted: Tue Nov 27, 2012 11:52 am    Post subject: Reply with quote

Grand Master

Joined: 25 Jun 2008
Posts: 17447

lancelotlinc wrote:
ravilegolas wrote:
Is there any difference in using select query on environment variable as compared to row variable.


SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation as well as performance.

Using a SINGLETON is the cheapest, fastest alternative as kimbert suggested, using Java ArrayList or HashTable.

http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html

If you have budget, the best solution is solidDb or eXtremeScale.


You don't need budget for eXtremeScale if you have 8.0.0.1....
Back to top
View user's profile Send private message
lancelotlinc
PostPosted: Tue Nov 27, 2012 12:20 pm    Post subject: Reply with quote

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

mqjeff wrote:
lancelotlinc wrote:
ravilegolas wrote:
Is there any difference in using select query on environment variable as compared to row variable.


SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation as well as performance.

Using a SINGLETON is the cheapest, fastest alternative as kimbert suggested, using Java ArrayList or HashTable.

http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html

If you have budget, the best solution is solidDb or eXtremeScale.


You don't need budget for eXtremeScale if you have 8.0.0.1....


Seems true for Development environment only; but as I read the license agreement, a $$ license $$ is needed for production. Am I right? Liberty license is only good for Developer interaction.

Also, WMB 8.0.0.1 is not required if you interface with the eXtremeScale API through JCN?
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Back to top
View user's profile Send private message Send e-mail
kimbert
PostPosted: Tue Nov 27, 2012 12:46 pm    Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Quote:
My flow was selecting values from the look up table more than 20000 times for each message which made the flow very slow.

I'm not surprised that the flow was very slow. I agree that you should choose your caching technology carefully ( extremeScale, SolidDB, SHARED ROW etc ). However, you should be even more careful about choosing your data structures and search algorithms. If you choose wrongly you will end up with performance that is orders of magnitude below what is achievable - regardless of which type of cache you use.
Back to top
View user's profile Send private message
mqjeff
PostPosted: Tue Nov 27, 2012 12:46 pm    Post subject: Reply with quote

Grand Master

Joined: 25 Jun 2008
Posts: 17447

All license questions are only properly answered by an IBM Software Sales Representative.

It is my understanding that all use of the Global Cache feature of WMB is covered by the license for WMB itself, and does not require a separate license for eXtremeScale for any configuration or in any type of environment.

Yes, if you use a JavaCompute node, you can access eXtremeScale using it's Java APIs from any version of Broker that supports the JCN - but then you *do* need a license for eXtremeScale, because that JCN has to connect to *something*.
Back to top
View user's profile Send private message
lancelotlinc
PostPosted: Tue Nov 27, 2012 1:11 pm    Post subject: Reply with quote

Jedi Knight

Joined: 22 Mar 2010
Posts: 4941
Location: Bloomington, IL USA

mqjeff wrote:
All license questions are only properly answered by an IBM Software Sales Representative.

It is my understanding that all use of the Global Cache feature of WMB is covered by the license for WMB itself, and does not require a separate license for eXtremeScale for any configuration or in any type of environment.

Yes, if you use a JavaCompute node, you can access eXtremeScale using it's Java APIs from any version of Broker that supports the JCN - but then you *do* need a license for eXtremeScale, because that JCN has to connect to *something*.


IBM is offering a trial license for pilot applications at no charge:

http://www.ibm.com/developerworks/downloads/ws/wsdg/
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER
Back to top
View user's profile Send private message Send e-mail
mqsiuser
PostPosted: Tue Nov 27, 2012 10:13 pm    Post subject: Reply with quote

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

lancelotlinc wrote:
SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation as well as performance.

Is that really true? Where are the problems? It is so easy to use. And it is native ESQL (so it must be great ). You just "DECLARE myRow SHARED ROW;" and then happily assign tree-data (like on Input- and OutputRoot <- I hope these 2 guys are (still) fine !?)... What should be the difference between a SHARED ROW and Input/OutputRoot (except for the lifetime)?

It is so nice to distinguish (and point to the importance of) the following:
kimbert wrote:
I agree that you should choose your caching technology carefully ( extremeScale, SolidDB, SHARED ROW etc ).

And on projects you can really (totally) screw brokers performance with not implementing properly:
kimbert wrote:
However, you should be even more careful about choosing your data structures and search algorithms. If you choose wrongly you will end up with performance that is orders of magnitude below what is achievable.

You can use a hash-map/array/list. That's theortically O(1) (but with concerns/considerations on the "hashing-function").
You could implement a balanced tree (for your 800 entries) (based on the (2) keys). Then traversing (going into / getting a single element) in O(logn) (which is ~10 in your case).

Or you just do (like I see on many projects) something dead easy:
1. Use a shared row
Code:
DECLARE cache SHARED ROW;
SET cache = PASSTHRU("SELECT key1, key2, value1, value2 FROM myTable");

2. Highly consider using curly braces (and within that the key(s)):
Code:
SET Environment.lookupData = PASSTHRU("SELECT key1, key2, value1, value2 FROM myTable");
DECLARE cache SHARED ROW;
// WITHIN a loop (over Environment.lookupData rearange the data:
  SET cache.{rLookupDataCurRow.key1}_{rLookupDataCurRow.key2} = rLookupDataCurRow;

Because then you can lookup the values (with the keys) in a convenient way as follows:
Code:
SET Environment.value1_value2 = cache.{key1}_{key2};


That is O(n) (but broker is really good at it)

Now there is a break-even (for the size of n):
If n is bigger than the break even then (usually) O(1) or O(logn) is better.
If n is smaller than the break even then (usually) O(n) is better.

Lets say the break even is 1000 ...
Lets say complexity (of your dataStructure and searchAlgorithm) also counts
_________________
Just use REFERENCEs
Back to top
View user's profile Send private message
Esa
PostPosted: Wed Nov 28, 2012 1:10 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

ravilegolas wrote:
The look up table that my flow is refering to contains around 800 rows.
...
My flow was selecting values from the look up table more than 20000 times for each message which made the flow very slow


Have you considered rewriting your code instead of caching the lookup table? Instead of hitting the lookup table once or twice for each iteration of a while loop you could perhaps replace the loop with a combined query. Something like this:

Code:
SET outputRef.Objects[] = SELECT A.ID, A.NAME, B.VALUE1, C.VALUE2 FROM InputRef AS A, LookupRef AS B, LookupRef AS C WHERE A.CODE1 = B.KEY1 AND A.CODE2 = C.KEY2


I think this is the use case ESQL was originally developed for...

My example code certainly contains syntax errors and you will have to refine it to make the target element names correct, for example. I wrote it from scratch just to try to give you the idea.

I guess if you do it like this it won't make an awful lot of difference is you query the lookup table from a database or from an Environment or ROW tree, not even if the ROW is cached or not. I mean compared to the case where you hit the lookup table 20 k times.

If the problem is that the SELECT query puts the elements in wrong order, you can always make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards.
Back to top
View user's profile Send private message
mqsiuser
PostPosted: Wed Nov 28, 2012 3:31 am    Post subject: Reply with quote

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Esa wrote:
make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards.

Do you have performance measures for that ? I am interested.
_________________
Just use REFERENCEs
Back to top
View user's profile Send private message
Esa
PostPosted: Wed Nov 28, 2012 4:52 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

mqsiuser wrote:
Esa wrote:
make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards.

Do you have performance measures for that ? I am interested.


Unfortunately not. I have never needed to implement that.
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic  Reply to topic Goto page Previous  1, 2 Page 2 of 2

MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Reg: EDIFACT messages
Jump to:  



You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum
Protected by Anti-Spam ACP
 
 


Theme by Dustin Baccetti
Powered by phpBB © 2001, 2002 phpBB Group

Copyright © MQSeries.net. All rights reserved.