kimbert
Posted: Tue Nov 27, 2012 5:09 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Yes, if this lookup table is reference data then it should be loaded rarely. But the OP is claiming that it is the lookups that are taking the time - not the loading of the data into the table. |

mqjeff
Posted: Tue Nov 27, 2012 5:18 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
kimbert wrote: |
Why would querying a large lookup table stored in a SHARED ROW be faster than querying a large lookup table stored in the Environment tree? Or am I missing the point? |
The performance impact seems to come more from making repeated selects, and less from how the data being selected from is stored...
The general way to optimize this kind of scenario is to cache the results of the select in a faster-to-look-up structure. There's no indication here whether that can actually be done in this specific case. That is, nothing says that a given select is actually executed more than once, or that the select criteria can easily be assembled into a cache lookup key.
Other things to consider are:
- significantly reducing the size of the lookup table by precalculating portions of it based on known factors
- reorganizing the structure of the lookup to optimize it based on the search criteria (alter the normalization or alter the lookup key hierarchy)
- reorganizing the structure of the message processing such that more operations are performed in parallel
- offload the lookup processing to a real database using a stored procedure
- as kimbert says, move the lookup table into a Java ArrayList and use optimized searching on that
- spread the workload over more instances of the message flow and more EGs and more brokers
- independently develop and implement a quantum computer to perform non-deterministic processing of the transformation in O(1)...
- other less feasible ideas...
But I don't think that any of these ideas can be reasonably examined through this forum without ravilegolas providing a *lot* more information that is probably confidential to his current site.
But I will say that ravilegolas appears to be taking all of the right steps: determining that performance is a concern, where the concern lies, and what kinds of steps can be taken to address it.
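To make the first idea concrete, here's a very rough JavaCompute-style sketch of caching select results keyed by the select criteria. All of the names and the key format below are invented for illustration - whether a composite key like this even makes sense depends on what the actual selects look like:
Code: |
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LookupCache {
    // Survives across messages within the execution group's JVM.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<String, String>();

    public static String lookup(String code1, String code2) {
        // Composite key assembled from the select criteria.
        String key = code1 + "|" + code2;
        // Only perform the expensive select on a cache miss.
        String value = CACHE.get(key);
        if (value == null) {
            value = expensiveSelect(code1, code2);
            CACHE.put(key, value);
        }
        return value;
    }

    private static String expensiveSelect(String code1, String code2) {
        // Placeholder for the real SELECT against the lookup table.
        return "expansion-of-" + code1 + "-" + code2;
    }
} |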

Esa
Posted: Tue Nov 27, 2012 6:14 am Post subject:
Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
mqjeff wrote: |
as kimbert says, move the lookup table into a Java ArrayList and use optimized searching on that |
If the flow is issuing several lookups with different sets of criteria, I would suggest creating several Java hashtables with different sets of keys, along these lines:
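Purely as an illustration - all of the column and map names here are invented - one map per lookup pattern, all built once from the same rows:
Code: |
import java.util.Hashtable;
import java.util.Map;

public class LookupTables {
    // One table per lookup pattern, all loaded from the same ~800 rows.
    static final Map<String, String> expansionByCode1 = new Hashtable<String, String>();
    static final Map<String, String> expansionByCode2 = new Hashtable<String, String>();
    static final Map<String, String[]> rowByBothCodes = new Hashtable<String, String[]>();

    static void addRow(String code1, String code2, String exp1, String exp2) {
        expansionByCode1.put(code1, exp1);
        expansionByCode2.put(code2, exp2);
        rowByBothCodes.put(code1 + "_" + code2, new String[] { exp1, exp2 });
    }
} |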

ravilegolas
Posted: Tue Nov 27, 2012 11:36 am Post subject:
Novice
Joined: 04 Sep 2012 Posts: 15
The lookup table that my flow refers to contains around 800 rows. It has 4 columns: 2 columns hold codes and 2 hold their expansions. For now, what I have proposed is to move the lookup table into the target environment, which is a data warehouse. Whenever they fetch values from the table, they can refer to the lookup table and populate the expansions of the codes accordingly. Is there any difference in using a SELECT query on an Environment variable as compared to a ROW variable? My flow was selecting values from the lookup table more than 20,000 times for each message, which made the flow very slow. BTW, thanks everyone for your suggestions, I will try to implement one of them in my code. |

lancelotlinc
Posted: Tue Nov 27, 2012 11:50 am Post subject:
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
ravilegolas wrote: |
Is there any difference in using a SELECT query on an Environment variable as compared to a ROW variable? |
SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation and performance.
Using a SINGLETON is the cheapest, fastest alternative: as kimbert suggested, use a Java ArrayList or Hashtable.
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html
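A minimal singleton sketch of that idea - the class, method, and column names below are placeholders, and the actual load logic is omitted:
Code: |
import java.util.Hashtable;

public final class CodeTable {
    // Loaded once per JVM (execution group) and then shared by all threads.
    private static final Hashtable<String, String> TABLE = new Hashtable<String, String>();
    private static volatile boolean loaded = false;

    private CodeTable() { }

    public static synchronized void loadIfNeeded() {
        if (!loaded) {
            // Populate TABLE from the lookup source (database, file, ...) here.
            loaded = true;
        }
    }

    public static String expansionFor(String code) {
        if (!loaded) {
            loadIfNeeded();
        }
        return TABLE.get(code);
    }
} |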
If you have budget, the best solution is solidDB or eXtremeScale.
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |

mqjeff
Posted: Tue Nov 27, 2012 11:52 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
lancelotlinc wrote: |
ravilegolas wrote: |
Is there any difference in using a SELECT query on an Environment variable as compared to a ROW variable? |
SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation and performance.
Using a SINGLETON is the cheapest, fastest alternative: as kimbert suggested, use a Java ArrayList or Hashtable.
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html
If you have budget, the best solution is solidDB or eXtremeScale. |
You don't need budget for eXtremeScale if you have 8.0.0.1.... |

lancelotlinc
Posted: Tue Nov 27, 2012 12:20 pm Post subject:
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
mqjeff wrote: |
lancelotlinc wrote: |
ravilegolas wrote: |
Is there any difference in using a SELECT query on an Environment variable as compared to a ROW variable? |
SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation and performance.
Using a SINGLETON is the cheapest, fastest alternative: as kimbert suggested, use a Java ArrayList or Hashtable.
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Hashtable.html
If you have budget, the best solution is solidDB or eXtremeScale. |
You don't need budget for eXtremeScale if you have 8.0.0.1.... |
That seems true for the Development environment only; as I read the license agreement, a paid license is needed for production. Am I right? The Liberty license is only good for developer use.
Also, WMB 8.0.0.1 is not required if you interface with the eXtremeScale API through a JCN, correct?
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |

kimbert
Posted: Tue Nov 27, 2012 12:46 pm Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote: |
My flow was selecting values from the lookup table more than 20,000 times for each message, which made the flow very slow. |
I'm not surprised that the flow was very slow. I agree that you should choose your caching technology carefully (eXtremeScale, solidDB, SHARED ROW, etc.). However, you should be even more careful about choosing your data structures and search algorithms. If you choose wrongly, you will end up with performance that is orders of magnitude below what is achievable - regardless of which type of cache you use.
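As a purely artificial illustration (names and numbers invented): with 20,000 lookups per message against an 800-row table, a linear scan does on the order of 16 million comparisons per message, while a hashed index does roughly 20,000 constant-time gets. Roughly:
Code: |
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LookupStyles {
    // Linear scan: ~800 comparisons for every one of the 20,000 lookups.
    static String linearLookup(List<String[]> rows, String code) {
        for (String[] row : rows) {
            if (row[0].equals(code)) {
                return row[1];
            }
        }
        return null;
    }

    // Hashed index: effectively one probe per lookup, regardless of table size.
    static String hashedLookup(Map<String, String> index, String code) {
        return index.get(code);
    }
} |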

mqjeff
Posted: Tue Nov 27, 2012 12:46 pm Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
All license questions are only properly answered by an IBM Software Sales Representative.
It is my understanding that all use of the Global Cache feature of WMB is covered by the license for WMB itself, and does not require a separate license for eXtremeScale for any configuration or in any type of environment.
Yes, if you use a JavaCompute node, you can access eXtremeScale using its Java APIs from any version of Broker that supports the JCN - but then you *do* need a license for eXtremeScale, because that JCN has to connect to *something*. |

lancelotlinc
Posted: Tue Nov 27, 2012 1:11 pm Post subject:
Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
mqjeff wrote: |
All license questions are only properly answered by an IBM Software Sales Representative.
It is my understanding that all use of the Global Cache feature of WMB is covered by the license for WMB itself, and does not require a separate license for eXtremeScale for any configuration or in any type of environment.
Yes, if you use a JavaCompute node, you can access eXtremeScale using its Java APIs from any version of Broker that supports the JCN - but then you *do* need a license for eXtremeScale, because that JCN has to connect to *something*. |
IBM is offering a trial license for pilot applications at no charge:
http://www.ibm.com/developerworks/downloads/ws/wsdg/
_________________
http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |

mqsiuser
Posted: Tue Nov 27, 2012 10:13 pm Post subject:
Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
lancelotlinc wrote: |
SHARED ROW is the least preferred method of sharing information between message flows. It is the most problematic, both in terms of implementation and performance. |
Is that really true? Where are the problems? It is so easy to use. And it is native ESQL (so it must be great). You just "DECLARE myRow SHARED ROW;" and then happily assign tree data to it, just like you do with InputRoot and OutputRoot (I hope those two are still fine!?)... What should the difference be between a SHARED ROW and InputRoot/OutputRoot (except for the lifetime)?
It is good to distinguish (and to point out the importance of) the following:
kimbert wrote: |
I agree that you should choose your caching technology carefully (eXtremeScale, solidDB, SHARED ROW, etc.). |
And on projects you can really (totally) wreck the broker's performance by not implementing this properly:
kimbert wrote: |
However, you should be even more careful about choosing your data structures and search algorithms. If you choose wrongly, you will end up with performance that is orders of magnitude below what is achievable. |
You can use a hash map/array/list. That is theoretically O(1) (though with some concerns/considerations about the hashing function).
You could implement a balanced tree over your 800 entries, based on the (2) keys. Then getting a single element is O(log n), which is about 10 steps in your case.
Or you just do (like I see on many projects) something dead easy:
1. Use a shared row:
Code: |
DECLARE cache SHARED ROW;
SET cache.Row[] = PASSTHRU('SELECT key1, key2, value1, value2 FROM myTable'); |
2. Better, use curly braces (with the key(s) inside) so that each row can be addressed directly by its keys:
Code: |
SET Environment.lookupData.Row[] = PASSTHRU('SELECT key1, key2, value1, value2 FROM myTable');
DECLARE cache SHARED ROW;
-- within a loop over Environment.lookupData, rearrange the data:
DECLARE rLookupDataCurRow REFERENCE TO Environment.lookupData.Row[1];
WHILE LASTMOVE(rLookupDataCurRow) DO
    SET cache.{rLookupDataCurRow.key1 || '_' || rLookupDataCurRow.key2} = rLookupDataCurRow;
    MOVE rLookupDataCurRow NEXTSIBLING;
END WHILE; |
Because then you can look up the values (by their keys) in a convenient way, as follows:
Code: |
SET Environment.value1_value2 = cache.{key1 || '_' || key2}; |
That is O(n) - a linear scan over the children - but the broker is really good at it.
Now there is a break-even point for the size of n:
If n is bigger than the break-even point, then (usually) O(1) or O(log n) is better.
If n is smaller than the break-even point, then (usually) O(n) is better.
Let's say the break-even point is 1000 ...
Let's say the complexity (of your data structure and search algorithm) also counts.
_________________
Just use REFERENCEs |

Esa
Posted: Wed Nov 28, 2012 1:10 am Post subject:
Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
ravilegolas wrote: |
The lookup table that my flow refers to contains around 800 rows.
...
My flow was selecting values from the lookup table more than 20,000 times for each message, which made the flow very slow |
Have you considered rewriting your code instead of caching the lookup table? Instead of hitting the lookup table once or twice for each iteration of a while loop you could perhaps replace the loop with a combined query. Something like this:
Code: |
SET outputRef.Objects[] = SELECT A.ID, A.NAME, B.VALUE1, C.VALUE2
                          FROM InputRef.Item[] AS A, LookupRef.Row[] AS B, LookupRef.Row[] AS C
                          WHERE A.CODE1 = B.KEY1 AND A.CODE2 = C.KEY2; |
I think this is the use case ESQL was originally developed for...
My example code is only a sketch: you will have to adjust the element names and list selectors to match your actual trees, for example. I wrote it from scratch just to give you the idea.
I guess if you do it like this, it won't make an awful lot of difference whether you query the lookup table from a database or from an Environment or ROW tree, or whether the ROW is cached or not. I mean compared to the case where you hit the lookup table 20,000 times.
If the problem is that the SELECT query puts the elements in the wrong order, you can always make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards.
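Untested and purely illustrative (ignoring the broker API plumbing, and the Row class here is made up), but the Java side could be as simple as:
Code: |
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class RowSorter {
    // Plain holder for one row copied out of the message tree.
    public static class Row {
        public String key;
        // ... other fields copied from the tree
    }

    // Sort the extracted rows by key before writing them back to the output tree.
    public static void sortByKey(List<Row> rows) {
        Collections.sort(rows, new Comparator<Row>() {
            public int compare(Row a, Row b) {
                return a.key.compareTo(b.key);
            }
        });
    }
} |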

mqsiuser
Posted: Wed Nov 28, 2012 3:31 am Post subject:
Yatiri
Joined: 15 Apr 2008 Posts: 637 Location: Germany
Esa wrote: |
make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards. |
Do you have performance measures for that? I am interested.
_________________
Just use REFERENCEs |

Esa
Posted: Wed Nov 28, 2012 4:52 am Post subject:
Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
mqsiuser wrote: |
Esa wrote: |
make an ESQL wrapper for some java code that sorts the elements with a Collection afterwards. |
Do you have performance measures for that ? I am interested. |
Unfortunately not. I have never needed to implement that. |