MQSeries.net :: View topic - Generic ESQL Utilities

Generic ESQL Utilities

mqsiuser

Posted: Wed Mar 24, 2010 12:22 pm Post subject: Generic ESQL Utilities

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Hello,

when you use Message Broker you might find that ESQL is probably the best tool for processing messages. It is specialized and well suited to the task, it transforms messages quickly thanks to its C code base, and changes are easier to track in source control than with graphical or XML-based nodes. And, most importantly, it lets you code around and debug the tricky parts, which other transformation nodes do not make that easy.

If you agree with that, then only some integration-specific functions are missing that you would like to have in ESQL itself. I have set up an open source project, Generic-ESQL-Utilities, to help define these functions. Feel free to use any of Git's possibilities, e.g. just download (some of) the code, or create a branch, send me pull requests and the like.

cheers,
mqsiuser

Last edited by mqsiuser on Sun Jan 08, 2012 3:38 am; edited 8 times in total

mqsiuser

Posted: Mon Jun 07, 2010 8:38 am Post subject: moveRef(rIn, 'InputRoot.MRM.order.header.orderDate')

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

I have moved the code to github:

https://github.com/mqsiuser/Generic-ESQL-Utilities/wiki

Please have a look there if you are interested.

Last edited by mqsiuser on Wed Dec 14, 2011 8:14 am; edited 9 times in total

mqjeff

Posted: Mon Jun 07, 2010 10:37 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

It's not clear why DETACH and ATTACH don't meet the requirements you're looking for with this complicated MOVE REF code.

It's not clear why the EmailOutput node that comes with v6 and v7 does not support your requirements.

I agree that SORT would be useful.

Given that aggregation requires multiple input messages from multiple input nodes, it's not clear how you could achieve that within a single node or why the current aggregation is insufficient or how you can't achieve your requirement using Environment tree.

It's not clear what a lot of your suggested functions would do? What is xml2env? and why? And several of your functions are straightforward to do with the existing functions in ESQL.

mqsiuser

Posted: Tue Jun 08, 2010 12:09 am Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Hello mqjeff,

mqjeff wrote:

It's not clear why DETACH and ATTACH don't meet the requirements you're looking for with this complicated MOVE REF code.

The moveRef(...) function creates "output elements" and does not just end up on the (input/output) root when elements are missing. It is a replacement for "CREATE (LAST/FIRST)CHILD" and "MOVE REF", not for DETACHing parts of the tree (usually from the input root) and ATTACHing them (usually to the output root).
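To illustrate the "create it if it is missing" idea, here is a minimal sketch of a single navigation step. It is not the moveRef(...) code from the repository; the procedure name moveOrCreateLastChild is made up for this example.

```esql
-- Hypothetical helper (NOT the repository's moveRef): move ref to its last
-- child named childName, creating that child first if it does not exist.
CREATE PROCEDURE moveOrCreateLastChild(INOUT ref REFERENCE, IN childName CHARACTER)
BEGIN
    DECLARE rStart REFERENCE TO ref;      -- remember the starting element
    MOVE ref LASTCHILD NAME childName;
    IF NOT LASTMOVE(ref) THEN
        -- Nothing to move to: create the element below the starting point
        -- and leave ref pointing at the newly created element.
        CREATE LASTCHILD OF rStart AS ref NAME childName;
    END IF;
END;
```

Called once per path step, e.g. CALL moveOrCreateLastChild(rOut, 'order'); followed by CALL moveOrCreateLastChild(rOut, 'header');, the reference ends up on the wanted element whether or not it existed before.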

Quote:

It's not clear why the EmailOutput node that comes with v6 and v7 does not support your requirements.

There should be easier ways to attach a message. There should also be the possibility to have "eMail groups", so that To/Cc/Bcc can be retrieved through a db lookup and recipients are easy to configure and change.

Quote:

I agree that SORT would be useful.

I am providing quick sort in the forum, but the generic sort should at least be extended with moveRef-Capabilities.

Quote:

Given that aggregation requires multiple input messages from multiple input nodes, it's not clear how you could achieve that within a single node or why the current aggregation is insufficient or how you can't achieve your requirement using Environment tree.

aggregate(...) addresses aggregation within a single message. Another way to do this is, as you state, "using the Environment tree". I think that this is not straightforward, flexible and transparent enough. That is why I suggest using aggregate(...).

Quote:

It's not clear what a lot of your suggested functions would do? What is xml2env? and why? And several of your functions are straightforward to do with the existing functions in ESQL.

xml2env is for loading simple XML strings (elements only) from a db and putting them into the Environment for further processing.
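As a rough sketch of that idea (not the xml2env code itself): the procedure below parses an XML string into a subtree under a given element, for example one in the Environment. The name xmlTextToTree, the UTF-8 CCSID 1208 and the use of the XMLNSC domain are assumptions for this example, and the database lookup that would supply the string is left out.

```esql
-- Hypothetical sketch (NOT the repository's xml2env): parse an XML string
-- and hang the parsed tree below the element that rTarget points to.
CREATE PROCEDURE xmlTextToTree(IN xmlText CHARACTER, IN rTarget REFERENCE)
BEGIN
    -- The parser works on a bit stream, so convert the text to a BLOB first.
    -- CCSID 1208 (UTF-8) is an assumption; use it consistently in both places.
    DECLARE xmlBlob BLOB CAST(xmlText AS BLOB CCSID 1208);
    CREATE LASTCHILD OF rTarget DOMAIN('XMLNSC') PARSE(xmlBlob CCSID 1208);
END;
```

A caller would first make sure the target exists, e.g. CREATE FIELD Environment.Variables.config; then DECLARE rCfg REFERENCE TO Environment.Variables.config; and CALL xmlTextToTree(xmlFromDb, rCfg);, where xmlFromDb is whatever the db lookup returned.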

I have clarified my previous posts and added code. I hope that someone might find it useful and I am happy about discussions.

Last edited by mqsiuser on Wed Dec 14, 2011 8:09 am; edited 3 times in total

mqsiuser

Posted: Thu Jul 01, 2010 12:48 pm Post subject: Examples for use of aggregate(...)

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

I have added the transformation-code for the 3 examples to the git repository to show how to use the aggregate(...)-Method.

Compared to "aggregation in the environment" this method is

(1) More direct: Directly mapping from the input- to the output-root
(2) More declarative: Exactly states the aggregation-criteria "bla.blub.blub"
(3) More transparent: It is easy to understand the aggregation-criteria at the first glance.

All in all the method is more flexible: You can have several different aggregations, e.g. one on the orders-level, one on the lines-level and one on the subpositions-level (three), or even several different ones (distinguished with "if") on each level (more than three) in one flow, without losing the overview or the ability to change the aggregation criteria relatively quickly at a later point in time. This transparency is what is lacking with aggregation in the environment.

Last edited by mqsiuser on Wed Dec 14, 2011 8:07 am; edited 3 times in total

mqjeff

Posted: Thu Jul 01, 2010 1:12 pm Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I still don't understand what you are "aggregating" or why.

I also see a number of things in there that look very specific to one message tree, and thus not "generic" at all.

Are you attempting to collect records of a certain type in the same place in the same logical tree - i.e. put all "orders" in sequence within the message tree, so that order 2 is the next sibling of order 1 - in the manner that a "Group by" clause on an ESQL Select would do this?

It would be significantly faster and better to assemble an intermediate result, that created a REFERENCE equivalent to a C language "array of pointers", that you would then reorder. This would allow you to walk the input tree once, walk your array of pointers enough to sort them, and then use those references to detach/attach from Input to Output.

But again, it's still really not clear what you're trying to do from this code, or how it is "generic" rather than at least problem-domain specific. What if I don't have messages that contain "orders", but instead contain patient records or transaction records or or or...

mqsiuser

Posted: Thu Jul 01, 2010 9:29 pm Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

mqjeff wrote:

I still don't understand what you are "aggregating" or why.

I also see a number of things in there that look very specific to one message tree, and thus not "generic" at all.

o.k., the examples deal with orders in SAP (my last post), but in the posts before things should be generic (point me to something that's not). SAP is just one of the places where you have more complex message-structures and SAP is of a broader interest to IT-people. Also: It is known that other products (Mercator) have better transformation capabilities than Message Broker. Even though this technology now gets integrated in the WebSphere products, I think that there can just be some ESQL-functions to enhance the transformation capabilities of MB.

I tried to illustrate (the generic reference functions) with a good and viable example. Typically SAP can scare integration people because of the complex IDOC-structures. Or in general because of the complex structures orders (and also invoices) can have. The aggregate-method helps in bridging structural differences between the source and the target-system.

One of the problems with the structure of messages is that messages become "dis-aggregated". SAP messages also have multiple hierarchical structures, which makes "re-aggregating" difficult. The "aggregate()" method helps by decoupling the input structure from the output structure. As opposed to the environment, it does that in a transparent way.

The method is interesting for multi-hierarchical structures which became disaggregated (I am talking of a single message). More flat message structures are easy to process anyway, so there is no problem. I don't know if patients or "transaction records" can have these problems, but orders (and invoices) have and a lot of people are processing orders/invoices on their integration platform (they create turnover, profit, etc. for your company). Since I am familiar with orders and to just give another example (besides the 3 examples) here are two more examples each with the reason for the disaggregation:

Someone orders something from a company worth 1 Million Euro/Dollar. This includes different goods and creates a "sales-order" in SAP:

Material1: 10.000 pieces
Material2: 100.000 pieces
Material3: 1.000.000 pieces

When the company tries to deliver the goods, the SAP system (of the company) will create "delivery-orders", i.e. a couple of deliveries to deliver the goods. Each delivery order might be a truck full of goods. The truck might also contain previous orders or orders from other customers. The message becomes split (probably into a couple of messages; o.k., then you probably need to put them into a single message first), and when you have re-joined the messages (usually SAP does that for you, so no problem there) you will still see the boundaries between the truck-loads. E.g. if one truck can only carry 3.300 items, the message you get from SAP might look like:

Order=1
--Material=1
----Qty=3.300
--Material=1
----Qty=3.300
--Material=1
----Qty=3.300
--Material=1
----Qty=100
[...]

The target system likely is only interested in a message like this:

Order=1
--Material=1
----Qty=10.000
[...]

Note that this is a simple example, where you can easily "program something", which will do it. You can have this kind of disaggregation on each hierarchy-level (order, position and subposition) which makes the task more tricky.
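As a sketch of such "programming something" for the simple truck-load case: the module below sums Qty per material directly in the output tree. The element names (Order, Id, Material, Qty) and the module name are assumptions for this example, not a real IDoc layout, and quantities are assumed to arrive as plain integers (3300, not 3.300).

```esql
-- Sketch only: element names are assumed; Qty must be a plain integer string.
CREATE COMPUTE MODULE SumQtyPerMaterial
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        SET OutputRoot.Properties = InputRoot.Properties;
        SET OutputRoot.XMLNSC.Order.Id = InputRoot.XMLNSC.Order.Id;
        DECLARE rOutOrder REFERENCE TO OutputRoot.XMLNSC.Order;
        DECLARE rOutMat REFERENCE TO rOutOrder;
        DECLARE found BOOLEAN;
        DECLARE rIn REFERENCE TO InputRoot.XMLNSC.Order.Material;
        WHILE LASTMOVE(rIn) DO
            -- Look for an output Material with the same Id.
            SET found = FALSE;
            MOVE rOutMat TO rOutOrder;
            MOVE rOutMat FIRSTCHILD NAME 'Material';
            WHILE LASTMOVE(rOutMat) AND NOT found DO
                IF rOutMat.Id = rIn.Id THEN
                    SET found = TRUE;
                ELSE
                    MOVE rOutMat NEXTSIBLING REPEAT TYPE NAME;
                END IF;
            END WHILE;
            IF NOT found THEN
                -- First occurrence of this material: create it with Qty 0.
                CREATE LASTCHILD OF rOutOrder AS rOutMat NAME 'Material';
                SET rOutMat.Id = rIn.Id;
                SET rOutMat.Qty = 0;
            END IF;
            -- Add this truck-load to the running total.
            SET rOutMat.Qty = CAST(rOutMat.Qty AS INTEGER) + CAST(rIn.Qty AS INTEGER);
            MOVE rIn NEXTSIBLING REPEAT TYPE NAME;
        END WHILE;
        RETURN TRUE;
    END;
END MODULE;
```

With the four Material=1 entries from above (3300 + 3300 + 3300 + 100) this leaves a single Material with Qty 10000 in the output. Note the inner loop walks the output siblings for every input element, which is fine for a handful of materials but gets slow for long lists.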

Another reason for splitting might be that a certain good (material) might only be available by providing several batches:

Order=1
--Material=1
----Batch=A
----Qty=8.000
--Material=1
----Batch=B
----Qty=2.000
[...]

or

Order=1
--Material=1
----Batch=A
------Qty=8.000
----Batch=B
------Qty=2.000
[...]

If the target system is not interested in batches, then this will also have to be aggregated to:

Order=1
--Material=1
----Qty=10.000
[...]

Yet again, typically with orders you will have 3 places for (this type of) aggregation: Order-Level, Position-Level and Batch(/SubPosition)-Level (a hierarchy of 3 levels). You will then have to use the aggregate() method three times:

aggregate( "orders.order.orderNumber" )
aggregate( "positions.position.materialNumber" )
aggregate( "batches.batch.batchNumber" )

You cannot do this (especially not flexibly enough) with any graphical node in Message Broker. You can code something (probably using the environment), or you can elegantly use the aggregate method.

Besides the load of a truck, other "boundaries" where this fragmentation can occur are a pallet, a carton or a package. Note that this can be the intended structure, so that someone can see these boundaries in the message. It is not that we are fixing a mistake (from SAP) here.

mqjeff wrote:

Are you attempting to collect records of a certain type in the same place in the same logical tree - i.e. put all "orders" in sequence within the message tree, so that order 2 is the next sibling of order 1 - in the manner that a "Group by" clause on an ESQL Select would do this?

Good point... it is a "GROUP BY", but in code (imperative programming) and using references. And before you argue that "SELECT" and the "GROUP BY" can do it... I must argue that (imperative) code gives you more control over what happens next than a (declarative) statement like "SELECT ...". Let's perceive "aggregate()" as the "imperative approach" and as an alternative to the "environment approach" and the "select's group-by-clause". The environment is not transparent/direct/flexible in my view ... I am not sure if you can do everything you want with "GROUP BY". I think you lose control of what happens because it is declarative (a single short statement that does a lot of work). I think mapping is better done using "imperative" programming.

mqjeff wrote:

It would be significantly faster and better to assemble an intermediate result, that created a REFERENCE equivalent to a C language "array of pointers", that you would then reorder. This would allow you to walk the input tree once, walk your array of pointers enough to sort them, and then use those references to detach/attach from Input to Output.

I am not sure what you mean by reorder/sort (no sorting in my example!). The idea of aggregate() is to (like you say) walk the input tree once and just "aggregate" things to the proper place in the output-tree ("decoupling input from output structure"). It is about structural transformation (but no sorting)! Aggregate() tries to avoid the intermediate result (in the environment tree) and directly puts things into the output-tree. If something goes wrong, the transformation will just stop exactly where things couldn't proceed (to be mapped) onto the output-tree. This makes it easier to find bugs. I am using REFERENCEs in my code. I am not using detach/attach... I am not sure about this, but I guess since I always go down to each element(-value) I am not using it. Isn't detach/attach primarily for mapping whole (tree-)substructures? Aggregate() is for transformation when structures are pretty disparate. Since this applies to a lot of interfaces, it is very likely that you have to walk into each element and unlikely that you can map whole substructures (with detach/attach). I also do not like to create intermediate structures in the environment and then detach/attach them to the output-root (not transparent, not direct, not flexible enough in my view). Also "SELECT GROUP BY" is probably limited in what you can achieve with it. Probably you can make a "SUM()", but can you do other arithmetic operations (multiply, divide, ...)? Also, can you easily aggregate multi-hierarchy? Is it transparent, flexible, debug-able and intuitive enough?

mqjeff wrote:

But again, it's still really not clear what you're trying to do from this code, or how it is "generic" rather than at least problem-domain specific. What if I don't have messages that contain "orders", but instead contain patient records or transaction records or or or...

"aggregate()" is generic in the sense that it is no (message-)specific code, so it is not specific to any one message structure. Thanks for pointing to "GROUP BY": Aggregate() is an imperative (from imperative programming) alternative to the "aggregation in the environment" and the declarative (declarative programming) SELECT's "GROUP BY"-clause. I guess you can also "aggregate" patients or transactions (in case they got disaggregated or there is a structural mismatch between source and target system). Note that structural mismatch is typically about aggregation: You can't make up (too many) things (to expand the message), or: You can only process what you received (there certainly are exceptions).
_________________
Just use REFERENCEs

mqsiuser

Posted: Sat May 26, 2012 4:29 am Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Note: This post has been adjusted according to suggestions of subsequent posts:"groupBy removed" (2012-05-27)

So actually this feedback is somewhat valid:

mqjeff wrote:

this complicated MOVE REF code.

rekarm01 wrote:

awful lot of "tree-walking"

Especially since you don't need moveRef for "aggregating" (I will show how this works for nested criteria (and without moveRef)).

But first of all a very simple example. If you want to "aggregate" the following input-structure:

Code:

then you'd have the following code:

Code:

CALL CopyMessageHeaders();
DECLARE rOut REFERENCE TO OutputRoot;
CREATE LASTCHILD OF rOut AS rOut DOMAIN 'XMLNSC';
CREATE LASTCHILD OF rOut AS rOut NAME 'message';
CREATE LASTCHILD OF rOut AS rOut NAME 'list';
DECLARE rIn REFERENCE TO InputRoot.XMLNSC.message.list;
MOVE rIn FIRSTCHILD NAME 'element';
DECLARE rOutElement REFERENCE TO rOut;
WHILE LASTMOVE( rIn ) DO
    IF NOT moveFirstChildWhere( rOutElement, 'element', 'value', rIn.value ) THEN
        CREATE LASTCHILD OF rOutElement AS rOutElement NAME 'element';
    END IF;
    SET rOutElement.value = rIn.value;
    SET rOutElement.count = COALESCE( rOutElement.count, 0) + 1;
    MOVE rOutElement TO rOut;
    MOVE rIn NEXTSIBLING REPEAT NAME;
END WHILE;
RETURN TRUE;

and "MOVE ref WHERE" - function(s):

Code:

CREATE PROCEDURE moveFirstChildWhere(INOUT ref REFERENCE, IN childName CHAR, IN childField CHAR, IN childValue CHAR) RETURNS BOOLEAN
BEGIN
    DECLARE hasMatch BOOLEAN FALSE; -- hasMatch is "LASTMOVE(ref)"
    DECLARE ref2 REFERENCE TO ref;
    MOVE ref2 FIRSTCHILD NAME childName;
    WHILE LASTMOVE(ref2) AND NOT hasMatch DO
        IF ref2.{childField} = childValue THEN
            MOVE ref TO ref2;
            SET hasMatch = TRUE;
        END IF;
        MOVE ref2 NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN hasMatch;
END;

CREATE PROCEDURE moveLastChildWhere(INOUT ref REFERENCE, IN childName CHAR, IN childField CHAR, IN childValue CHAR) RETURNS BOOLEAN
BEGIN
    DECLARE hasMatch BOOLEAN FALSE; -- hasMatch is "LASTMOVE(ref)"
    DECLARE ref2 REFERENCE TO ref;
    MOVE ref2 LASTCHILD NAME childName;
    WHILE LASTMOVE(ref2) AND NOT hasMatch DO
        IF ref2.{childField} = childValue THEN
            MOVE ref TO ref2;
            SET hasMatch = TRUE;
        END IF;
        MOVE ref2 PREVIOUSSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN hasMatch;
END;

CREATE PROCEDURE moveNextSiblingWhere(INOUT ref REFERENCE, IN childName CHAR, IN childField CHAR, IN childValue CHAR) RETURNS BOOLEAN
BEGIN
    DECLARE hasMatch BOOLEAN FALSE; -- hasMatch is "LASTMOVE(ref)"
    DECLARE ref2 REFERENCE TO ref;
    MOVE ref2 NEXTSIBLING NAME childName;
    WHILE LASTMOVE(ref2) AND NOT hasMatch DO
        IF ref2.{childField} = childValue THEN
            MOVE ref TO ref2;
            SET hasMatch = TRUE;
        END IF;
        MOVE ref2 NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN hasMatch;
END;

CREATE PROCEDURE movePreviousSiblingWhere(INOUT ref REFERENCE, IN childName CHAR, IN childField CHAR, IN childValue CHAR) RETURNS BOOLEAN
BEGIN
    DECLARE hasMatch BOOLEAN FALSE; -- hasMatch is "LASTMOVE(ref)"
    DECLARE ref2 REFERENCE TO ref;
    MOVE ref2 PREVIOUSSIBLING NAME childName;
    WHILE LASTMOVE(ref2) AND NOT hasMatch DO
        IF ref2.{childField} = childValue THEN
            MOVE ref TO ref2;
            SET hasMatch = TRUE;
        END IF;
        MOVE ref2 PREVIOUSSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN hasMatch;
END;

will have the result:

Code:

_________________
Just use REFERENCEs

Last edited by mqsiuser on Sat May 26, 2012 2:42 pm; edited 2 times in total

mqjeff

Posted: Sat May 26, 2012 5:00 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

So, basically, your "groupBy" function is a "MOVE NEXTSIBLING WHERE" function.

That is, all it returns is the next sibling of the tree that matches a given simple function. And it's not even really any simple function, it's the strict function "field.child=value" for a given child and value.

It should be named to indicate that, rather than suggesting that it does any aggregation or any grouping at all.

mqjeff

Posted: Sat May 26, 2012 5:04 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

to be more precise, I would expect that a function called "groupBy" would behave as follows.

For the message

Code:

if I said

Code:

groupBy(InputRoot.XMLNSC.message.list.element, "value");

I would expect a result like

Code:

You see, where the tree has been GROUPED BY the VALUE of the elements.

Then I could walk the tree and know that if my current element has a different value than my previous element, I'm on to a new group.
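A minimal sketch of that walk, assuming the list has already been grouped so that equal values are adjacent. The procedure name countGroups and the element names "element" and "value" are only illustrative.

```esql
-- Sketch: count the groups in a list that is ALREADY grouped by "value".
-- rList is assumed to point at the parent of the grouped "element" fields.
CREATE PROCEDURE countGroups(IN rList REFERENCE, OUT groupCount INTEGER)
BEGIN
    DECLARE prevValue CHARACTER;          -- stays NULL until the first element
    DECLARE rEl REFERENCE TO rList;
    SET groupCount = 0;
    MOVE rEl FIRSTCHILD NAME 'element';
    WHILE LASTMOVE(rEl) DO
        IF prevValue IS NULL OR CAST(rEl.value AS CHARACTER) <> prevValue THEN
            -- The value differs from the previous element: a new group starts.
            SET groupCount = groupCount + 1;
        END IF;
        SET prevValue = CAST(rEl.value AS CHARACTER);
        MOVE rEl NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;
END;
```

The same "value changed" test is the place where per-group work (opening a new output element, resetting a running total) would go.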

mqsiuser

Posted: Sat May 26, 2012 2:29 pm Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

mqjeff wrote:

So, basically, your "groupBy" function is a "MOVE NEXTSIBLING WHERE" function.

Excellent comment, thank you. Conditional MOVEs, that's right:

Code:

MOVE ref FIRSTCHILD NAME 'element' WHERE 'value' EQUALS rIn.value;
MOVE ref LASTCHILD NAME 'element' WHERE 'value' EQUALS rIn.value;
MOVE ref NEXTSIBLING NAME 'element' WHERE 'value' EQUALS rIn.value;
MOVE ref PREVIOUSSIBLING NAME 'element' WHERE 'value' EQUALS rIn.value;

I have adjusted the initial code posting accordingly.
_________________
Just use REFERENCEs

mqsiuser

Posted: Fri Jun 01, 2012 11:27 am Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

mqjeff wrote:

That is, all it returns is the next sibling of the tree that matches a given simple function. And it's not even really any simple function, it's the strict function "field.child=value" for a given child and value.

Simple things perform well... feel free to make it complicated.

So... here is how it all relates:

1. Aggregating in the Environment, something like:

Code:

SET Environment.intermediateResult.{criteria}.element.value = rIn.value;
DECLARE rOutElement REFERENCE TO Environment.intermediateResult.{criteria}.element;
SET rOutElement.count = COALESCE( rOutElement.count, 0) + 1;
-- Later: detach (or copy?) from the Environment to the OutputRoot

... requires an intermediate result (an intermediary result tree) (in the Environment).

2. MOVE ref WHERE... may require pre- and post-processing... which is relatively easy to do... supposedly performs well enough.

3. aggregate() ... does not require intermediate result (trees) or pre- or post-processing (for nested elements) (at the expense of performance).

Mistakes may be included (unintentionally); good luck finding them.

mqjeff wrote:

to be more precise, I would expect that a function called "groupBy" would behave as follows.

What's a proper name for it: I call it aggregating... What do you call it if you do "Environment.msg.{criteria}.myStructure..." and then create the result msg based on that ?!
_________________
Just use REFERENCEs

mqsiuser

Posted: Tue Apr 01, 2014 9:48 am Post subject:

Yatiri

Joined: 15 Apr 2008
Posts: 637
Location: Germany

Dear fellow fellows,

I never understood why people wouldn't be totally excited about "2" and "3". But today (and I am really sorry that it took me so long) I fully admit, that "1", (so "curly braces magic" "{...}") is significantly faster: It is like instantly there (at the element), while "2" and "3" will walk (the linked list) to the element.
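For completeness, a sketch of what "1" looks like end to end for the simple message.list.element example from earlier in this thread. Environment.Variables.agg and the module name are assumptions for this example, and every input element is assumed to carry a non-empty value.

```esql
-- Sketch of approach "1": aggregate via dynamic field names ({...}) in the
-- Environment, then build the output from the intermediate result.
CREATE COMPUTE MODULE CountByValueInEnvironment
    CREATE FUNCTION Main() RETURNS BOOLEAN
    BEGIN
        SET OutputRoot.Properties = InputRoot.Properties;
        SET Environment.Variables.agg = NULL;   -- drop leftovers of an earlier run
        CREATE FIELD Environment.Variables.agg;
        DECLARE rAgg REFERENCE TO Environment.Variables.agg;
        DECLARE rSlot REFERENCE TO rAgg;
        DECLARE aggKey CHARACTER;

        -- Pass 1: one slot per distinct value, addressed directly by name.
        DECLARE rIn REFERENCE TO InputRoot.XMLNSC.message.list.element;
        WHILE LASTMOVE(rIn) DO
            SET aggKey = CAST(rIn.value AS CHARACTER);
            SET rAgg.{aggKey}.value = rIn.value;
            SET rAgg.{aggKey}.count = COALESCE(rAgg.{aggKey}.count, 0) + 1;
            MOVE rIn NEXTSIBLING REPEAT TYPE NAME;
        END WHILE;

        -- Pass 2: copy every slot into the output list.
        DECLARE rOut REFERENCE TO OutputRoot;
        CREATE LASTCHILD OF rOut AS rOut DOMAIN 'XMLNSC';
        CREATE LASTCHILD OF rOut AS rOut NAME 'message';
        CREATE LASTCHILD OF rOut AS rOut NAME 'list';
        DECLARE rOutEl REFERENCE TO rOut;
        MOVE rSlot FIRSTCHILD;
        WHILE LASTMOVE(rSlot) DO
            CREATE LASTCHILD OF rOut AS rOutEl NAME 'element';
            SET rOutEl.value = rSlot.value;
            SET rOutEl.count = rSlot.count;
            MOVE rSlot NEXTSIBLING;
        END WHILE;
        RETURN TRUE;
    END;
END MODULE;
```

The price is the second pass and the intermediate tree. The gain is that pass 1 no longer contains a hand-written loop over the already collected elements.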
_________________
Just use REFERENCEs
