Author |
Message
|
longng |
Posted: Thu Mar 14, 2013 7:59 pm Post subject: SELECT FROM WHERE |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
While we are doing analysis on performance implications of SELECT in our particular scenarios, we've come across a strange phenomenon!
It starts out innocent enough with
We have a set of input data that we would want to copy over to a new location (Environment Variables) without any duplications
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<InIds>
<id>AA</id>
<id>BB</id>
<id>BB</id>
<id>CC</id>
<id>CC</id>
<id>AA</id>
<id>BB</id>
</InIds>
|
For simplicity, the above data structure is the input message set that's referenced by the InIDRef within the FOR loop in the code below:
Code: |
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3.id;
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables.Layer1.Layer2.Layer3.id;
FOR InIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF (FindID(EnvInIDRef)) THEN
-- It does not exist
SET EnvOutIDRef = CAST (InIDRef AS CHARACTER);
CREATE NEXTSIBLING OF EnvOutIDRef AS EnvOutIDRef REPEAT;
END IF;
END FOR;
|
The FindID function used above is shown below:
Code: |
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
SET rowId = THE (SELECT A.id
FROM Environment.Variables.Layer1.Layer2.Layer3[] AS A
WHERE A.id = Id);
IF rowId.id IS NULL THEN
SET Found = FALSE;
END IF;
ELSE
SET Found = TRUE;
END IF;
RETURN Found;
END;
|
To our surprise, we still see duplications in the output (Environment.Variables.Layer1.Layer2.Layer3...). Further analysis confirms that the output does not contain duplicated element containing 'AA' but duplicated/triplicated elements containing 'CC' and 'BB', respectively'!' Playing around a bit more we find out that there will not be a duplicated element in the output if its value matched the first entry! It seems to indicate that the WHERE clause would only be effective for the first element?
What takes, what gives?  |
|
Back to top |
|
 |
Esa |
Posted: Thu Mar 14, 2013 11:40 pm Post subject: Re: SELECT FROM WHERE |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
Testing if a field is NULL is unreliable, take a look at EXISTS function. The argument it takes must be an array. Remember that an array with only one element is still an array. |
|
Back to top |
|
 |
Esa |
Posted: Fri Mar 15, 2013 12:17 am Post subject: Re: SELECT FROM WHERE |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
longng wrote: |
Code: |
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
SET rowId = THE (SELECT A.id
FROM Environment.Variables.Layer1.Layer2.Layer3[] AS A
WHERE A.id = Id);
IF rowId.id IS NULL THEN
SET Found = FALSE;
END IF;
ELSE
SET Found = TRUE;
END IF;
RETURN Found;
END;
|
|
In this function you are assigning a value to a variable called Found that you have not declared. If the code compiles in spite of that, the reason must be that you have declared it insome enclosing scope so that it is visible to your function.
That kind of carelessness in scoping is the root of many strange phenomenon!
But probably not in this case. Just take a look at the nice example of EXISTS in the InfoCenter an write a more safe test, something like this:
Code: |
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
RETURN EXISTS(SELECT A.id
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A.id = Id);
END;
|
Yes, the cause of the problem is that you are selecting from ...Layer3[] when you should select from Layer3.id[]...
The solution matches nicely with your topic title "SELECT FROM WHERE"
I hope this helps. |
|
Back to top |
|
 |
adubya |
Posted: Fri Mar 15, 2013 12:55 am Post subject: Re: SELECT FROM WHERE |
|
|
Partisan
Joined: 25 Aug 2011 Posts: 377 Location: GU12, UK
|
Esa wrote: |
Yes, the cause of the problem is that you are selecting from ...Layer3[] when you should select from Layer3.id[]...
|
If the OP selects from Layer3.id[] then there must be child "id" elements underneath the Layer3.id structure for the subsequent code to work. Not sure if that's what the OP's datastructure looks like.
The OP's FindID function as posted above wouldn't compile anyway due to the termination of the IF statement before the ELSE is specified. |
|
Back to top |
|
 |
Esa |
Posted: Fri Mar 15, 2013 1:36 am Post subject: Re: SELECT FROM WHERE |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
adubya wrote: |
Esa wrote: |
Yes, the cause of the problem is that you are selecting from ...Layer3[] when you should select from Layer3.id[]...
|
If the OP selects from Layer3.id[] then there must be child "id" elements underneath the Layer3.id structure for the subsequent code to work. Not sure if that's what the OP's datastructure looks like.
The OP's FindID function as posted above wouldn't compile anyway due to the termination of the IF statement before the ELSE is specified. |
You are right. Talk about carelessness...
It seems OP's code is hand written in the post, not copy/pasted from Toolkit. Maybe there are even more typos? There is something that may make the externally defined variable Found to get stuck with FALSE....
Corrected suggestion:
Code: |
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
RETURN EXISTS(SELECT A.id
FROM Environment.Variables.Layer1.Layer2.Layer3[] AS A
WHERE A.id = Id);
END;
|
|
|
Back to top |
|
 |
kimbert |
Posted: Fri Mar 15, 2013 2:45 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Is the FindID() function likely to be a performance bottleneck?
If it is, then you would be well advised to consider the following:
- do duplicates typically arrive together ( or close together ) in the input? If so, then you could significantly reduce the cost of your FindID() by checking for the last ID ( or last two or three IDs ) before doing the full search.
- If the number of IDs can get large then you could maintain the list in the Environment as a sorted list, and use a binary search in your FindID() function. |
|
Back to top |
|
 |
longng |
Posted: Fri Mar 15, 2013 7:45 am Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
Thank you! You all are so sharp as to be able to spot all those typos, which I inadvertently introduced while 'translating' from the actual existing code as to remove propriety information and/or superfluous details!
Here are some additional information:
If the code is run in debugging mode, I can see the list 'A' being gradually populated with entries and the 'Found' flag is mostly FALSE except for the case of the second 'AA' entry where Found is TRUE.
In the trace, annotations seem to imply the test (WHERE) would only be applied to the first entry as though the logic doesn't know there are more entries after the first one!
@Kimbert: thanks for the suggestions, I intend to share more details (in other thread) about performance (as perceived by me) in other thread (http://www.mqseries.net/phpBB2/viewtopic.php?t=63726) later.
For completeness, I have corrected the typos and list them here again just in case anyone would like to confirm/refute my observation. It should only take about 10 minutes to construct a simple message flow (MQInput->Compute->MQOutput) with the details below:
Input data
Code: |
<?xml version="1.0" encoding="UTF-8"?>
<InIds>
<id>AA</id>
<id>BB</id>
<id>BB</id>
<id>CC</id>
<id>CC</id>
<id>AA</id>
<id>BB</id>
</InIds> |
Main Function
Code: |
CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
CALL CopyEntireMessage();
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3.id;
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables.Layer1.Layer2.Layer3.id;
FOR InIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF (FindID(EnvInIDRef)) THEN
-- It does not exist
SET EnvOutIDRef = CAST (InIDRef AS CHARACTER);
CREATE NEXTSIBLING OF EnvOutIDRef AS EnvOutIDRef REPEAT;
END IF;
END FOR;
RETURN TRUE;
END; |
FindID function
Code: |
The FindID function
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
DECLARE found BOOLEAN FALSE;
SET rowId = THE (SELECT A.id
FROM Environment.Variables.Layer1.Layer2.Layer3[] AS A
WHERE A.id = Id);
IF rowId.id IS NULL THEN
SET Found = FALSE;
ELSE
SET Found = TRUE;
END IF;
RETURN Found;
END;
|
|
|
Back to top |
|
 |
mgk |
Posted: Fri Mar 15, 2013 1:13 pm Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
The new code as posted will still not deploy as a couple of variables are not defined and the sense of the IF is wrong. However, the reason it will fail to do what you want (at least in the example posted here) is that there is only ever one element called "layer3" so the WHERE clause will only ever test the first "id" element, which is why it will only match when the first element is repeated later on. If you change the code to something like this you should have better luck:
Code: |
SET rowId = THE (SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id); |
Note also that your FOR loop at present will always create an "extra" element with a "NULL" value. This may or may not be a problem for you...
Kind regards, _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
|
Back to top |
|
 |
longng |
Posted: Fri Mar 15, 2013 1:52 pm Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
mgk wrote: |
The new code as posted will still not deploy as a couple of variables are not defined and the sense of the IF is wrong. However, the reason it will fail to do what you want (at least in the example posted here) is that there is only ever one element called "layer3" so the WHERE clause will only ever test the first "id" element, which is why it will only match when the first element is repeated later on. If you change the code to something like this you should have better luck:
Code: |
SET rowId = THE (SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id); |
Note also that your FOR loop at present will always create an "extra" element with a "NULL" value. This may or may not be a problem for you...
Kind regards, |
That's it!!!!! Thank you very much mgk!
and yes, I am guilty as charged of other typos while trying to 'translate' the actual code to the one I post here. For the extraneous element you mention, the actual code does have logic to clean it up after the FOR loop terminates.
Here's the whole code with correction suggested by mgk:
Code: |
CREATE PROCEDURE BuildIDStructure ( )
BEGIN
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3.id;
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables.Layer1.Layer2.Layer3.id;
FOR EnvInIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF (EnvInIDRef) IS FALSE) THEN
-- It does not exist
SET EnvOutIDRef = CAST (EnvInIDRef AS CHARACTER);
CREATE NEXTSIBLING OF EnvOutIDRef AS EnvOutIDRef REPEAT;
END IF;
END FOR;
[b] -- Remove the last unused field
IF LASTMOVE(EnvOutIDRef) THEN
DELETE FIELD EnvOutIDRef;
END IF;
[/b] END;
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
DECLARE rowId ROW NULL;
DECLARE Found BOOLEAN FALSE;
SET rowId = THE (SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id);
IF rowId IS NULL THEN
SET Found = FALSE;
ELSE
SET Found = TRUE;
END IF;
RETURN Found;
END;
|
|
|
Back to top |
|
 |
Esa |
Posted: Fri Mar 15, 2013 11:38 pm Post subject: |
|
|
 Grand Master
Joined: 22 May 2008 Posts: 1387 Location: Finland
|
More typos?
longng wrote: |
Code: |
IF (EnvInIDRef) IS FALSE) THEN
|
|
|
|
Back to top |
|
 |
longng |
Posted: Sat Mar 16, 2013 7:37 am Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
Esa wrote: |
More typos?
longng wrote: |
Code: |
IF (EnvInIDRef) IS FALSE) THEN
|
|
|
Yes Esa! It's mine
Hopefully, there's no more
Code: |
CREATE PROCEDURE BuildIDStructure ( )
BEGIN
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3.id;
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables.Layer1.Layer2.Layer3.id;
FOR EnvInIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF (FindID(EnvInIDRef) IS FALSE) THEN
-- It does not exist
SET EnvOutIDRef = CAST (EnvInIDRef AS CHARACTER);
CREATE NEXTSIBLING OF EnvOutIDRef AS EnvOutIDRef REPEAT;
END IF;
END FOR;
[b] -- Remove the last unused field
IF LASTMOVE(EnvOutIDRef) THEN
DELETE FIELD EnvOutIDRef;
END IF;
[/b] END;
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
DECLARE rowId ROW NULL;
DECLARE Found BOOLEAN FALSE;
SET rowId = THE (SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id);
IF rowId IS NULL THEN
SET Found = FALSE;
ELSE
SET Found = TRUE;
END IF;
RETURN Found;
END; |
|
|
Back to top |
|
 |
adubya |
Posted: Sat Mar 16, 2013 7:53 am Post subject: |
|
|
Partisan
Joined: 25 Aug 2011 Posts: 377 Location: GU12, UK
|
I'd probably go with
Code: |
CREATE PROCEDURE BuildIDStructure ( )
BEGIN
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables;
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3 AS EnvOutIDRef;
FOR EnvInIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF NOT FindID(EnvInIDRef) THEN
-- It does not exist
CREATE LASTCHILD OF EnvOutIDRef NAME 'id' VALUE CAST ( EnvInIDRef AS CHARACTER );
END IF;
END FOR;
END;
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
RETURN EXISTS ( SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id );
END; |
|
|
Back to top |
|
 |
longng |
Posted: Sat Mar 16, 2013 8:19 am Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
adubya wrote: |
I'd probably go with
Code: |
CREATE PROCEDURE BuildIDStructure ( )
BEGIN
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables;
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3 AS EnvOutIDRef;
FOR EnvInIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF NOT FindID(EnvInIDRef) THEN
-- It does not exist
CREATE LASTCHILD OF EnvOutIDRef NAME 'id' VALUE CAST ( EnvInIDRef AS CHARACTER );
END IF;
END FOR;
END;
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
RETURN EXISTS ( SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id );
END; |
|
@adubya: That's definitely much better  |
|
Back to top |
|
 |
rekarm01 |
Posted: Sat Mar 16, 2013 1:40 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
adubya wrote: |
I'd probably go with
Code: |
...
CREATE FUNCTION FindID (IN Id CHARACTER) RETURNS BOOLEAN
BEGIN
RETURN EXISTS(SELECT A
FROM Environment.Variables.Layer1.Layer2.Layer3.id[] AS A
WHERE A = Id);
END; |
|
Although when it's boiled down to one statement, it doesn't really need to be a separate function anymore. Moving the SELECT call into the procedure also allows for the direct use of EnvOutIDRef:
Code: |
CREATE PROCEDURE BuildIDStructure()
BEGIN
DECLARE EnvOutIDRef REFERENCE TO Environment.Variables;
CREATE FIELD Environment.Variables.Layer1.Layer2.Layer3 AS EnvOutIDRef;
FOR EnvInIDRef AS InputRoot.XMLNSC.InIds.id[] DO
IF NOT EXISTS(SELECT A FROM EnvOutIDRef.id[] AS A WHERE A = EnvInIDRef) THEN
-- It does not exist
CREATE LASTCHILD OF EnvOutIDRef NAME 'id' VALUE CAST(EnvInIDRef AS CHARACTER);
END IF;
END FOR;
END; |
[Edit: Fixed the 'WHERE A = Id' clause, per adubya's hint ... however, the code remains untested.]
Last edited by rekarm01 on Sun Mar 17, 2013 3:06 am; edited 1 time in total |
|
Back to top |
|
 |
adubya |
Posted: Sun Mar 17, 2013 12:36 am Post subject: |
|
|
Partisan
Joined: 25 Aug 2011 Posts: 377 Location: GU12, UK
|
Nearly there, you have a copy/paste error though  |
|
Back to top |
|
 |
|