sleepyjamie
Posted: Fri Nov 13, 2015 7:03 am Post subject: Handling large HTTP payloads
Centurion
Joined: 29 Apr 2015 Posts: 135
I have a case where a REST API is returning a large payload. This is causing the HTTP Request node to take a really long time to return.
I am wondering if IIB has the ability to provide an input stream so that I can read the bytes from the payload and propagate based on a delimiter of my choice?
mqjeff
Posted: Fri Nov 13, 2015 8:19 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
I forget if the JSON parser is a streaming parser or not.
_________________
chmod -R ugo-wx /
sleepyjamie
Posted: Fri Nov 13, 2015 8:38 am Post subject:
Centurion
Joined: 29 Apr 2015 Posts: 135
Yeah, I've searched through the documentation with no luck.
mqjeff
Posted: Fri Nov 13, 2015 8:45 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
The way to tell is to examine the logic in the large message processing sample, and see how it propagates parts of the InputRoot and then deletes them and/or clears the OutputRoot before sending the next part.
If that performs better than what you're trying now, it's probably a streaming parser.
Perhaps some parser guru will be along to express an opinion.
_________________
chmod -R ugo-wx /
timber
Posted: Fri Nov 13, 2015 8:47 am Post subject:
Grand Master
Joined: 25 Aug 2015 Posts: 1292
I don't remember either, but regardless of the answer, the JSON parser is certainly an on-demand parser. So you should be able to do this:
- set the Domain on the HTTPRequest node to BLOB
- parse one 'record' at a time, and generate the output for it
- propagate the resulting message tree and do whatever you need to do with it in the rest of the message flow
- delete the message tree and parse the next record
Details are in this well-thumbed article:
http://www.ibm.com/developerworks/websphere/library/techarticles/0505_storey/0505_storey.html
( it was written in a previous era, so ignore references to now-deprecated domains. The techniques still work )
This way, you still need to read in the entire input BLOB ( which may be large ) but at least you're not building the message tree for all of the records at the same time. Memory usage will be a *lot* lower.
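A minimal ESQL sketch of that loop, assuming the HTTPRequest node's response domain is set to BLOB. The module name, the '},' delimiter and the record-boundary handling are illustrative only; a real flow would also have to strip the enclosing [ ] of a JSON array, handle the final record after the last delimiter, and cope with delimiters that appear inside string values.
Code:
CREATE COMPUTE MODULE SplitLargeResponse_Sketch
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    DECLARE payload BLOB InputRoot.BLOB.BLOB;
    DECLARE delim   BLOB CAST('},' AS BLOB CCSID 1208); -- assumed record delimiter, UTF-8
    DECLARE rec     BLOB;
    DECLARE off     INTEGER 1;
    DECLARE pos     INTEGER POSITION(delim IN payload FROM off);
    WHILE pos <> 0 DO
      -- carve out one record and parse it into a throw-away tree under Environment
      SET rec = SUBSTRING(payload FROM off FOR pos - off + 1);
      CREATE LASTCHILD OF Environment DOMAIN('JSON') PARSE(rec CCSID 1208);
      -- generate the output for this record only ( MQ header handling omitted for brevity )
      SET OutputRoot.XMLNSC.data = Environment.JSON.Data;
      PROPAGATE TO TERMINAL 'out' DELETE NONE;
      -- delete the per-record tree before parsing the next record
      DELETE FIELD Environment.JSON;
      SET off = pos + LENGTH(delim);
      SET pos = POSITION(delim IN payload FROM off);
    END WHILE;
    RETURN FALSE; -- everything has already been propagated
  END;
END MODULE;
The PROPAGATE / DELETE pattern here is the same one the large message processing sample uses.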
Vitor
Posted: Fri Nov 13, 2015 9:05 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
mqjeff wrote:
Perhaps some parser guru will be along to express an opinion.
All Hail The Parser King!
_________________
Honesty is the best policy.
Insanity is the best defence.
sleepyjamie
Posted: Fri Nov 13, 2015 9:57 am Post subject:
Centurion
Joined: 29 Apr 2015 Posts: 135
Looks like this approach still isn't feasible.
The reason is that large BLOB payloads cause the IIB Toolkit to hang when I attempt to cast the BLOB to a string. So having a pointer/reference to an input stream would be ideal.
Vitor
Posted: Fri Nov 13, 2015 10:03 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
sleepyjamie wrote:
The reason is that large BLOB payloads cause the IIB Toolkit to hang when I attempt to cast the BLOB to a string.
a) Have you tried this on a server that's got a bit more power than the one in the Toolkit?
b) I'm surprised to hear that this payload is nothing but a string, with no structure that can be managed with DFDL, especially as you talked earlier about using a "delimiter of your choice". What exactly is this data?
_________________
Honesty is the best policy.
Insanity is the best defence.
smdavies99
Posted: Fri Nov 13, 2015 10:07 am Post subject:
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
sleepyjamie wrote:
Looks like this approach still isn't feasible.
The reason is that large BLOB payloads cause the IIB Toolkit to hang when I attempt to cast the BLOB to a string. So having a pointer/reference to an input stream would be ideal.
I guess that you must be trying to use the debugger in order to get the TK to hang.
Why not revert to 'old-style' debugging, i.e. use user trace?
In fact, I will go as far as to say that this is the best way to handle large payloads.
I say old style because us 'old farts' who have been using this product for more than a decade didn't have the luxury (sic) of the debugger back in them old days.
It was usertrace or nothing.
I have never got on with the debugger, so I use usertrace for all my debugging, even with V10. YMMV though.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
sleepyjamie
Posted: Fri Nov 13, 2015 10:27 am Post subject:
Centurion
Joined: 29 Apr 2015 Posts: 135
The data response is JSON. There is no need to parse it into a node tree because my flow simply does a quick publish of the data to a queue, so doing the transformation is just wasted CPU. For example:
[{"a": "A1", "b": "B1"}, {"a": "A2", "b": "B2"}]
This would be published to MQ as two messages:
<data><a>A1</a><b>B1</b></data>
<data><a>A2</a><b>B2</b></data>
Using DFDL is a bit overkill for this. It's easier to write some simple code that iterates over the character array and propagates whenever I encounter a character sequence. At any given time I only need a few characters in the input buffer for reading.
I'm running a Core i7 (8 cores) with 16GB of RAM, so I don't see why the Toolkit would be hanging. I'll try increasing the memory, but I think it's an issue with the Toolkit. User trace is an option; however, IMO the Toolkit should be robust enough to handle large payloads. I have no issues running the debugger under the regular Eclipse IDE using Java for the same JSON response.
Vitor
Posted: Fri Nov 13, 2015 10:36 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
sleepyjamie wrote:
The data response is JSON. There is no need to parse it into a node tree because my flow simply does a quick publish of the data to a queue, so doing the transformation is just wasted CPU.
Ok, now I'm confused:
sleepyjamie wrote:
For example:
[{"a": "A1", "b": "B1"}, {"a": "A2", "b": "B2"}]
This would be published to MQ as two messages:
<data><a>A1</a><b>B1</b></data>
<data><a>A2</a><b>B2</b></data>
This looks to my untrained eye like a JSON message is being transformed into XML.
sleepyjamie wrote:
Using DFDL is a bit overkill for this. It's easier to write some simple code that iterates over the character array and propagates whenever I encounter a character sequence. At any given time I only need a few characters in the input buffer for reading.
Easier than using the JSON parser to identify the end of the first JSON array (using the on-demand parsing and memory saving techniques previously mentioned)?
sleepyjamie wrote:
I'm running a Core i7 (8 cores) with 16GB of RAM, so I don't see why the Toolkit would be hanging. I'll try increasing the memory, but I think it's an issue with the Toolkit.
Because it's trying to resolve the entire input stream to display it. Welcome to the IIB debugger, and one reason why my worthy associate prefers the user trace. As I do.
sleepyjamie wrote:
So doing the transformation is just wasted CPU
I also question that assertion. It's almost certainly true in a pure Java world, but you're not in Kansas any more, Toto, and I'll bet my entire annual bonus that the JSON parser can find its way round a JSON message faster and more efficiently than IIB can find its place in a string, because string handling in IIB is resource expensive and inefficient, especially with large strings.
sleepyjamie wrote:
I have no issues running the debugger under the regular Eclipse IDE using Java for the same JSON response.
This is the key point. You can stream HTTP traffic into Java and do exactly what you're describing, looking at a few characters at a time. IIB doesn't stream a BLOB (because a BLOB is by definition one singular item).
Humor us. Try using the JSON domain.
_________________
Honesty is the best policy.
Insanity is the best defence.
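A minimal ESQL sketch of that suggestion, assuming the HTTPRequest node's response domain is set to JSON so the array arrives as InputRoot.JSON.Data ( module and terminal names are illustrative; MQ header handling is omitted ):
Code:
CREATE COMPUTE MODULE PublishPerArrayElement_Sketch
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- walk the parsed JSON array one element at a time
    DECLARE item REFERENCE TO InputRoot.JSON.Data.Item[1];
    WHILE LASTMOVE(item) DO
      -- build one small output message per array element
      SET OutputRoot.XMLNSC.data = item;
      PROPAGATE TO TERMINAL 'out' DELETE NONE;
      MOVE item NEXTSIBLING;
    END WHILE;
    RETURN FALSE; -- everything has already been propagated
  END;
END MODULE;
This avoids any manual string scanning; whether the whole input tree is materialised up front depends on how on-demand the JSON parser really is, which is the open question at the top of the thread.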
timber
Posted: Fri Nov 13, 2015 10:42 am Post subject:
Grand Master
Joined: 25 Aug 2015 Posts: 1292
Quote:
There is no need to parse it into a node tree because my flow simply does a quick publish of the data to a queue.
That would have been useful information in your first post.
Quote:
I don't see why the Toolkit would be hanging
Nor me. The toolkit, for whatever reason, appears not to like data this large. Without seeing the stack trace it's hard to say any more than that. Have you tried using user trace and Trace nodes?
Quote:
Using DFDL is a bit overkill for this.
Possibly true. It depends on where else commas can appear in the JSON data, and on how regular the delimiters are ( is it always a single comma, with no spaces on either side? )
You don't need to CAST to CHARACTER to scan the BLOB. Just CAST the string that you are looking for to a BLOB ( making sure to use the correct CCSID ), then scan for that byte sequence. But don't forget the edge case where the data contains commas that are not delimiters.
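For example ( a fragment only; the delimiter string and CCSID 1208/UTF-8 are assumptions ):
Code:
-- scan the raw bytes without converting the whole payload to CHARACTER
DECLARE payload BLOB InputRoot.BLOB.BLOB;
DECLARE delim   BLOB CAST('},' AS BLOB CCSID 1208);
DECLARE pos     INTEGER POSITION(delim IN payload FROM 1);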
sleepyjamie
Posted: Fri Nov 13, 2015 10:49 am Post subject:
Centurion
Joined: 29 Apr 2015 Posts: 135
Vitor wrote:
Humor us. Try using the JSON domain.
I used the JSON domain originally, and that's when I found the crashing in the IIB Toolkit; I then thought it might be too resource intensive, so I was going to switch to BLOB. If the BLOB comes in as the full binary response then I might be out of luck.
It's strange to me that such a basic feature as a reference to the HTTP response as an input stream is unavailable. I wish IIB input nodes, parsers and stream handlers were separate pieces of logic in the product; that would allow you to write custom input stream handling for the HTTPRequest node. From a development architecture point of view this would be more flexible.
I'll keep trying. Thanks!
stoney
Posted: Fri Nov 13, 2015 11:22 am Post subject:
Centurion
Joined: 03 Apr 2013 Posts: 140
How big a JSON message are we talking about here - KBs, MBs?
Quote:
I'm running a Core i7 (8 cores) with 16GB of RAM, so I don't see why the Toolkit would be hanging. I'll try increasing the memory, but I think it's an issue with the Toolkit.
You might already be aware, but the toolkit (like all Java applications) is limited by the maximum Java heap size setting - I think this defaults to 1GB in all recent toolkit levels.
You can increase it by editing the -Xmx setting in <install root>/tools/eclipse.ini.
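For example, the relevant fragment of eclipse.ini might look like this ( the values are illustrative; everything after the -vmargs line is passed straight to the JVM ):
Code:
-vmargs
-Xms512m
-Xmx2048m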
sleepyjamie
Posted: Fri Nov 13, 2015 11:27 am Post subject:
Centurion
Joined: 29 Apr 2015 Posts: 135
stoney wrote:
How big a JSON message are we talking about here - KBs, MBs?
You can increase it by editing the -Xmx setting in <install root>/tools/eclipse.ini.
The payload is in the MBs. I'm using the 32-bit version, so the Toolkit memory is limited. I'll try and see if I can get the 64-bit version.
I think a better approach is to ask the REST API dev to implement a paging endpoint.
Cheers.
jamie