Author |
Message
|
maus |
Posted: Fri Oct 26, 2018 11:54 am Post subject: Error Parsing BLOB to JSON |
|
|
Newbie
Joined: 26 Oct 2018 Posts: 4
|
Hello, I'm having a tough time converting an HTTP input to JSON. My flow has an HTTP Input node with the Message domain set to BLOB. If I put a breakpoint after the input node, I can inspect the bytes and everything looks normal:
Quote: |
20202020202020202022627573696e6573734e616d65223a202257617272696f72204c61f1657320426f776c696e67204c65616775e9222c0a
ñ é " , |
I understand it to mean "f1" and "e9" are the UTF-16 hex representations of the special characters "ñ" and "é", respectively. On the HTTP header of the test request, I'm setting the Content-Type to utf-16, so the InputRoot.Properties.CodedCharSetId = 1204, which is what I would expect.
Later in the flow, I'm trying to parse the input BLOB to JSON via a Reset Content Descriptor (RCD) node, and I'm getting "JSON parsing errors have occurred". The error appears in the message root under the JSON folder; there is nothing in the ExceptionList. The flow proceeds and fails later because of this. This fails whether or not the special characters are included in the request, and I'm not sure why. If I do not set the charset on the input request, Broker sets the char set ID to 1208. Then, when I try to run the message through the RCD node, I get the unconvertable character error. With the char set ID set to 1208, if there are no special characters in the request, it parses just fine.
I haven't done any special handling of the bitstream at all; I'm just taking the message in and running it through the RCD node. Am I missing something somewhere? Is there something I should be checking? |
|
maus |
Posted: Fri Oct 26, 2018 12:12 pm Post subject: |
|
|
Newbie
Joined: 26 Oct 2018 Posts: 4
|
Just to add, the JSON being passed into the flow is valid:
Code: |
{
"client": {
"names": [{
"businessName": "Warrior Lañes Bowling Leagué",
"updateDate": "2017-09-29T15:45:50.082-04:00"
}]
}
} |
|
|
fjb_saper |
Posted: Fri Oct 26, 2018 8:53 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
CCSID 1204 is a new one to me for UTF-16. I thought the ones to use were 1200, 1201, 1202, depending on whether or not you also use a byte level indicator... _________________ MQ & Broker admin |
|
timber |
Posted: Sat Oct 27, 2018 12:46 pm Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
That is not a UTF-16 character stream. I know that because most of the characters occupy 8 bits whereas in UTF-16 all characters are at least 16-bit.
It looks more like a UTF-8 character stream, but you claim that it doesn't work when you set the encoding to UTF-8 either. So it might be a badly-constructed UTF-8 character stream. Either way, it's definitely not UTF-16. |
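A rough way to check the size argument, sketched in Python rather than ESQL: encode the word "Lañes" as UTF-16 and compare it with the five bytes for that word in the dump from the first post (4c 61 f1 65 73). The variable names are made up for illustration, and big-endian UTF-16 without a byte order mark is assumed.

```python
# The five bytes that represent "Lañes" in the dump from the first post.
observed = bytes.fromhex("4c61f16573")

# The same word as a Python string (\u00f1 is "ñ").
word = "La\u00f1es"

# What the word would look like if it really were UTF-16 (big-endian, no BOM).
as_utf16 = word.encode("utf-16-be")

print(observed.hex())   # one byte per character
print(as_utf16.hex())   # two bytes per character, with 00 before each Latin letter
print(len(observed), len(as_utf16))
```

Every character in the UTF-16 form takes two bytes, so a genuine UTF-16 dump of this text would be twice as long and full of 00 bytes.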
|
rekarm01 |
Posted: Sun Oct 28, 2018 2:08 pm Post subject: Re: Error Parsing BLOB to JSON |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
maus wrote: |
I understand it to mean "f1" and "e9" are the UTF-16 hex representations of the special characters "ñ" and "é", respectively. |
That's neither UTF-16, nor UTF-8. It could be "iso-8859-1" or "windows-1252", but only the sender knows for sure.
maus wrote: |
On the HTTP header of the test request, I'm setting the Content-Type to utf-16 |
The HTTP header does not convert the data, it only describes it. If the description doesn't match the data, then that can cause problems for the receiver, including JSON parser errors, and unconvertable characters. |
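To illustrate the mismatch between description and data, here is a minimal Python sketch (not IIB code) that tries to decode the bytes from the first post under a few candidate encodings. `try_decode` is a hypothetical helper, and the leading spaces from the dump are omitted.

```python
# Bytes from the dump in the first post, with the leading spaces omitted.
observed = bytes.fromhex(
    "22627573696e6573734e616d65223a2022"
    "57617272696f72204c61f1657320426f776c696e67204c65616775e9222c0a"
)

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

for name in ("utf-8", "iso-8859-1", "cp1252"):
    print(name, "->", repr(try_decode(observed, name)))
```

The UTF-8 attempt fails because f1 is not followed by valid continuation bytes, while ISO-8859-1 and windows-1252 both yield the expected text. That the two agree is also why the bytes alone cannot say which one the sender used.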
|
maus |
Posted: Tue Oct 30, 2018 4:48 am Post subject: |
|
|
Newbie
Joined: 26 Oct 2018 Posts: 4
|
The request is coming from a Java application, so I assumed it was UTF-16, but setting OutputRoot.Properties.CodedCharSetId = 819 allowed the Reset Content Descriptor node to parse the BLOB to JSON, and I'm now seeing the special characters in the message. It must have been ISO-8859-1.
The RCD node must be setting OutputRoot.Properties.CodedCharSetId = 1208 during the translation. Does that mean my message has been encoded as UTF-8? The reason I think this is because if I put a breakpoint directly after the RCD node, the CodedCharSetId is set to 1208.
The flow is converting the JSON into a SOAP request for the target application, and the target application is expecting UTF-8 anyway, so if I don't need to do any further processing of the message, that would be ideal. |
|
timber |
Posted: Tue Oct 30, 2018 7:55 am Post subject: |
|
|
 Grand Master
Joined: 25 Aug 2015 Posts: 1292
|
Quote: |
The request is coming from a Java application, so I assumed it was UTF-16 |
That was not a valid assumption. Java strings are UTF-16 internally but Java can read and write character streams in any valid encoding.
In any case, you should never try to guess the encoding because there is no reliable algorithm for doing that in most cases. A sample document can look identical in multiple different encodings. The sender should always specify the encoding - preferably along with the message but if not then it should be specified in the design document.
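A small Python sketch of why guessing is unreliable, assuming three candidate encodings chosen only for illustration: text without accented characters produces exactly the same bytes in all of them, which would also explain why the request parsed under 1208 whenever the special characters were left out.

```python
encodings = ("utf-8", "iso-8859-1", "cp1252")

# A version of the business name without accented characters.
plain = '"businessName": "Warrior Lanes Bowling League"'

# Collect the distinct byte sequences produced by each encoding.
plain_forms = {plain.encode(enc) for enc in encodings}
print(len(plain_forms))  # a single byte sequence means the encodings are indistinguishable

# With an accented character (\u00f1 is "ñ") the encodings start to differ.
accented = "La\u00f1es"
accented_forms = {enc: accented.encode(enc).hex() for enc in encodings}
for enc, hex_bytes in accented_forms.items():
    print(enc, hex_bytes)
```

Even with the accented character, ISO-8859-1 and windows-1252 still produce identical bytes, so only the sender can say which was intended.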
Quote: |
The RCD node must be setting OutputRoot.Properties.CodedCharSetId = 1208 during the translation. Does that mean my message has been encoded as UTF-8? |
When IIB writes a message, it selects the output encoding based on OutputRoot.Properties.CodedCharSetId. So yes, if it is still set to 1208 when the message tree reaches the SOAPRequest node then the output XML will be encoded in UTF-8. |
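As a loose analogy in Python (IIB performs this conversion internally, so none of this is IIB API): reading bytes as CCSID 819 and writing them as CCSID 1208 amounts to decoding as ISO-8859-1 and re-encoding as UTF-8. The `CCSID_TO_CODEC` table and the `transcode` function are hypothetical and cover only the two CCSIDs discussed in this thread.

```python
# Hand-written mapping for illustration only; covers just the CCSIDs in this thread.
CCSID_TO_CODEC = {
    819: "iso-8859-1",
    1208: "utf-8",
}

def transcode(data, from_ccsid, to_ccsid):
    """Decode bytes using the input CCSID, then encode them using the output CCSID."""
    text = data.decode(CCSID_TO_CODEC[from_ccsid])
    return text.encode(CCSID_TO_CODEC[to_ccsid])

incoming = bytes.fromhex("4c61f16573")      # "Lañes" as received (ISO-8859-1)
outgoing = transcode(incoming, 819, 1208)   # the same word written as UTF-8

print(incoming.hex())
print(outgoing.hex())
```

The single byte f1 becomes the two-byte sequence c3 b1 on output, which is what a UTF-8 consumer would expect for "ñ".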
|
rekarm01 |
Posted: Tue Oct 30, 2018 5:14 pm Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 1415
|
maus wrote: |
The RCD node must be setting OutputRoot.Properties.CodedCharSetId = 1208 ... |
The RCD node does not directly set OutputRoot.Properties.CodedCharSetId. The message flow must have set it some other way. |
|
maus |
Posted: Tue Oct 30, 2018 5:43 pm Post subject: |
|
|
Newbie
Joined: 26 Oct 2018 Posts: 4
|
Thank you, guys, for the great information! |
|