elenzo |
Posted: Tue Oct 29, 2013 11:20 am Post subject: Encoding problems in HTTPRequest Node |
|
|
Acolyte
Joined: 22 Aug 2006 Posts: 53
|
Hi, I have a simple flow that makes an http request. The web service's response is encoded in ISO-8859-1 but the message flow tries to parse it as UTF-8.
I have 2 trace nodes. The first one is before the HTTPRequest node, and the properties look like this:
Code: |
( ['MQROOT' : 0x6568b50]
(0x01000000:Name ):Properties = ( ['MQPROPERTYPARSER' : 0x656b650]
(0x03000000:NameValue):MessageSet = '' (CHARACTER)
(0x03000000:NameValue):MessageType = '' (CHARACTER)
(0x03000000:NameValue):MessageFormat = '' (CHARACTER)
(0x03000000:NameValue):Encoding = 0 (INTEGER)
(0x03000000:NameValue):CodedCharSetId = 819 (INTEGER) |
The second trace is immediately after the HTTPRequest node, and there the CodedCharSetId has a new value, 1208.
This CodedCharSetId=1208 causes a parsing error. The only way I found to solve this was:
Set the HTTPRequest node's response message parsing to BLOB, add a Compute node to set CodedCharSetId=819, and add a ResetContentDescriptor node to parse the message as XMLNSC.
The question is: why is the HTTPRequest node changing the CodedCharSetId value to 1208? How can I avoid this? I've been reading the Information Center but didn't find the answers to my questions...
Any help would be much appreciated! |
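The mismatch and the workaround can be illustrated outside the broker. This is a minimal Python sketch, not broker code: the payload is a made-up stand-in (the real response is not shown in full in this thread), and the names `decodes_as` and `parse_with_charset` are hypothetical. It assumes the body contains at least one non-ASCII character, which is what makes a UTF-8 parse fail. Keeping the body as raw bytes and then naming the charset explicitly is roughly analogous to BLOB, then CodedCharSetId=819, then XMLNSC.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the web service response body (ISO-8859-1 bytes).
PAYLOAD = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'.encode("iso-8859-1")


def decodes_as(data: bytes, codec: str) -> bool:
    """Return True if the raw bytes are valid in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def parse_with_charset(data: bytes, charset: str) -> ET.Element:
    """Parse raw bytes as XML, forcing the charset instead of guessing it."""
    # XMLParser's encoding argument overrides the encoding named in the document.
    return ET.fromstring(data, parser=ET.XMLParser(encoding=charset))


print(decodes_as(PAYLOAD, "utf-8"))       # the byte for 'é' is not valid UTF-8 here
print(decodes_as(PAYLOAD, "iso-8859-1"))
print(parse_with_charset(PAYLOAD, "iso-8859-1").text)
```

The point of the sketch is only that the bytes themselves are fine; what breaks the parse is the charset the consumer assumes.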
|
gs |
Posted: Wed Oct 30, 2013 2:35 am Post subject: |
|
|
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
|
Have you verified that the web service HTTP response headers match the response? Could it be that Content-Type charset is set to utf-8? |
|
elenzo |
Posted: Wed Oct 30, 2013 9:20 am Post subject: |
|
|
Acolyte
Joined: 22 Aug 2006 Posts: 53
|
This is the web service's response:
Code: |
)
(0x01000000:Name):HTTPResponseHeader = ( ['WSRSPHDR' : 0x654f1f0]
(0x03000000:NameValue):X-Original-HTTP-Status-Line = 'HTTP/1.1 200 OK' (CHARACTER)
(0x03000000:NameValue):X-Original-HTTP-Status-Code = 200 (INTEGER)
(0x03000000:NameValue):Date = 'Wed, 30 Oct 2013 17:16:36 GMT' (CHARACTER)
(0x03000000:NameValue):Server = 'Microsoft-IIS/6.0' (CHARACTER)
(0x03000000:NameValue):Content-Length = '381' (CHARACTER)
(0x03000000:NameValue):Content-Type = 'text/xml' (CHARACTER)
(0x03000000:NameValue):Set-Cookie = 'ASPSESSIONIDCQSDSBSR=OIHLAPJADDBAJCCILOILOKJN; path=/' (CHARACTER)
(0x03000000:NameValue):Cache-control = 'private' (CHARACTER)
)
(0x01000000:Name):BLOB = ( ['none' : 0x654eee0]
(0x03000000:NameValue):UnknownParserName = '' (CHARACTER)
(0x03000000:NameValue):BLOB = X'3c3f786d6c207665727 |
There is no charset defined in the Content-Type header. |
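The BLOB value in the trace is cut off, but the complete byte pairs that are visible can still be read. A small Python sketch, assuming only the hex digits shown above (the trailing unpaired digit is dropped and nothing beyond the truncation is guessed); `hex_to_text` is a hypothetical helper name:

```python
# Complete byte pairs visible in the truncated trace value X'3c3f786d6c207665727
# (the last, unpaired digit is left out; the rest of the BLOB is not in the thread).
VISIBLE_HEX = "3c3f786d6c20766572"


def hex_to_text(hex_digits: str) -> str:
    """Decode hex digits as ASCII text."""
    return bytes.fromhex(hex_digits).decode("ascii")


print(hex_to_text(VISIBLE_HEX))  # prints: <?xml ver
```

So the body does start with an XML declaration, but the part that would name the encoding lies beyond the truncation.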
|
mgk |
Posted: Wed Oct 30, 2013 4:09 pm Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
So a charset of utf-8 will be assumed if the remote server does not set the charset correctly.
Kind regards, _________________ MGK
The postings I make on this site are my own and don't necessarily represent IBM's positions, strategies or opinions. |
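The fallback described in this post can be sketched in Python as follows. This is only an illustration of "use the charset parameter if present, otherwise assume a default"; it is not how the broker is implemented, and `charset_from_content_type` is a hypothetical name. The default of "utf-8" mirrors the behaviour stated above.

```python
from email.message import Message


def charset_from_content_type(header_value: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset returns the lower-cased charset, or failobj if absent.
    return msg.get_content_charset(failobj=default)


print(charset_from_content_type("text/xml"))                      # utf-8 (fallback)
print(charset_from_content_type("text/xml; charset=ISO-8859-1"))  # iso-8859-1
```

With the header from the trace, `text/xml`, there is no charset parameter, so the fallback is what decides the outcome.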
|
gs |
Posted: Thu Oct 31, 2013 2:45 am Post subject: |
|
|
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
|
|
mqjeff |
Posted: Thu Oct 31, 2013 2:52 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Section 3.7.1 (of RFC 2616, HTTP/1.1) says that media types of the "text" type have a default charset of ISO-8859-1, but your media type is "text/xml", and I think you'll find that XML has a default charset of utf-8. Although in theory, if the MIME part doesn't include a charset, then the charset in the XML declaration should be used.
I bet you'll find that the doc is being sent in ISO-8859-1, but the xml declaration says "utf-8".... |
|
gs |
Posted: Thu Oct 31, 2013 3:06 am Post subject: |
|
|
 Master
Joined: 31 May 2007 Posts: 254 Location: Sweden
|
mqjeff wrote: |
Section 3.7.1 says that media types of the "text" type have a default char set of ISO-8859-1, but your media is "text/xml", and I think you'll find that XML has a default charset of utf-8. Although in theory, if the MIME part doesn't include a charset, then the charset in the xml declaration should be used |
text/xml is a subtype of text, so it should default to ISO-8859-1.
Quote: |
When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. |
|
|
elenzo |
Posted: Thu Oct 31, 2013 9:42 am Post subject: |
|
|
Acolyte
Joined: 22 Aug 2006 Posts: 53
|
Thanks for the replies. Indeed, the HTTPResponseHeader has text/xml as Content-Type, and according to what you said the CCSID should be set to ISO-8859-1 (819), but that's not what it is doing. |
|
mqjeff |
Posted: Thu Oct 31, 2013 9:51 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Ok... again, I thought the rules were different for text/xml, but they're not. From RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt):
8.5 Text/xml with Omitted Charset
Content-type: text/xml
{BOM}<?xml version="1.0" encoding="utf-16"?>
or
{BOM}<?xml version="1.0"?>
This example shows text/xml with the charset parameter omitted. In this case, MIME and XML processors MUST assume the charset is "us-ascii", the default charset value for text media types specified in [RFC2046]. The default of "us-ascii" holds even if the text/xml entity is transported using HTTP.
Omitting the charset parameter is NOT RECOMMENDED for text/xml. For example, even if the contents of the XML MIME entity are UTF-16 or UTF-8, or the XML MIME entity has an explicit encoding declaration, XML and MIME processors MUST assume the charset is "us-ascii".
So I'd pursue a PMR. |
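To make the disagreement concrete, here is a small Python sketch that lists the default charset each source in this thread would assume for a text/xml body with no charset parameter, and checks which of them can decode a given body. The table only restates what is quoted or observed above; the sample bytes and the name `sources_that_decode` are hypothetical.

```python
from typing import List

# Default charset assumed when Content-Type is text/xml with no charset parameter.
DEFAULT_CHARSETS = {
    "RFC 2616 3.7.1 (text/* over HTTP)": "iso-8859-1",
    "RFC 3023 8.5 (text/xml)": "us-ascii",
    "HTTPRequest node (observed, CCSID 1208)": "utf-8",
}


def sources_that_decode(body: bytes) -> List[str]:
    """Return the sources whose default charset can decode the body."""
    ok = []
    for source, charset in DEFAULT_CHARSETS.items():
        try:
            body.decode(charset)
        except UnicodeDecodeError:
            continue
        ok.append(source)
    return ok


# Made-up body with one ISO-8859-1 accented character (byte 0xE9).
print(sources_that_decode(b"<a>\xe9</a>"))
```

For a pure ASCII body all three defaults agree, which is probably why the difference only shows up with accented characters.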
|
mgk |
Posted: Thu Oct 31, 2013 11:18 am Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
Can you post more of the BLOB response message - the XML-decl may say UTF-8 which would influence the parsing...
Kind regards, _________________ MGK
|
elenzo |
Posted: Thu Oct 31, 2013 11:20 am Post subject: |
|
|
Acolyte
Joined: 22 Aug 2006 Posts: 53
|
mgk wrote: |
Can you post more of the BLOB response message - the XML-decl may say UTF-8 which would influence the parsing...
|
The XML declaration of the response says ISO-8859-1; I've already checked. |
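A check like this can be done on the raw bytes without parsing the document. A rough Python sketch, assuming an ASCII-compatible encoding (it would not work for UTF-16) and using a made-up sample body; `declared_encoding` is a hypothetical helper:

```python
import re
from typing import Optional

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']""")


def declared_encoding(body: bytes) -> Optional[str]:
    """Return the lower-cased encoding named in the XML declaration, if any."""
    match = _DECL.match(body)
    return match.group(1).decode("ascii").lower() if match else None


print(declared_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))  # iso-8859-1
print(declared_encoding(b'<?xml version="1.0"?><a/>'))                        # None
```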
|
mgk |
Posted: Thu Oct 31, 2013 12:18 pm Post subject: |
|
|
 Padawan
Joined: 31 Jul 2003 Posts: 1642
|
So, I've checked the code and this behaviour is by design, since pragmatically the vast majority of customers who omit a charset are actually using utf-8. In fact, this case here is the first one I've seen that does seem to require ISO-8859-1. Is it possible to get the remote end to send the charset with the response?
Kind regards, _________________ MGK
|
elenzo |
Posted: Thu Oct 31, 2013 12:22 pm Post subject: |
|
|
Acolyte
Joined: 22 Aug 2006 Posts: 53
|
mgk wrote: |
So, I've checked the code and this behaviour is by design, since pragmatically the vast majority of customers who omit a charset are actually using utf-8. In fact, this case here is the first one I've seen that does seem to require ISO-8859-1. Is it possible to get the remote end to send the charset with the response?
Kind regards, |
Thanks for the information. It's not possible to get the remote end to change anything; it is an external web service. The important thing is that you have confirmed how WMB works, and that is enough for me. I'll fix it with my workaround. |
|