Encoding issues for HTTP requests and responses

Preface:

Today, let's talk about how encoding and decoding work between the Tomcat server and the web page.
For background, you can read up on the URL encoding question.
Link: Liao Xuefeng
Correction: Baidu now uses UTF-8 as well.

For the problem of converting between encodings, see:
Link: Encoding conversion problem

Browser side encoding:

The default encoding is GB2312.
The factors that affect how submitted form data is encoded include the form's accept-charset attribute and the HTML document's encoding scheme, that is, the charset declared in the page's meta tag. Whether the form's accept-charset takes effect depends on the specific browser; some browsers, such as IE, do not support it. The document's encoding scheme can be changed through that meta declaration.

Browser-side decoding:

The default decoding is GB2312.
This comes from the locale: on a Chinese-language system, whether Windows or Linux, the browser falls back to GB2312 unless you change the setting yourself.

Generally speaking, the browser first parses the text using the encoding given in Content-Type. If it then finds a charset declaration while parsing, it switches to that encoding and reads the text again. If Content-Type does not specify an encoding, or the HTML file was not delivered over HTTP at all, the browser usually guesses an encoding to parse the text, and switches and rereads if it then finds a charset declaration.

Let's do a test:
I wrote a web page whose text is saved as UTF-8 but which does not declare an encoding through meta or any other means.
When the browser opens this page, the text is garbled.
The backend then receives the submitted data:

request.setCharacterEncoding("GBK"); //or GB2312
System.out.println(request.getParameter("name"));

The output is not garbled.
Now replace the code with:

request.setCharacterEncoding("UTF-8"); //or do not write
System.out.println(request.getParameter("name"));

This time the output is garbled. When no encoding is set, the request falls back to its default decoding, iso-8859-1.
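The test above can be imitated without a server. The following is a minimal sketch, assuming the browser submitted the form as GBK; the class name FormDecodingDemo and the helper roundTrip are made up for illustration and are not part of the Servlet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormDecodingDemo {
    // Hypothetical helper: encode text the way the browser would,
    // then decode the same bytes the way the server would.
    static String roundTrip(String text, Charset browserCharset, Charset serverCharset) {
        byte[] wire = text.getBytes(browserCharset);
        return new String(wire, serverCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String name = "\u4e2d\u6587";

        // Server decodes with the same charset the browser used: text survives.
        System.out.println(roundTrip(name, gbk, gbk).equals(name)); // true

        // Server decodes GBK bytes as ISO-8859-1: text comes out garbled.
        System.out.println(roundTrip(name, gbk, StandardCharsets.ISO_8859_1).equals(name)); // false
    }
}
```

The point is only that the decoding charset on the server has to match the charset the browser actually used, whatever the page was saved as.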

How the browser decodes the response body depends on the information returned with it.
a) The browser first looks for an encoding declaration (Content-Type) in the response header, for example:

response.setHeader("content-type", "text/html;charset=utf-8");

b) If the header carries no such declaration and the browser knows the content is HTML, it reads the encoding from the meta tag in the head, for example:

<meta http-equiv="Content-Type" content="text/html;charset=utf-8">

c) If neither is found, the browser does not know how to decode the content and falls back to a decoding scheme of its own choosing. The default decoding is GB2312.

In principle, HTML documents should declare their encoding in a meta tag, and the declaration must appear within the first 1024 bytes of the file, so it is best placed right at the start of the head tag.
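Steps a) to c) can be sketched as a small lookup. This is a simplified illustration, not how any real browser is implemented: the class CharsetResolver and its methods are hypothetical, the regular expression accepts any "charset=" text rather than parsing HTML properly, and it counts the first 1024 characters instead of bytes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetResolver {
    // Matches charset=utf-8, charset="utf-8" and similar forms (simplified assumption).
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9_\\-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the text, or null when none is declared.
    static String findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // a) Content-Type header, b) declaration near the start of the document, c) fallback.
    static String resolve(String contentTypeHeader, String html, String fallback) {
        String fromHeader = findCharset(contentTypeHeader);
        if (fromHeader != null) {
            return fromHeader;
        }
        String start = html == null ? "" : html.substring(0, Math.min(html.length(), 1024));
        String fromMeta = findCharset(start);
        return fromMeta != null ? fromMeta : fallback;
    }

    public static void main(String[] args) {
        String page = "<head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\"></head>";
        System.out.println(resolve("text/html;charset=GBK", page, "GB2312")); // GBK
        System.out.println(resolve("text/html", page, "GB2312"));             // utf-8
        System.out.println(resolve(null, "<head></head>", "GB2312"));         // GB2312
    }
}
```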

Summary: encoding and decoding on the browser side mainly depend on:

1. The encoding declared by the meta tag in the head
2. The encoding scheme of the HTML document
3. The form's accept-charset encoding scheme

Whatever charset the page declares is the format used for both encoding and decoding.

Server-side encoding:

Chinese-language browsers generally use GBK by default, and can be switched to UTF-8 in the browser settings. Different users may have different browser settings, which leads to different encodings. For this reason, many websites first URL-encode any Chinese or special characters with JavaScript and only then splice them into the URL to submit the data.

Tomcat 8 and above use UTF-8 by default.

So the default encoding is UTF-8.

In this setup getBytes() therefore produces UTF-8 by default (strictly speaking, the no-argument getBytes() uses the JVM's default charset). You can also specify the encoding explicitly.
For example:
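The text describes doing this step in JavaScript; the same idea is shown here in Java as a rough sketch. The path /search and the parameter name q are invented for illustration, and URLEncoder.encode(String, Charset) needs Java 10 or later. Note that URLEncoder follows form-encoding rules (a space becomes +), which is close to, but not identical with, what a browser script would produce.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlEncodeDemo {
    // Percent-encode the value first, then splice it into the URL.
    static String buildUrl(String base, String name, String value, Charset charset) {
        return base + "?" + name + "=" + URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String keyword = "\u4e2d\u6587";

        // The same characters give different bytes, and so different URLs, per charset.
        System.out.println(buildUrl("/search", "q", keyword, StandardCharsets.UTF_8));
        // /search?q=%E4%B8%AD%E6%96%87
        System.out.println(buildUrl("/search", "q", keyword, Charset.forName("GBK")));
        // /search?q=%D6%D0%CE%C4
    }
}
```

Once the URL contains only ASCII percent escapes, the browser's own settings no longer influence what is sent.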

response.getOutputStream().write("22 This is a byte output".getBytes("GBK"));

Analysis:

The bytes stored in the response are GBK. The browser defaults to GBK, so the text displays normally.
If the call is changed to .getBytes(), the bytes are UTF-8 and the browser shows garbled text.
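To see why the choice in getBytes matters, it helps to look at the raw bytes. The sketch below is only an illustration with made-up names (ResponseBytesDemo, hex); it uses two Chinese characters instead of the sample string and assumes the browser decodes as GBK, as described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseBytesDemo {
    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String text = "\u4e2d\u6587";

        byte[] asGbk = text.getBytes(gbk);
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(asGbk));  // D6 D0 CE C4
        System.out.println(hex(asUtf8)); // E4 B8 AD E6 96 87

        // A browser decoding as GBK recovers the text only from the GBK bytes.
        System.out.println(new String(asGbk, gbk).equals(text));  // true
        System.out.println(new String(asUtf8, gbk).equals(text)); // false
    }
}
```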

Summary:
The default encodings of HttpServletResponse and HttpServletRequest are different.
The default encoding of HttpServletResponse is not iso-8859-1 here; it depends on the encoding of the bytes that are stored directly, that is, on getBytes.
By default this code belongs in the doGet method, because the browser accesses the Servlet with Get by default.

Set the browser's decoding format:

response.setHeader("Content-type", "text/html;charset=UTF-8");

If you want to change it, this article covers the details:

/jiangwei0910410003/article/details/22886847

Correction:

The default encoding of response.getWriter().write(); is UTF-8.
The default encoding of response.getOutputStream().write(string.getBytes()); is UTF-8.

Complete example:

response.setHeader("Content-type", "text/html;charset=UTF-8");
response.getOutputStream().write("22 This is a byte output".getBytes());
response.getWriter().write("Haha, yes");

Output:

22 This is a byte output

Summary:
1. getOutputStream and getWriter are mutually exclusive. Once either has been called, the other cannot be called on the same response.
2. Output written through getOutputStream() or through getWriter() accumulates: repeated writes on the same stream are appended.
3. After the Servlet's service method ends, the Servlet engine checks whether the output stream object returned by getWriter or getOutputStream has been closed. If not, the engine calls close on it.
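The first point explains why only one line appeared in the output above: in the Servlet API, getWriter() is documented to throw IllegalStateException once getOutputStream() has been called on the same response, and vice versa. Since the real classes need a servlet container, the sketch below uses FakeResponse, a hypothetical stand-in written just for this illustration, to mimic that one rule.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a servlet response; NOT the real HttpServletResponse.
public class FakeResponse {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    private boolean streamUsed;
    private PrintWriter writer;

    OutputStream getOutputStream() {
        if (writer != null) {
            throw new IllegalStateException("getWriter() has already been called");
        }
        streamUsed = true;
        return body;
    }

    PrintWriter getWriter() {
        if (streamUsed) {
            throw new IllegalStateException("getOutputStream() has already been called");
        }
        if (writer == null) {
            writer = new PrintWriter(new OutputStreamWriter(body, StandardCharsets.UTF_8), true);
        }
        return writer;
    }

    // Everything written so far, decoded as UTF-8 (an assumption of this sketch).
    String bodyAsText() {
        if (writer != null) {
            writer.flush();
        }
        return new String(body.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        FakeResponse response = new FakeResponse();
        response.getOutputStream().write("22 This is a byte output".getBytes(StandardCharsets.UTF_8));
        try {
            response.getWriter().write("Haha, yes");
        } catch (IllegalStateException e) {
            System.out.println("second call rejected: " + e.getMessage());
        }
        System.out.println(response.bodyAsText()); // 22 This is a byte output
    }
}
```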

Server-side decoding:

I think this is the hardest part to understand.
You can refer to this article:

/qq_38409944/article/details/80633743

The default encoding format of Tomcat 8 and above is UTF-8.
The default encoding format of the HttpServletRequest and HttpServletResponse containers is iso-8859-1.

Get and Post decoding formats:
The request container stores the data sent by the browser, which is generally UTF-8.
Request methods such as getParameter choose their decoding format according to whether the request is Get or Post.

1. Get: the default decoding format is Tomcat 8's encoding format, so the URL is decoded as UTF-8. This overrides the request container's decoding format.
2. Post: the default decoding format is the request's encoding format. It has nothing to do with Tomcat 8's encoding format.
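The two rules can be written down as a tiny model. This is only a sketch of the behaviour described above, not Tomcat's actual code; the class ParameterDecodingModel and its parameters are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ParameterDecodingModel {
    // Get parameters follow the container's URL charset; Post parameters follow the request's charset.
    static Charset charsetFor(String method, Charset containerUriCharset, Charset requestCharset) {
        return "GET".equalsIgnoreCase(method) ? containerUriCharset : requestCharset;
    }

    // Decodes the raw parameter bytes with whichever charset the rule above selects.
    static String decode(String method, byte[] raw, Charset containerUriCharset, Charset requestCharset) {
        return new String(raw, charsetFor(method, containerUriCharset, requestCharset));
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String name = "\u4e2d\u6587";
        byte[] raw = name.getBytes(StandardCharsets.UTF_8); // what the browser sent

        Charset tomcat = StandardCharsets.UTF_8;       // container default described above
        Charset request = StandardCharsets.ISO_8859_1; // request default when nothing is set

        System.out.println(decode("GET", raw, tomcat, request).equals(name));  // true
        System.out.println(decode("POST", raw, tomcat, request).equals(name)); // false
        // Setting the request charset to UTF-8 fixes the Post case.
        System.out.println(decode("POST", raw, tomcat, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```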

Request garbling means that the request parameters sent by the browser to the server contain Chinese characters, and the parameter values the server obtains are garbled.

Generally, the data sent by the browser is UTF-8 by default.
There are two ways to transmit data, Post and Get, and data sent in different ways is decoded differently.

Get puts the parameters in the URL.
Post puts them in the entity body.

Either way, the data can be decoded through HttpServletRequest request.

There are three ways to initiate access from the browser:

Enter the URL directly in the address bar,
click a hyperlink in the page,
submit a form.

With the first access method, the browser encodes the parameters as utf-8 by default.
With the other two, the browser encodes the parameters using the display encoding of the current page. So to fix request garbling you only need to set the matching decoding format on the server side. Because the browser's parameter encoding differs by access method, it is simplest to require utf-8 for hyperlinks and forms as well: the current page must itself be encoded as utf-8, so that the browser encodes all parameters as utf-8.

On the server side, the encoding of the request container can be set to utf-8 (the default is ISO-8859-1).
But this only takes effect for parameters in the request body.
If the parameters follow the URI in the request line, it has no effect. So the fix for garbled text differs with the request method.

If you configure the server to decode both get and post uniformly as utf-8, then every web application managed by that Tomcat must use utf-8 encoding.
Conclusion: conventions matter, and websites generally use UTF-8 as the default encoding. Unless there is a special requirement, do not convert to other encodings.
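When a URL parameter has already been decoded with the wrong charset, one commonly used workaround (not described in the text above) is to turn the string back into bytes with the wrong charset and decode again with the right one. The sketch below assumes the wrong charset was ISO-8859-1, which maps every byte to a character and so loses nothing; the class GarbledParameterFix and the method recover are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class GarbledParameterFix {
    // Undo a wrong decode: re-encode with the charset that was wrongly applied,
    // then decode the recovered bytes with the charset the browser actually used.
    static String recover(String garbled, Charset wronglyApplied, Charset actual) {
        return new String(garbled.getBytes(wronglyApplied), actual);
    }

    public static void main(String[] args) {
        // Two Chinese characters, written as escapes so the source file encoding does not matter.
        String original = "\u4e2d\u6587";

        // Simulate a server that decoded UTF-8 bytes as ISO-8859-1.
        String garbled = new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
        System.out.println(garbled.equals(original)); // false

        String fixed = recover(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(original)); // true
    }
}
```

This only works when the wrong decode was lossless. If the bytes were first decoded with a charset that replaces invalid sequences, the original bytes cannot be recovered this way.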

Summary: it is best not to include Chinese parameters in get requests.

/caowei/p/2013-12-11_request
/jiangwei0910410003/article/details/22886847
/article/18897
http://blog./yuehaoyisheng/1324709