
Python 2: error when reading an .htm file: UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 0 (solution)

The problem is this: I use a program written in Python to read the data in an .htm file. At first I opened the file with fr = open("", "r"), and the program crashed as soon as it ran. The error message was: ValueError encoding must be one of 'utf_8', 'big5', or 'gbk'. So I used codecs and rewrote the code in the following form:

# -*- coding: utf-8 -*-

import sys
reload(sys)
sys.setdefaultencoding("utf-8")
import codecs
fr = codecs.open("", "r", "utf-8")
That at least solved the crash when opening the file.
But a new problem appeared when reading the file's contents:
as soon as the program read a line containing Chinese, it crashed. The content of that line is as follows:

.....-ActiveX
The error message is as follows:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb3 in position 0: invalid start byte
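
The "invalid start byte" part of the message can be reproduced in isolation. The following is a minimal sketch, not the original program: the two sample bytes are a hypothetical stand-in for the start of the failing line, chosen only because they begin with 0xb3. In UTF-8, 0xb3 can only be a continuation byte, so it can never start a character; in GBK it is the lead byte of a two-byte character.

```python
# Hypothetical sample bytes that begin with 0xb3, as in the error message.
raw = b"\xb3\xcc"

# Decoding as UTF-8 fails at position 0 because 0xb3 is not a valid start byte.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# GBK reads the same two bytes as one double-byte character.
decoded = raw.decode("gbk")
```

Here utf8_error holds the same kind of exception as above, while decoded is a single Unicode character.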

Reason:

The file declares its encoding in its header as follows:
 <html>
 <head>
 <meta http-equiv="Content-Language" content="zh-cn">
 <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
 <meta name="ProgId" content="">
 <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
Its character set is gb2312, not utf-8.

Therefore, the file must be read with the gbk codec (GBK is a superset of GB2312, so it decodes GB2312 text correctly).
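
The declared character set can also be picked out programmatically instead of by eye. This is only a sketch under assumptions: declared_charset is a hypothetical helper, not part of any library, and it uses a simple regular expression rather than a real HTML parser, so it may miss unusual markup.

```python
import re

def declared_charset(head_bytes, default="utf-8"):
    """Return the charset named in the HTML header bytes, or `default` if none is found."""
    match = re.search(br'charset\s*=\s*["\']?([A-Za-z0-9_-]+)', head_bytes, re.IGNORECASE)
    if match is None:
        return default
    return match.group(1).decode("ascii").lower()

# The meta tag from the file shown above.
sample = b'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">'
charset = declared_charset(sample)

# Prefer the gbk codec for gb2312 pages, since GBK is a superset of GB2312.
codec_name = "gbk" if charset == "gb2312" else charset
```

The search works on raw bytes because the header has to be inspected before the right codec is known.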

Solution:

Do not decode with "utf-8"; use "gbk" instead:

fr = codecs.open("", "r", "gbk")
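
For completeness, here is a small self-contained sketch of reading such a file line by line. It is an illustration rather than the original program: read_lines, the temporary file, and its sample content are all hypothetical, and it uses io.open, which accepts an encoding argument in both Python 2.7 and Python 3, in place of codecs.open.

```python
import io
import os
import tempfile

def read_lines(path, encoding="gbk"):
    """Read a text file and return its lines as Unicode strings."""
    with io.open(path, "r", encoding=encoding) as fr:
        return [line.rstrip("\r\n") for line in fr]

# Build a small GBK-encoded sample file so the sketch runs on its own.
sample_html = (
    u'<meta http-equiv="Content-Type" content="text/html; charset=gb2312">\n'
    u"<p>\u7a0b\u5e8f-ActiveX</p>\n"
)
fd, sample_path = tempfile.mkstemp(suffix=".htm")
with os.fdopen(fd, "wb") as fw:
    fw.write(sample_html.encode("gbk"))

lines = read_lines(sample_path)
os.remove(sample_path)
```

The line containing Chinese now decodes without raising UnicodeDecodeError. If a file might contain a few bytes that are invalid even in GBK, io.open also takes errors="replace", at the cost of silently substituting those characters.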