Detecting the encoding of scraped content: chardet.detect raises "Expected object of type bytes or bytearray, got: <class 'str'>" - Python scrapy module

1. Background

I wrote a spider with scrapy and fetched a page. For certain reasons I needed to detect the encoding of the fetched string, but the detection failed with TypeError: Expected object of type bytes or bytearray, got: <class 'str'>.

import scrapy

import chardet

responseStr = str(response.body)  # a text string (str); response is the object passed to the spider callback

print('=============|%s|' % type(responseStr))  # prints <class 'str'>

print('=============|%s|' % (chardet.detect(responseStr)))  # raises TypeError: Expected object of type bytes or bytearray, got: <class 'str'>

2. Explanation

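The signature is chardet.detect(byte_str), and the byte_str argument must be a byte string (bytes or bytearray). Looking back at the error message, TypeError: Expected object of type bytes or bytearray, got: <class 'str'>, the cause is clear: the string I passed in was a text string (str).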
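To make the type requirement concrete, here is a minimal sketch of the kind of check that produces this error. The helper name require_bytes is hypothetical and is not part of chardet; it only mirrors the behaviour described above, using the same error wording.

```python
def require_bytes(value):
    """Return value unchanged if it is bytes/bytearray, otherwise raise TypeError.

    Hypothetical helper for illustration only; it is not chardet's own code.
    """
    if not isinstance(value, (bytes, bytearray)):
        raise TypeError(
            "Expected object of type bytes or bytearray, got: %s" % type(value)
        )
    return value


text = "hello"   # str: a text string
raw = b"hello"   # bytes: a raw byte sequence

print(require_bytes(raw))  # accepted, because raw is bytes

try:
    require_bytes(text)    # rejected, because text is str
except TypeError as exc:
    print(exc)
```

A check like this is useful at the boundary of your own code as well, since it fails early with a clear message instead of deep inside a library call.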

3. Additional notes

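#Python has two different kinds of strings: one stores text, the other stores bytes. Text is stored internally as Unicode, while a byte string holds a raw byte sequence (shown as ASCII characters where printable).

#In Python 3, the text string type (stored as Unicode) is named str, and the byte string type is named bytes.

#Normally, creating a string literal gives a str object. To get bytes, put the prefix b before the literal (ASCII characters only), or call encode on the str. Accordingly, str objects have an encode method and bytes objects have a decode method.

#Converting between str and bytes: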

str->encode()->bytes

bytes->decode()->str
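The two conversions above can be sketched with the standard library alone. The sample text and the choice of UTF-8 are assumptions made for illustration; any codec that can represent the text would work the same way.

```python
text = "编码 encoding"            # str: text stored as Unicode
data = text.encode("utf-8")       # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

print(type(text), type(data), type(restored))
print(restored == text)           # the round trip gives back the original text

literal = b"abc"                  # the b prefix creates bytes directly
print(literal == "abc".encode("ascii"))
```

Note that encode and decode must use the same codec; decoding with a different one can raise UnicodeDecodeError or silently produce garbled text.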

4. The corrected code

import scrapy

import chardet

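responseBytes = response.body  # response.body is already bytes, so pass it to chardet as-is

print('=============|%s|' % type(responseBytes))  # prints <class 'bytes'>

print('=============|%s|' % (chardet.detect(responseBytes)))  # prints a dict with the keys 'encoding', 'confidence' and 'language'; the values depend on the page

# Note: str(response.body).encode() also yields bytes, but they are the bytes of the printable b'...' representation, so chardet reports {'encoding': 'ascii', 'confidence': 1.0, 'language': ''} regardless of the page's real encoding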
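A side note on str(response.body): calling str() on a bytes object without an encoding argument returns its printable representation (the b'...' form), not the decoded page text. The sketch below uses a made-up body value as a stand-in for a real response and only the standard library, so it illustrates the behaviour rather than reproducing the spider.

```python
# Stand-in for response.body: UTF-8 bytes containing non-ASCII text (assumed sample).
body = "<html>编码</html>".encode("utf-8")

as_repr = str(body)                # the b'...' representation with \x escapes
round_tripped = as_repr.encode()   # bytes of that representation, not of the page

print(as_repr[:2])                 # the repr starts with the b prefix and a quote
print(round_tripped == body)       # the original bytes are not recovered
print(round_tripped.isascii())     # the escaped form contains only ASCII characters
print(body.isascii())              # the real body does not

decoded = body.decode("utf-8")     # the intended bytes -> str conversion
print(decoded)
```

Because the escaped representation is pure ASCII, an encoding detector run on it can only ever see ASCII input. Passing the original bytes to the detector, and calling decode once the encoding is known, avoids the problem.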

.....

Please credit when reposting: 谷谷点程序 » Detecting the encoding of scraped content: chardet.detect object of type bytes or bytearray, got: <class 'str'>