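Reading large datasets in chunks with pandas
Two parameters are involved: chunksize and iterator.
1. chunksize
Both read_csv and read_table accept a chunksize parameter that sets the chunk size (the number of rows read per iteration). When it is set, the call returns an iterable TextFileReader object instead of a DataFrame.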
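import pandas as pd

# Each iteration yields a DataFrame with at most 10,000 rows.
reader = pd.read_csv("pff_GEN_NUCHANGE.csv", chunksize=10000)
for df in reader:
    # Process each chunk here, for example:
    # df.drop(columns=["GEN_id"], inplace=True)
    # Print the type and shape to inspect the chunk.
    print(type(df), df.shape)

to_csv also accepts a chunksize parameter, which sets how many rows are written at a time.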
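One common reason to read in chunks is to compute a result without ever holding the whole file in memory. The following is a minimal sketch of that idea, not part of the original post: the in-memory CSV text and the column name value are made up for illustration, and io.StringIO stands in for a large file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file on disk.
csv_text = "GEN_id,value\n" + "\n".join(f"{i},{i % 5}" for i in range(25))

total_rows = 0
value_sum = 0

# chunksize=10 makes read_csv yield DataFrames of at most 10 rows each.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=10):
    # Only running totals are kept; each chunk can be discarded afterwards.
    total_rows += len(chunk)
    value_sum += int(chunk["value"].sum())

print(total_rows, value_sum)
```

Because only the running totals survive each iteration, peak memory use depends on the chunk size rather than on the size of the file.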
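2. iterator=True
With iterator=True, read_csv also returns a TextFileReader object, and each call to get_chunk(n) reads the next n rows.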
import pandas as pd

reader = pd.read_csv("pff_GEN_NUCHANGE.csv", iterator=True)
loop = True
chunkSize = 100000
chunks = []
while loop:
    try:
        # Read the next chunkSize rows; StopIteration is raised at end of file.
        chunk = reader.get_chunk(chunkSize)
        chunks.append(chunk)
    except StopIteration:
        loop = False
        print("Iteration is stopped.")
# Concatenate all chunks into a single DataFrame.
pff_AA_df = pd.concat(chunks, ignore_index=True)
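Because get_chunk takes the row count on every call, the size can vary from one read to the next, and each chunk can be filtered before it is kept so that the final concat handles less data. The sketch below illustrates this under assumptions that are not in the original post: the sample data, the value column, and the filter condition are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large CSV file on disk.
csv_text = "GEN_id,value\n" + "\n".join(f"{i},{i % 5}" for i in range(25))

reader = pd.read_csv(io.StringIO(csv_text), iterator=True)

# Peek at the first 5 rows, for example to check columns and dtypes.
head = reader.get_chunk(5)

# Keep only the rows of interest from each chunk (hypothetical condition).
kept = [head[head["value"] > 2]]
while True:
    try:
        chunk = reader.get_chunk(10)  # later reads use a larger size
    except StopIteration:
        break  # the file is exhausted
    kept.append(chunk[chunk["value"] > 2])

result = pd.concat(kept, ignore_index=True)
print(head.shape, result.shape)
```

Filtering before concatenation only helps when the filter discards a meaningful share of the rows; if every row is kept, the combined DataFrame still has to fit in memory.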