Pickle can't dump 2GB+ files on Mac OS X

As a data scientist or engineer, you will very likely work with big datasets.

If you try to pickle an object larger than 2GB on Mac OS X, you may run into this error:

OSError Traceback (most recent call last)
<ipython-input-5-5b2910a2f7f4> in <module>()
----> 1 df.to_pickle('test.pkl')
/Users/.../lib/python3.5/site-packages/pandas/core/generic.py in to_pickle(self, path)
1175 """
1176 from pandas.io.pickle import to_pickle
-> 1177 return to_pickle(self, path)
1178
1179 def to_clipboard(self, excel=None, sep=None, **kwargs):
/Users/.../lib/python3.5/site-packages/pandas/io/pickle.py in to_pickle(obj, path)
18 """
19 with open(path, 'wb') as f:
---> 20 pkl.dump(obj, f, protocol=pkl.HIGHEST_PROTOCOL)
21
22
OSError: [Errno 22] Invalid argument

This issue has been reported for over a year; see Python Issue 24658. It is not specific to pickle, though: the failure comes from writing more than 2GB to a file in a single call on OS X.

Fortunately, it has been fixed, but the fix may take some time to reach a release. In the meantime, you can either:

  1. apply the patch yourself, or
  2. split the target file into pieces smaller than 2GB
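
One way to read option 2 is to keep a single pickle file but never hand the OS more than 2GB in one call. The sketch below is only an illustration under that assumption, not the fix from the issue tracker: the helper names `dump_in_chunks` and `load_in_chunks` and the `MAX_BYTES` constant are made up for this example. It serializes the object in memory with `pickle.dumps`, then writes the bytes in slices below the limit. Reading a large file back in one call may hit the same limit, so the loader reads in chunks too.

```python
import os
import pickle

# Hypothetical constant: largest number of bytes passed to a single
# read/write call, kept just below the 2GB limit.
MAX_BYTES = 2**31 - 1


def dump_in_chunks(obj, path, chunk_size=MAX_BYTES):
    """Pickle obj to path, writing at most chunk_size bytes per call."""
    # The whole pickle is built in memory first, so this needs enough RAM.
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(data)  # slicing a memoryview avoids copying the bytes
    with open(path, "wb") as f:
        for start in range(0, len(view), chunk_size):
            f.write(view[start:start + chunk_size])


def load_in_chunks(path, chunk_size=MAX_BYTES):
    """Read a pickle from path, reading at most chunk_size bytes per call."""
    remaining = os.path.getsize(path)
    chunks = []
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:  # unexpected end of file
                break
            chunks.append(chunk)
            remaining -= len(chunk)
    return pickle.loads(b"".join(chunks))
```

With the DataFrame from the traceback above, `dump_in_chunks(df, 'test.pkl')` would stand in for `df.to_pickle('test.pkl')`, and `load_in_chunks('test.pkl')` would read it back. The trade-off is memory: the full pickle is held in RAM before it is written.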
Zhanzhao Deo Liang