hahadsg's note

Follow me on GitHub

Trick


to clipboard

from pandas.io import clipboard
clipboard.clipboard_set(text)

flatten multiIndex

['_'.join(col) for col in pkg_pivot.columns.values]

.ix deprecated

df.iloc[:, [df.columns.get_loc('target')]]

explode list

df[target_column].apply(lambda x: x.split(',')) \
    .apply(pd.Series) \
    .stack() \
    .reset_index(level=1, drop=True) \
    .to_frame(new_name) \
    .join(df)

read/write

write to excel multi-sheet

df1 = pd.DataFrame({'Data': [11, 12, 13, 14]})
df2 = pd.DataFrame({'Data': [21, 22, 23, 24]})
df3 = pd.DataFrame({'Data': [31, 32, 33, 34]})

writer = pd.ExcelWriter('./test.xlsx', engine='xlsxwriter')

df1.to_excel(writer, sheet_name='Sheet1')
df2.to_excel(writer, sheet_name='Sheet2')
df3.to_excel(writer, sheet_name='Sheet3')

writer.save()

Problem


关于copy

  • copy() 只会拷贝index和data,如果data是object不会进行深度拷贝
In [1]: import pandas as pd

In [2]: class test(object):
   ...:     def __init__(self, x):
   ...:         self.x = x
   ...: 

In [3]: df = pd.DataFrame({'A': [test(1), test(3)]})

In [4]: df
Out[4]: 
                                       A
0  <__main__.test object at 0x105aec908>
1  <__main__.test object at 0x105aec940>

copy()只会对index和data进行拷贝,如果data是object不会进行深度拷贝

In [5]: df2 = df.copy() 

In [6]: df2
Out[6]: 
                                       A
0  <__main__.test object at 0x105aec908>
1  <__main__.test object at 0x105aec940>

用python自带的包进行deepcopy

In [7]: import copy

In [8]: df3 = copy.deepcopy(df)

In [9]: df3
Out[9]:
A
0 <__main__.test object at 0x105b0e1d0>
1 <__main__.test object at 0x105b0e080>

关于Warning: A value is trying to be set on a copy of a slice from a DataFrame.

  • 这个问题很奇怪,如果df2是df的拷贝,那就不应该报警告;而df2是df的引用,那df2修改时df没有修改···
In [18]: df = pd.DataFrame({
   ....:         'A': np.random.randn(5), 
   ....:         'B': np.random.randn(5), 
   ....:         'C': np.random.randn(5), 
   ....:     })

In [19]: df
Out[19]: 
          A         B         C
0  0.778093  0.534797 -0.271507
1  1.536066  0.078380  0.028316
2 -0.429115 -2.397360 -0.766024
3 -0.154933 -0.045179  0.350458
4 -0.520098  1.142862  0.461002

In [20]: df2 = df.loc[df['A'] < 0.1, :]

In [21]: df2
Out[21]: 
          A         B         C
2 -0.429115 -2.397360 -0.766024
3 -0.154933 -0.045179  0.350458
4 -0.520098  1.142862  0.461002

In [22]: df2['B'] = 1
/Library/Frameworks/Python.framework/Versions/3.5/bin/ipython:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  #!/Library/Frameworks/Python.framework/Versions/3.5/bin/python3.5

In [23]: df2
Out[23]: 
          A  B         C
2 -0.429115  1 -0.766024
3 -0.154933  1  0.350458
4 -0.520098  1  0.461002

In [24]: df
Out[24]: 
          A         B         C
0  0.778093  0.534797 -0.271507
1  1.536066  0.078380  0.028316
2 -0.429115 -2.397360 -0.766024
3 -0.154933 -0.045179  0.350458
4 -0.520098  1.142862  0.461002

关于groupby.apply

  • 很奇怪 groupby后apply会对每一行都进行计算
In [4]: df = pd.DataFrame({'a':[1,1,3], 'b':[2,3,4]})

In [5]: def test(x):
   ...:     print(x)
   ...:     return x.sum()
   ...: 

In [6]: df.groupby('a').apply(test)
   a  b
0  1  2
1  1  3
   a  b
0  1  2
1  1  3
   a  b
2  3  4
Out[6]: 
   a  b
a      
1  2  5
3  3  4

关于pivot_table

  • 如果value中又空值,而aggfunc又不能处理空值,则会出错
In [9]: a = pd.DataFrame({
   ...:     'a': [1, 2, 1, 2, 2],
   ...:     'b': [1, 1, 2, 2, 2],
   ...:     'c': ['1', '1', '1', '1', None],
   ...: })
   ...: 
In [12]: a.pivot_table('c', 'a', 'b', aggfunc=lambda s: '+'.join(s)) # 出错了
Out[12]: 
b      1      2
a              
1  a+b+c  a+b+c
2  a+b+c  a+b+c

In [13]: a.pivot_table('c', 'a', 'b', aggfunc=lambda s: '+'.join(s.fillna(''))) # 没问题
Out[13]: 
b  1   2
a       
1  1   1
2  1  1+