Question :
I have a list of tuples like
data = [
('r1', 'c1', avg11, stdev11),
('r1', 'c2', avg12, stdev12),
('r2', 'c1', avg21, stdev21),
('r2', 'c2', avg22, stdev22)
]
and I would like to put them into a pandas DataFrame with rows named by the first column and columns named by the 2nd column. It seems the way to take care of the row names is something like pandas.DataFrame([x[1:] for x in data], index = [x[0] for x in data])
but how do I take care of the columns to get a 2×2 matrix (the output from the previous set is 3×4)? Is there a more intelligent way of taking care of row labels as well, instead of explicitly omitting them?
EDIT It seems I will need 2 DataFrames – one for averages and one for standard deviations, is that correct? Or can I store a list of values in each “cell”?
Answer #1:
You can pivot your DataFrame after creating:
>>> df = pd.DataFrame(data)
>>> df.pivot(index=0, columns=1, values=2)
# avg DataFrame
1 c1 c2
0
r1 avg11 avg12
r2 avg21 avg22
>>> df.pivot(index=0, columns=1, values=3)
# stdev DataFrame
1 c1 c2
0
r1 stdev11 stdev12
r2 stdev21 stdev22
Answer #2:
I submit that it is better to leave your data stacked as it is:
df = pandas.DataFrame(data, columns=['R_Number', 'C_Number', 'Avg', 'Std'])
# Possibly also this if these can always be the indexes:
# df = df.set_index(['R_Number', 'C_Number'])
Then it’s a bit more intuitive to say
df.set_index(['R_Number', 'C_Number']).Avg.unstack(level=1)
This way it is implicit that you’re seeking to reshape the averages, or the standard deviations. Whereas, just using pivot
, it’s purely based on column convention as to what semantic entity it is that you are reshaping.
Answer #3:
This is what I expected to see when I came to this question:
#!/usr/bin/env python
import pandas as pd
df = pd.DataFrame([(1, 2, 3, 4),
(5, 6, 7, 8),
(9, 0, 1, 2),
(3, 4, 5, 6)],
columns=list('abcd'),
index=['India', 'France', 'England', 'Germany'])
print(df)
gives
a b c d
India 1 2 3 4
France 5 6 7 8
England 9 0 1 2
Germany 3 4 5 6