pandas convert some columns into rows

Posted on

Solving problem is about exposing yourself to as many situations as possible like pandas convert some columns into rows and practice these strategies over and over. With time, it becomes second nature and a natural way you approach any problems in general. Big or small, always start with a plan, use other strategies mentioned here till you are confident and ready to code the solution.
In this post, my aim is to share an overview the topic about pandas convert some columns into rows, which can be followed any time. Take easy to follow this discuss.

pandas convert some columns into rows

So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like

location    name    Jan-2010    Feb-2010    March-2010
A           "test"  12          20          30
B           "foo"   18          20          25

What I would like is for it to look like

location    name    Date        Value
A           "test"  Jan-2010    12
A           "test"  Feb-2010    20
A           "test"  March-2010  30
B           "foo"   Jan-2010    18
B           "foo"   Feb-2010    20
B           "foo"   March-2010  25

problem is I don’t know how many dates are in the column (though I know they will always start after name)

Asked By: Wizuriel

||

Answer #1:

UPDATE
From v0.20, melt is a first order function, you can now use

df.melt(id_vars=["location", "name"],
        var_name="Date",
        value_name="Value")
  location    name        Date  Value
0        A  "test"    Jan-2010     12
1        B   "foo"    Jan-2010     18
2        A  "test"    Feb-2010     20
3        B   "foo"    Feb-2010     20
4        A  "test"  March-2010     30
5        B   "foo"  March-2010     25

OLD(ER) VERSIONS: <0.20

You can use pd.melt to get most of the way there, and then sort:

>>> df
  location  name  Jan-2010  Feb-2010  March-2010
0        A  test        12        20          30
1        B   foo        18        20          25
>>> df2 = pd.melt(df, id_vars=["location", "name"],
                  var_name="Date", value_name="Value")
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
1        B   foo    Jan-2010     18
2        A  test    Feb-2010     20
3        B   foo    Feb-2010     20
4        A  test  March-2010     30
5        B   foo  March-2010     25
>>> df2 = df2.sort(["location", "name"])
>>> df2
  location  name        Date  Value
0        A  test    Jan-2010     12
2        A  test    Feb-2010     20
4        A  test  March-2010     30
1        B   foo    Jan-2010     18
3        B   foo    Feb-2010     20
5        B   foo  March-2010     25

(Might want to throw in a .reset_index(drop=True), just to keep the output clean.)

Note: pd.DataFrame.sort has been deprecated in favour of pd.DataFrame.sort_values.

Answered By: DSM

Answer #2:

Use set_index with stack for MultiIndex Series, then for DataFrame add reset_index with rename:

df1 = (df.set_index(["location", "name"])
         .stack()
         .reset_index(name='Value')
         .rename(columns={'level_2':'Date'}))
print (df1)
  location  name        Date  Value
0        A  test    Jan-2010     12
1        A  test    Feb-2010     20
2        A  test  March-2010     30
3        B   foo    Jan-2010     18
4        B   foo    Feb-2010     20
5        B   foo  March-2010     25
Answered By: jezrael

Answer #3:

pd.wide_to_long

You can add a prefix to your year columns and then feed directly to pd.wide_to_long. I won’t pretend this is efficient, but it may in certain situations be more convenient than pd.melt, e.g. when your columns already have an appropriate prefix.

df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f'Value{x}')))
res = pd.wide_to_long(df, stubnames=['Value'], i='name', j='Date').reset_index()
        .sort_values(['location', 'name'])
print(res)
   name        Date location  Value
0  test    Jan-2010        A     12
2  test    Feb-2010        A     20
4  test  March-2010        A     30
1   foo    Jan-2010        B     18
3   foo    Feb-2010        B     20
5   foo  March-2010        B     25
Answered By: jpp

Answer #4:

I guess I found a simpler solution

temp1 = pd.melt(df1, id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1, id_vars=["name"], var_name='Date', value_name='Value')

Concat whole temp1 with temp2‘s column name

temp1['new_column'] = temp2['name']

You now have what you asked for.

Answered By: Prometheus
The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .

Leave a Reply

Your email address will not be published.