### Question :

Say I have a dictionary with 10 key-value pairs. Each entry holds a numpy array. However, the length of the array is not the same for all of them.

How can I create a dataframe where each column holds a different entry?

When I try:

```
pd.DataFrame(my_dict)
```

I get:

```
ValueError: arrays must all be the same length
```

Any way to overcome this? I am happy to have Pandas use `NaN` to pad those columns for the shorter entries.

## Answer #1:

**In Python 3.x:**

```
import pandas as pd
import numpy as np
d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in d.items() ]))
Out[7]:
     A  B
0    1  1
1    2  2
2  NaN  3
3  NaN  4
```

**In Python 2.x:**

replace `d.items()` with `d.iteritems()`.

## Answer #2:

Here’s a simple way to do that:

```
In[20]: my_dict = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
In[21]: df = pd.DataFrame.from_dict(my_dict, orient='index')
In[22]: df
Out[22]:
   0  1    2    3
A  1  2  NaN  NaN
B  1  2    3    4
In[23]: df.transpose()
Out[23]:
     A  B
0    1  1
1    2  2
2  NaN  3
3  NaN  4
```

## Answer #3:

A way of tidying up your syntax, while still doing essentially the same thing as the other answers, is below:

```
>>> mydict = {'one': [1,2,3], 2: [4,5,6,7], 3: 8}
>>> dict_df = pd.DataFrame({ key:pd.Series(value) for key, value in mydict.items() })
>>> dict_df
   one  2    3
0  1.0  4  8.0
1  2.0  5  NaN
2  3.0  6  NaN
3  NaN  7  NaN
```

A similar syntax exists for lists, too:

```
>>> mylist = [ [1,2,3], [4,5], 6 ]
>>> list_df = pd.DataFrame([ pd.Series(value) for value in mylist ])
>>> list_df
     0    1    2
0  1.0  2.0  3.0
1  4.0  5.0  NaN
2  6.0  NaN  NaN
```

Another syntax for lists is:

```
>>> mylist = [ [1,2,3], [4,5], 6 ]
>>> list_df = pd.DataFrame({ i:pd.Series(value) for i, value in enumerate(mylist) })
>>> list_df
   0    1    2
0  1  4.0  6.0
1  2  5.0  NaN
2  3  NaN  NaN
```

You may additionally have to transpose the result and/or change the column data types (float, integer, etc).
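As a hedged sketch of that last step: padding with `NaN` upcasts integer columns to `float64`. Assuming pandas 0.24 or newer, the nullable `Int64` dtype can restore integer values while keeping the missing cells. The dictionary below is a made-up variant of `mydict`, used only for illustration.

```python
import pandas as pd

# illustrative input: two lists of unequal length
mydict = {'one': [1, 2, 3], 'two': [4, 5, 6, 7]}
dict_df = pd.DataFrame({key: pd.Series(value) for key, value in mydict.items()})

# column 'one' was padded with NaN, so it was upcast to float64
print(dict_df.dtypes)

# nullable integer dtype: integers are kept, missing cells become <NA>
int_df = dict_df.astype('Int64')
print(int_df)

# transpose if each dictionary entry should be a row instead of a column
print(int_df.T)
```

The cast only succeeds when every non-missing value is a whole number, so it is not a general replacement for `float64`.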

## Answer #4:

While this does not directly answer the OP’s question, I found this to be an excellent solution for my case when I had unequal arrays, and I’d like to share it:

```
In [30]: from pandas import DataFrame, Series
In [31]: d = {'one' : Series([1., 2., 3.], index=['a', 'b', 'c']),
   ....:      'two' : Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
   ....:
In [32]: df = DataFrame(d)
In [33]: df
Out[33]:
   one  two
a    1    1
b    2    2
c    3    3
d  NaN    4
```
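This works because pandas aligns `Series` on their index labels, not on position, when building a `DataFrame`, and fills any label missing from a `Series` with `NaN`. A minimal sketch of that behaviour, with made-up labels chosen so that the shorter `Series` starts at `'b'`:

```python
import pandas as pd

# the shorter Series has no entry for label 'a'
one = pd.Series([1.0, 2.0, 3.0], index=['b', 'c', 'd'])
two = pd.Series([1.0, 2.0, 3.0, 4.0], index=['a', 'b', 'c', 'd'])

# rows are matched by label, so the gap appears at 'a' rather than at the end
df = pd.DataFrame({'one': one, 'two': two})
print(df)
```

Here the `NaN` lands in row `'a'` of column `one`, which shows that the padding follows the labels and is not simply appended to the shorter column.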

## Answer #5:

You can also use `pd.concat` along `axis=1` with a list of `pd.Series` objects:

```
import pandas as pd, numpy as np
d = {'A': np.array([1,2]), 'B': np.array([1,2,3,4])}
res = pd.concat([pd.Series(v, name=k) for k, v in d.items()], axis=1)
print(res)
     A  B
0  1.0  1
1  2.0  2
2  NaN  3
3  NaN  4
```

## Answer #6:

Both of the following lines work (here `df` is the dictionary of arrays, not a DataFrame):

```
pd.DataFrame.from_dict(df, orient='index').transpose() #A
pd.DataFrame(dict([ (k,pd.Series(v)) for k,v in df.items() ])) #B (Better)
```

But with `%timeit` in Jupyter, I measured B to be about 4x faster than A, which matters when working with a huge data set (mainly one with a large number of columns/features).
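To check this on your own data outside Jupyter, a rough sketch using the standard `timeit` module is below. The test data, the function names `via_from_dict` and `via_series`, and the repeat count are all assumptions made for illustration, and the measured ratio will depend on the pandas version and the shape of the data.

```python
import timeit

import numpy as np
import pandas as pd

# hypothetical test data: 200 columns of random, unequal length
rng = np.random.default_rng(0)
data = {f'col{i}': rng.random(rng.integers(1, 100)) for i in range(200)}

def via_from_dict(d):
    # approach A: build row-wise, then transpose
    return pd.DataFrame.from_dict(d, orient='index').transpose()

def via_series(d):
    # approach B: wrap every array in a Series
    return pd.DataFrame({k: pd.Series(v) for k, v in d.items()})

# time a few repetitions of each approach
time_a = timeit.timeit(lambda: via_from_dict(data), number=5)
time_b = timeit.timeit(lambda: via_series(data), number=5)
print(f'A: {time_a:.4f} s, B: {time_b:.4f} s')
```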

## Answer #7:

## Use `pandas.DataFrame` and `pandas.concat`

- The following code creates a `list` of `DataFrames` with `pandas.DataFrame` in a list comprehension, from a `dict` of uneven `arrays`, and then joins them with `pandas.concat`.
  - This is a way to create a `DataFrame` of `arrays` that are not equal in length.
  - For equal length `arrays`, use `df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})`

```
import pandas as pd
import numpy as np
# create the uneven arrays
mu, sigma = 200, 25
np.random.seed(365)
x1 = mu + sigma * np.random.randn(10, 1)
x2 = mu + sigma * np.random.randn(15, 1)
x3 = mu + sigma * np.random.randn(20, 1)
data = {'x1': x1, 'x2': x2, 'x3': x3}
# create the dataframe
df = pd.concat([pd.DataFrame(v, columns=[k]) for k, v in data.items()], axis=1)
```

## Use `pandas.DataFrame` and `itertools.zip_longest`

- For iterables of uneven length, `zip_longest` fills missing values with the `fillvalue`.
- The zip generator needs to be unpacked, because the `DataFrame` constructor won’t unpack it.

```
from itertools import zip_longest
# zip all the values together
zl = list(zip_longest(*data.values()))
# create dataframe
df = pd.DataFrame(zl, columns=data.keys())
```
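A smaller sketch of the same idea, assuming plain one-dimensional lists instead of the random arrays above. By default `zip_longest` pads with `None`; passing `fillvalue=np.nan` makes the padding explicit. The input values are made up for illustration.

```python
from itertools import zip_longest

import numpy as np
import pandas as pd

# hypothetical one-dimensional inputs of unequal length
data = {'x1': [1, 2], 'x2': [3, 4, 5], 'x3': [6, 7, 8, 9]}

# each tuple produced by zip_longest becomes one row; short inputs are padded
rows = list(zip_longest(*data.values(), fillvalue=np.nan))
df = pd.DataFrame(rows, columns=list(data))
print(df)
```

The resulting frame has as many rows as the longest input, with `NaN` at the bottom of the shorter columns.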

## plot

```
df.plot(marker='o', figsize=[10, 5])
```

## dataframe

```
           x1         x2         x3
0   232.06900  235.92577  173.19476
1   176.94349  209.26802  186.09590
2   194.18474  168.36006  194.36712
3   196.55705  238.79899  218.33316
4   249.25695  167.91326  191.62559
5   215.25377  214.85430  230.95119
6   232.68784  240.30358  196.72593
7   212.43409  201.15896  187.96484
8   188.97014  187.59007  164.78436
9   196.82937  252.67682  196.47132
10        NaN  223.32571  208.43823
11        NaN  209.50658  209.83761
12        NaN  215.27461  249.06087
13        NaN  210.52486  158.65781
14        NaN  193.53504  199.10456
15        NaN        NaN  186.19700
16        NaN        NaN  223.02479
17        NaN        NaN  185.68525
18        NaN        NaN  213.41414
19        NaN        NaN  271.75376
```

## Answer #8:

If you don’t want it to show `NaN` and you have two particular lengths, adding a ‘space’ in each remaining cell would also work.

```
import pandas as pd

long = [6, 4, 7, 3]
short = [5, 6]
# pad the shorter list with spaces until both lists have the same length
for n in range(len(long) - len(short)):
    short.append(' ')
df = pd.DataFrame({'A': long, 'B': short})
# write the result to an Excel file in the working directory (requires xlsxwriter)
datatoexcel = pd.ExcelWriter('example1.xlsx', engine='xlsxwriter')
df.to_excel(datatoexcel, sheet_name='Sheet1')
datatoexcel.close()  # ExcelWriter.save() was removed in pandas 2.0
print(df)
   A  B
0  6  5
1  4  6
2  7
3  3
```

If you have more than 2 lengths of entries, it is advisable to make a function which uses a similar method.
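One possible shape for such a function is sketched below. The name `pad_to_frame` and its `fill` parameter are hypothetical, not part of pandas.

```python
import pandas as pd

def pad_to_frame(columns, fill=' '):
    """Build a DataFrame from sequences of unequal length (hypothetical helper)."""
    longest = max(len(values) for values in columns.values())
    padded = {
        # copy each sequence and append as many fill values as it is short
        name: list(values) + [fill] * (longest - len(values))
        for name, values in columns.items()
    }
    return pd.DataFrame(padded)

df = pad_to_frame({'A': [6, 4, 7, 3], 'B': [5, 6], 'C': [1, 2, 3]})
print(df)
```

Padding numeric columns with a string turns them into `object` columns, so this suits display or export better than further computation.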