### Question :

I have a huge CSV with many tables with many rows. I would like to simply split each dataframe into 2 if it contains more than 10 rows.

If true, I would like the first dataframe to contain the first 10 and the rest in the second dataframe.

Is there a convenient function for this? I’ve looked around but found nothing useful…

i.e. `split_dataframe(df, 2(if > 10))`

?

##
Answer #1:

This will return the split DataFrames if the condition is met, otherwise return the original and `None`

(which you would then need to handle separately). Note that this assumes the splitting only has to happen one time per `df`

and that the second part of the split (if it is longer than 10 rows (meaning that the original was longer than 20 rows)) is OK.

```
df_new1, df_new2 = df[:10, :], df[10:, :] if len(df) > 10 else df, None
```

Note you can also use `df.head(10)`

and `df.tail(len(df) - 10)`

to get the front and back according to your needs. You can also use various indexing approaches: you can just provide the first dimensions index if you want, such as `df[:10]`

instead of `df[:10, :]`

(though I like to code explicitly about the dimensions you are taking). You can can also use `df.iloc`

and `df.ix`

to index in similar ways.

Be careful about using `df.loc`

however, since it is label-based and the input will never be interpreted as an integer position. `.loc`

would only work “accidentally” in the case when you happen to have index labels that are integers starting at 0 with no gaps.

But you should also consider the various options that pandas provides for dumping the contents of the DataFrame into HTML and possibly also LaTeX to make better designed tables for the presentation (instead of just copying and pasting). Simply Googling how to convert the DataFrame to these formats turns up lots of tutorials and advice for exactly this application.

##
Answer #2:

I used a List Comprehension to cut a huge DataFrame into blocks of 100’000:

```
size = 100000
list_of_dfs = [df.loc[i:i+size-1,:] for i in range(0, len(df),size)]
```

or as generator:

```
list_of_dfs = (df.loc[i:i+size-1,:] for i in range(0, len(df),size))
```

##
Answer #3:

There is no specific convenience function.

You’d have to do something like:

```
first_ten = pd.DataFrame()
rest = pd.DataFrame()
if df.shape[0] > 10: # len(df) > 10 would also work
first_ten = df[:10]
rest = df[10:]
```

##
Answer #4:

A method based on `np.split`

:

```
df = pd.DataFrame({ 'A':[2,4,6,8,10,2,4,6,8,10],
'B':[10,-10,0,20,-10,10,-10,0,20,-10],
'C':[4,12,8,0,0,4,12,8,0,0],
'D':[9,10,0,1,3,np.nan,np.nan,np.nan,np.nan,np.nan]})
listOfDfs = [df.loc[idx] for idx in np.split(df.index,5)]
```

A small function that uses a modulo could take care of cases where the split is not even (e.g. `np.split(df.index,4)`

will throw an error).

(*Yes, I am aware that the original question was somewhat more specific than this. However, this is supposed to answer the question in the title.*)

##
Answer #5:

If you have a large data frame and need to divide into a variable number of sub data frames rows, like for example each sub dataframe has a max of 4500 rows, this script could help:

```
max_rows = 4500
dataframes = []
while len(df) > max_rows:
top = df[:max_rows]
dataframes.append(top)
df = df[max_rows:]
else:
dataframes.append(df)
```

You could then save out these data frames:

```
for _, frame in enumerate(dataframes):
frame.to_csv(str(_)+'.csv', index=False)
```

Hope this helps someone!

##
Answer #6:

Below is a simple function implementation which splits a DataFrame to chunks and a few code examples:

```
import pandas as pd
def split_dataframe_to_chunks(df, n):
df_len = len(df)
count = 0
dfs = []
while True:
if count > df_len-1:
break
start = count
count += n
#print("%s : %s" % (start, count))
dfs.append(df.iloc[start : count])
return dfs
# Create a DataFrame with 10 rows
df = pd.DataFrame([i for i in range(10)])
# Split the DataFrame to chunks of maximum size 2
split_df_to_chunks_of_2 = split_dataframe_to_chunks(df, 2)
print([len(i) for i in split_df_to_chunks_of_2])
# prints: [2, 2, 2, 2, 2]
# Split the DataFrame to chunks of maximum size 3
split_df_to_chunks_of_3 = split_dataframe_to_chunks(df, 3)
print([len(i) for i in split_df_to_chunks_of_3])
# prints [3, 3, 3, 1]
```

##
Answer #7:

You can use the DataFrame head and tail methods as syntactic sugar instead of slicing/loc here. I use a split size of 3; for your example use headSize=10

```
def split(df, headSize) :
hd = df.head(headSize)
tl = df.tail(len(df)-headSize)
return hd, tl
df = pd.DataFrame({ 'A':[2,4,6,8,10,2,4,6,8,10],
'B':[10,-10,0,20,-10,10,-10,0,20,-10],
'C':[4,12,8,0,0,4,12,8,0,0],
'D':[9,10,0,1,3,np.nan,np.nan,np.nan,np.nan,np.nan]})
# Split dataframe into top 3 rows (first) and the rest (second)
first, second = split(df, 3)
```

##
Answer #8:

The method based on list comprehension and `groupby`

, which stores all the split dataframes in a list variable and can be accessed using the index.

Example:

```
ans = [pd.DataFrame(y) for x, y in DF.groupby('column_name', as_index=False)]***
ans[0]
ans[0].column_name
```