I have a dataframe df :
20060930 10.103 NaN 10.103 7.981 20061231 15.915 NaN 15.915 12.686 20070331 3.196 NaN 3.196 2.710 20070630 7.907 NaN 7.907 6.459
Then I want to select rows with certain sequence numbers which indicated in a list, suppose here is [1,3], then left:
20061231 15.915 NaN 15.915 12.686 20070630 7.907 NaN 7.907 6.459
How or what function can do that ?
ind_list = [1, 3] df.ix[ind_list]
should do the trick!
When I index with data frames I always use the .ix() method. Its so much easier and more flexible…
This is no longer the accepted method for indexing. The
ix method is deprecated. Use
.iloc for integer based indexing and
.loc for label based indexing.
you can also use iloc:
This will not work if the indexes in your dataframe do not correspond to the order of the rows due to prior computations. In that case use:
… as suggested in other responses.
Another way (although it is a longer code) but it is faster than the above codes. Check it using %timeit function:
PS: You figure out the reason
For large datasets, it is memory efficient to read only selected rows via the
pred = lambda x: x not in [1, 3] pd.read_csv("data.csv", skiprows=pred, index_col=0, names=...)
This will now return a DataFrame from a file that skips all rows except 1 and 3.
From the docs:
skiprows: list-like or integer or callable, default
If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be
lambda x: x in [0, 2]
There are many ways of solving this problem, and the ones listed above are the most commonly used ways of achieving the solution. I want to add two more ways, just in case someone is looking for an alternative.
index_list = [1,3] df.take(pos) #or df.query('index in @index_list')