Question:
I’m trying to read a CSV file into a pandas DataFrame and select a column, but I keep getting a KeyError.
The file reads in successfully and I can view the DataFrame in an IPython notebook, but when I try to select any column other than the first one, it throws a KeyError.
I am using this code:
import pandas as pd
transactions = pd.read_csv('transactions.csv',low_memory=False, delimiter=',', header=0, encoding='ascii')
transactions['quarter']
This is the file I’m working on:
https://www.dropbox.com/s/81iwm4f2hsohsq3/transactions.csv?dl=0
Thank you!
Answer #1:
Use sep=r'\s*,\s*' so that any spaces around the commas (and therefore around the column names) are consumed by the separator. A regex separator requires engine='python':
transactions = pd.read_csv('transactions.csv', sep=r'\s*,\s*',
                           header=0, encoding='ascii', engine='python')
Alternatively, make sure your CSV file has no unquoted spaces around the delimiters and use your original command unchanged.
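A minimal sketch reproducing the problem and the fix, assuming the real file's header has a space after each comma (the sample data below is hypothetical, not taken from the linked file):

```python
import io

import pandas as pd

# Hypothetical sample: a space follows each comma in the header and rows.
raw = "product_id, customer_id, quarter\n1, 10, Q1\n2, 20, Q2\n"

# Default parsing keeps the leading spaces in the column names,
# so df['quarter'] would raise KeyError here.
plain = pd.read_csv(io.StringIO(raw))
print(plain.columns.tolist())  # ['product_id', ' customer_id', ' quarter']

# The regex separator swallows whitespace on both sides of each comma.
fixed = pd.read_csv(io.StringIO(raw), sep=r"\s*,\s*", engine="python")
print(fixed.columns.tolist())  # ['product_id', 'customer_id', 'quarter']
print(fixed["quarter"].tolist())  # ['Q1', 'Q2']
```

pandas also offers skipinitialspace=True, which skips spaces after the delimiter only and works with the default C engine; the regex separator additionally handles spaces before the comma.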
To verify:
print(transactions.columns.tolist())
Output:
['product_id', 'customer_id', 'store_id', 'promotion_id', 'month_of_year', 'quarter', 'the_year', 'store_sales', 'store_cost', 'unit_sales', 'fact_count']
Answer #2:
If you need to select multiple columns from a DataFrame, use two pairs of square brackets (the inner pair is a list of column names), e.g.:
df[["product_id","customer_id","store_id"]]
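A small sketch of the difference between the two bracket forms, using a hypothetical in-memory frame with column names borrowed from the question:

```python
import pandas as pd

# Hypothetical frame; values are made up for illustration.
df = pd.DataFrame({
    "product_id": [1, 2],
    "customer_id": [10, 20],
    "store_id": [100, 200],
    "quarter": ["Q1", "Q2"],
})

one = df["product_id"]                                   # single brackets -> Series
several = df[["product_id", "customer_id", "store_id"]]  # list inside -> DataFrame

print(type(one).__name__)        # Series
print(type(several).__name__)    # DataFrame
print(several.columns.tolist())  # ['product_id', 'customer_id', 'store_id']

# Every name in the list must match a column exactly, otherwise KeyError.
try:
    df[["product_id", "Quarter"]]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```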
Answer #3:
A KeyError generally occurs when the key doesn’t exactly match any of the DataFrame’s column names (for example, because of leading or trailing spaces).
You could also try:
import pandas as pd

filename = 'transactions.csv'  # the file from the question
with open(filename, "r") as file:
    df = pd.read_csv(file, delimiter=",")
# Remove leading and trailing whitespace from every column name
df.columns = df.columns.str.strip()
print(df.columns)
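A self-contained sketch of the same idea, using a hypothetical in-memory header instead of the real file, showing that the exact-match lookup fails before stripping and succeeds after:

```python
import io

import pandas as pd

# Hypothetical header with a space before the second column name.
raw = "product_id, quarter\n1, 1\n2, 2\n"
df = pd.read_csv(io.StringIO(raw))

# Column lookup is exact, so 'quarter' does not match ' quarter'.
try:
    df["quarter"]
    found_before = True
except KeyError:
    found_before = False

# Strip leading/trailing whitespace from the column names, then retry.
df.columns = df.columns.str.strip()
print(found_before)          # False
print(df.columns.tolist())   # ['product_id', 'quarter']
print(len(df["quarter"]))    # 2
```

This only cleans the header. String values in the rows can still carry a leading space, which a whitespace-tolerant separator such as the one in Answer #1 avoids.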