Python pandas read_csv
The pandas read_csv() function reads a CSV (comma-separated values) file into a DataFrame object. CSV is an open text format that represents tabular data as comma-separated values.
The pandas module is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
1. Python Pandas read_csv() Example
We will use a sample CSV file hosted online; its URL appears in the code below.
To read this file with pandas, we can use the pandas read_csv() function:
import pandas
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv')
print(df)
By default, the separator for a CSV file is a comma. There is an option in the read_csv() function that allows you to change the default separator.
import pandas
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', sep='^')
print(df)
As you can see above, you just need to add the sep option with the separator used in your file.
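To see the effect of the separator without downloading anything, here is a minimal sketch that reads a small in-memory string instead of a file. The sample data and the variable names are made up for illustration; only read_csv() and its sep option come from the article.

```python
import io

import pandas

# Hypothetical sample data that uses '^' as the field separator.
raw = "name^city^age\nAlice^Paris^30\nBob^Lyon^25\n"

# With the default separator (a comma), each line ends up in a single column.
df_default = pandas.read_csv(io.StringIO(raw))

# With sep='^', the three fields are split into separate columns.
df_caret = pandas.read_csv(io.StringIO(raw), sep="^")

print(df_default.shape)  # (2, 1)
print(df_caret.shape)    # (2, 3)
print(list(df_caret.columns))
```

If the separator does not match the file, pandas does not raise an error here; it simply returns a one-column dataframe, so checking the shape is a quick sanity test.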
2. Read only the columns you need from the CSV file – usecols Option
The usecols option lets you specify which columns of the CSV file to load. This is very useful when the file contains a large number of columns, because it reduces the final size of your dataframe.
import pandas
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', usecols=['cdatetime', 'address'])
print(df)
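The same idea can be tried offline with a small in-memory sample. This is only a sketch: the rows below are invented for illustration and are not taken from the real file, although the column names cdatetime and address match the ones used above.

```python
import io

import pandas

# Hypothetical rows; only the column names mirror the article's example.
raw = (
    "cdatetime,address,district,beat\n"
    "1/1/06 0:00,12 EXAMPLE ST,3,3C\n"
    "1/1/06 0:30,34 SAMPLE AVE,5,5A\n"
)

# Keep only two of the four columns while reading.
df_subset = pandas.read_csv(io.StringIO(raw), usecols=["cdatetime", "address"])

print(list(df_subset.columns))
print(df_subset.shape)  # (2, 2)
```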
3. Specify the line that contains the header row – header Option
This option is useful when the header is not on the first line. Line numbers start at 0, so header=2 uses the third line as the header and discards all the lines before it.
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', header=2)
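As a small illustration of header=2, the sketch below reads an in-memory string whose first two lines are not part of the table. The sample text is hypothetical and stands in for a file that starts with a couple of comment or title lines.

```python
import io

import pandas

# Hypothetical file content: two leading lines, then the real header.
raw = (
    "Exported by a hypothetical tool\n"
    "Generated for illustration\n"
    "name,age\n"
    "Alice,30\n"
    "Bob,25\n"
)

# header=2 means the line at index 2 (the third line) holds the column names.
df_header = pandas.read_csv(io.StringIO(raw), header=2)

print(list(df_header.columns))
print(df_header.shape)  # (2, 2)
```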
4. Reading CSV File without Header – header=None Option
Some files don’t contain a header row. For these files, pass header=None so that read_csv() treats the first line as data rather than as column names. The columns are then labeled with integers starting at 0.
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', header=None)
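Here is a minimal sketch of reading headerless data from an in-memory string. The sample rows are invented, and the names option used in the second call is not covered in this article; it is shown only as one possible way to supply your own column names.

```python
import io

import pandas

# Hypothetical data with no header line.
raw = "Alice,30\nBob,25\n"

# header=None keeps the first line as data; columns are labeled 0, 1, ...
df_no_header = pandas.read_csv(io.StringIO(raw), header=None)

# Optionally provide column names yourself (assumption: not in the article).
df_named = pandas.read_csv(io.StringIO(raw), header=None, names=["name", "age"])

print(list(df_no_header.columns))  # [0, 1]
print(list(df_named.columns))
```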
5. Retrieve only certain rows from the CSV file – skiprows Option
df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', skiprows=[15, 20])
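Note that skiprows works by exclusion: when given a list, it names the line indices to drop, and every other line is kept. Indices start at 0 and count lines of the file, so the header is line 0. The following sketch uses hypothetical in-memory data to make this visible.

```python
import io

import pandas

# Hypothetical data: line 0 is the header, lines 1 to 4 are data rows.
raw = "id,value\n0,a\n1,b\n2,c\n3,d\n"

# Drop the lines at index 2 and 4 of the file ("1,b" and "3,d").
df_skipped = pandas.read_csv(io.StringIO(raw), skiprows=[2, 4])

print(df_skipped["id"].tolist())     # [0, 2]
print(df_skipped["value"].tolist())  # ['a', 'c']
```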
There are many other options for the read_csv() function that I won’t go into in this post. I’ve covered the options we use most often. If you need more information about this function, I advise you to have a look at the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html
If you want to know how to retrieve the names of the columns of a dataframe, have a look at this article: https://amiradata.com/pandas-get-column-names/