Python pandas read_csv

By ayed_amira, on 02/16/2020, updated on 09/10/2020

The pandas read_csv() function reads a CSV (comma-separated values) file into a DataFrame object. CSV is an open text format that represents tabular data as comma-separated values.

pandas is a fast, powerful, flexible and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.


1. Python Pandas read_csv() Example

We will use the sample CSV file that the code below reads from http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv. The file has the following content:

[Image: dataset example pandas]

To read this file with pandas, we can use the pandas read_csv() function:

import pandas

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv')

print(df)
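
If the URL is not reachable, the same call can be tried on in-memory text. The sketch below is only an illustration: the sample rows are made up (only the cdatetime and address column names come from this article), and io.StringIO stands in for a real file.

```python
import io
import pandas

# Made-up sample data; this is NOT the content of the real file
csv_text = """cdatetime,address
1/1/06 0:00,1 EXAMPLE ST
1/1/06 0:10,2 SAMPLE AVE
"""

# read_csv() accepts a path, a URL or a file-like object
df = pandas.read_csv(io.StringIO(csv_text))

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column names taken from the first line
```

The first line of the text is used as the header, which is the default behaviour of read_csv().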

By default, read_csv() expects a comma as the separator. The sep option lets you change it:

import pandas

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', sep='^')

print(df)

As you can see above, you just need to add the sep option with the separator used in your file.
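
To see the effect of the separator, here is a small sketch on hypothetical data that uses '^' between fields. The names and values are invented for the example.

```python
import io
import pandas

# Hypothetical data using '^' as the field separator
caret_text = "name^city\nAlice^Paris\nBob^Lyon\n"

# Without sep, pandas looks for commas and ends up with a single column
df_default = pandas.read_csv(io.StringIO(caret_text))

# With sep='^', the two fields are split as expected
df_caret = pandas.read_csv(io.StringIO(caret_text), sep='^')

print(df_default.shape)
print(df_caret.shape)
print(list(df_caret.columns))
```

Comparing the two shapes is a quick way to check whether the separator matches the file.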


2. Read only the columns you need from the CSV file – Option usecols 

The usecols option allows you to specify the columns you want to load from the CSV file. This is very useful when the file contains a large number of columns, because it reduces the size of the resulting DataFrame.

import pandas

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', usecols=['cdatetime', 'address'])

print(df)
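
As a minimal sketch on made-up data, usecols can be given either column names or column positions. The "extra" column and the row values are assumptions for the example.

```python
import io
import pandas

# Made-up data with three columns; "extra" is a hypothetical column
csv_text = (
    "cdatetime,address,extra\n"
    "1/1/06 0:00,1 EXAMPLE ST,x\n"
    "1/1/06 0:10,2 SAMPLE AVE,y\n"
)

# All columns are loaded by default
df_all = pandas.read_csv(io.StringIO(csv_text))

# Select columns by name
df_names = pandas.read_csv(io.StringIO(csv_text), usecols=['cdatetime', 'address'])

# Select columns by zero-based position
df_positions = pandas.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(list(df_all.columns))
print(list(df_names.columns))
print(list(df_positions.columns))
```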

3. Specify the line that contains the header row – Option header

The header option is useful when the header is not on the first line. It takes the zero-based number of the line that holds the column names: with header=2, the third line is used as the header and all the lines before it are discarded.

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', header=2)
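
The sketch below illustrates header=2 on hypothetical text that starts with two lines of free-form notes before the real header. The content is invented for the example.

```python
import io
import pandas

# Hypothetical file: two note lines, then the header on line 2 (zero-based)
csv_text = (
    "Exported report\n"
    "Generated for testing\n"
    "name,city\n"
    "Alice,Paris\n"
    "Bob,Lyon\n"
)

# Line 2 becomes the header; lines 0 and 1 are dropped
df = pandas.read_csv(io.StringIO(csv_text), header=2)

print(list(df.columns))
print(len(df))
```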

4. Reading CSV File without Header – Option header=None

Some files don’t contain a header row. For these files, pass header=None so that pandas treats the first line as data; the columns are then labelled with integers starting at 0.

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', header=None)
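
Here is a short sketch on made-up data without a header line. It also uses the names option of read_csv() to supply column labels; the labels themselves are hypothetical.

```python
import io
import pandas

# Made-up data with no header line
csv_text = "Alice,Paris\nBob,Lyon\n"

# header=None keeps the first line as data; columns are numbered 0, 1, ...
df = pandas.read_csv(io.StringIO(csv_text), header=None)

# names supplies column labels (hypothetical labels for this example)
df_named = pandas.read_csv(io.StringIO(csv_text), header=None, names=['name', 'city'])

print(list(df.columns))
print(list(df_named.columns))
```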

5. Retrieve only certain rows from the CSV file – Option skiprows

df = pandas.read_csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', skiprows=[15, 20])
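
When skiprows is given a list, the listed line numbers (zero-based, counting the header line as 0) are skipped and every other line is kept. The sketch below shows this on made-up data, together with the callable form of skiprows, which is called with each line number and skips the line when it returns True.

```python
import io
import pandas

# Made-up data: line 0 is the header, lines 1 to 5 hold ids 1 to 5
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(1, 6))

# Skip lines 2 and 4 of the file (the rows with id 2 and id 4)
df_list = pandas.read_csv(io.StringIO(csv_text), skiprows=[2, 4])

# Same selection with a callable: skip every even-numbered line except the header
df_callable = pandas.read_csv(
    io.StringIO(csv_text),
    skiprows=lambda line_number: line_number > 0 and line_number % 2 == 0,
)

print(df_list['id'].tolist())
print(df_callable['id'].tolist())
```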

There are many other options for the read_csv() function that I won’t go into in this post. I’ve covered the options that are most commonly used. If you need more information about this function, I advise you to have a look at the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

If you want to know how to retrieve the names of the columns of a dataframe, have a look at this article: https://amiradata.com/pandas-get-column-names/

To leave on a note of sweetness, I invite you to watch this video ! 😀

https://www.youtube.com/watch?v=wAEzpwvrveg

ayed_amira

I'm a data scientist. Passionate about new technologies and programming I created this website mainly for people who want to learn more about data science and programming :)
