Pandas drop column
If you work in data science with Python, you are probably familiar with the pandas library. Development of pandas started in 2008, led by Wes McKinney, and the library has become a standard for data analysis and data management in Python. Mastering pandas is essential for data science professionals working in Python and for anyone looking to automate a data process (mainly data engineers).
In this article we will talk about deleting one or more columns in a dataframe. If you want to know more about Python, I advise you to read the Learning python section.
How to drop one or more columns in Pandas Dataframe?
When working with a Pandas dataframe, we sometimes want to delete one or several columns depending on what we need in our project. Columns are generally dropped when they are not needed for further analysis. The pandas drop function allows you to delete one or more columns from a data frame very easily.
Let’s look at some examples of deleting columns from a dataset. We will use a dataset available online and load it with the read_csv() function. If you wish to have more information about the read_csv() function, you can read the article https://amiradata.com/python-pandas-read_csv/ about the main features of this function.
import pandas as pd

data = pd.read_csv("http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv")
data.head()
This file contains 7584 rows and 9 columns.
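The row and column counts of a loaded file can be checked with the shape and columns attributes. The sketch below assumes a tiny inline CSV instead of the real file so that it runs offline: the address column and all the values are made up for illustration, and only the other four column names come from this article.

```python
import io

import pandas as pd

# Tiny inline CSV standing in for the real file.
# Assumption: the "address" column and every value here are made up.
csv_text = """address,beat,ucr_ncic_code,latitude,longitude
1 EXAMPLE ST,3C,2404,38.55,-121.39
2 EXAMPLE AVE,5A,2204,38.47,-121.49
3 EXAMPLE CT,2A,2404,38.65,-121.48
"""

# read_csv accepts a file-like object as well as a path or URL.
sample = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple; columns holds the column labels.
n_rows, n_cols = sample.shape
column_names = list(sample.columns)

print(n_rows, n_cols)
print(column_names)
```

Running the same two attribute lookups on the real dataframe should report the counts quoted above.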
How to delete a column in a Pandas dataframe?
To delete a single column in a pandas dataframe, we pass the name of the column to the drop function, either as a single label or as a list containing one label. The column we want to delete is called "longitude". The pandas drop function can delete columns or rows, so to specify that "longitude" is a column we also pass axis=1 as another argument.
# pandas drop a column with drop function
data = data.drop(['longitude'], axis=1)
The dataset now contains 8 columns instead of 9 (the longitude column has been removed from the dataframe).
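To make the single-column case easy to try without downloading anything, here is a minimal sketch on a hypothetical three-row frame. Only the column names mirror the article; the values are invented. It also shows the columns= keyword, which pandas accepts as an equivalent to passing labels with axis=1.

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration.
df = pd.DataFrame({
    "beat": ["3C", "5A", "2A"],
    "latitude": [38.55, 38.47, 38.65],
    "longitude": [-121.39, -121.49, -121.48],
})

# The form used in the article: a list of labels plus axis=1.
by_axis = df.drop(["longitude"], axis=1)

# Equivalent keyword form: columns=... targets the column axis directly.
by_keyword = df.drop(columns=["longitude"])

# drop returns a new frame, so df keeps its column until we reassign it.
print(list(by_axis.columns))
print(by_axis.equals(by_keyword))
print("longitude" in df.columns)
```

Because drop returns a new dataframe by default, the article reassigns the result to data; without that reassignment the original frame would be unchanged.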
How to delete several columns from a Pandas dataframe?
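The Pandas drop function can also be used to delete multiple columns. To delete several columns, simply give the names of all the columns to delete as a list. Here is an example of deleting 4 columns from the data frame as originally loaded. If you have already dropped longitude in the previous step, reload the file first, because drop raises a KeyError for a label that no longer exists.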
# pandas drop columns using list of column names
data = data.drop(['longitude', 'latitude', 'ucr_ncic_code', 'beat'], axis=1)
The dataframe has indeed been modified: of the 9 original columns, only 5 remain.
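The multi-column case can be sketched the same way on a small hypothetical frame (the address column and all values are assumptions made for illustration). The sketch also shows what happens when a label in the list is already gone, and the errors="ignore" option of drop, which skips missing labels instead of raising.

```python
import pandas as pd

# Hypothetical miniature frame; "address" and all values are invented.
df = pd.DataFrame({
    "address": ["1 EXAMPLE ST", "2 EXAMPLE AVE"],
    "beat": ["3C", "5A"],
    "ucr_ncic_code": [2404, 2204],
    "latitude": [38.55, 38.47],
    "longitude": [-121.39, -121.49],
})

# Drop several columns at once by passing a list of labels.
to_drop = ["longitude", "latitude", "ucr_ncic_code", "beat"]
trimmed = df.drop(to_drop, axis=1)
print(list(trimmed.columns))

# Dropping a label that no longer exists raises KeyError by default.
try:
    trimmed.drop(["longitude"], axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)

# errors="ignore" silently skips labels that are not present.
unchanged = trimmed.drop(["longitude"], axis=1, errors="ignore")
print(unchanged.equals(trimmed))
```

Using errors="ignore" is convenient when a cell may be re-run, but it can also hide a misspelled column name, so the default is often the safer choice.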
How to drop rows from a data frame?
You can also use the drop function to delete rows from a data frame. To delete one or more rows of a Pandas data frame, we must specify the index labels of the rows to be deleted and pass axis=0.
In the example below, we will remove rows 2 and 4 from our dataframe :
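A minimal sketch of that row removal, assuming a hypothetical five-row frame with the default integer index (the values are invented, and "rows 2 and 4" is read here as index labels 2 and 4):

```python
import pandas as pd

# Hypothetical frame with the default index labels 0, 1, 2, 3, 4.
df = pd.DataFrame({
    "beat": ["3C", "5A", "2A", "6C", "1B"],
    "latitude": [38.55, 38.47, 38.65, 38.50, 38.60],
})

# axis=0 tells drop that the labels refer to rows (the index).
without_rows = df.drop([2, 4], axis=0)

# Equivalent keyword form: index=... targets the row axis directly.
same_result = df.drop(index=[2, 4])

print(list(without_rows.index))
print(without_rows.equals(same_result))
```

Note that drop matches index labels, not positions: after this call the remaining rows keep their original labels, so the index has gaps unless you call reset_index(drop=True).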
I advise you to read this article if you wish to have more information about the drop function : https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html
See you soon ! 🙂