Pyspark parallelize – Create RDD from a list collection

PySpark parallelize: In this tutorial, we will see how to use the parallelize() function to create an RDD from a Python list. Introduction: parallelize() is a SparkContext method that creates an RDD from a Python list. An RDD (Resilient Distributed Dataset) is a PySpark data structure that represents a collection of immutable…

Categorized as Python

Pyspark Concat – Concatenate two columns in pyspark

PySpark concat: In this tutorial, we will learn how to concatenate two or more columns in a PySpark DataFrame. Introduction: To concatenate several columns of a DataFrame, pyspark.sql.functions provides…

Categorized as Python

Pygame Draw Objects and Shapes

Pygame Draw: In this tutorial, we will see how to create simple geometric figures with the draw module of Pygame. Introduction: Pygame is a free, cross-platform library that facilitates the development of real-time video games in the Python programming language. It allows you to program the multimedia part (graphics, sound and keyboard,…

Categorized as Python

PySpark Rename Column on PySpark Dataframe (Single or Multiple Column)

PySpark Rename Column: In this tutorial, we will see how to rename one or more columns in a PySpark DataFrame and the different ways to do it. Introduction: On many occasions, it may be necessary to rename a PySpark DataFrame column. For example, when reading a file and the headers do not correspond to…

Categorized as Python