PySpark reduceByKey With Example

PySpark reduceByKey: In this tutorial we will learn how to use the reduceByKey() function in Spark. Introduction: The reduceByKey() function only applies to RDDs that contain key and…

Categorized as Python

How to Convert Pyspark Dataframe to Pandas

In this tutorial we will see how to convert a PySpark DataFrame into a pandas DataFrame using the toPandas() function. Introduction: After processing data in PySpark, we sometimes need to convert the PySpark DataFrame back to pandas in order to use certain machine learning tools, since some machine learning models, for example XGBoost, are not implemented in PySpark. However,…

Categorized as Python

PySpark lit() Function to Add a Literal or Constant Column to Dataframe

PySpark lit(): In this tutorial we will see how to use the pyspark.sql.functions.lit() function in Spark SQL. Introduction: The lit() function in PySpark is used to add a new column to a PySpark DataFrame by assigning it a constant or literal value. The syntax of the function is as follows: The function is available when…

Categorized as Python