pyspark.sql.DataFrame.drop
- DataFrame.drop(*cols)
- Returns a new DataFrame without the specified columns. This is a no-op if the schema does not contain the given column name(s).
- New in version 1.4.0.
- Changed in version 3.4.0: Supports Spark Connect.
- Parameters
- cols: str or Column
- The name of a column, or a Column object, to drop. Several may be passed at once.
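The handling of string names in cols can be pictured with a toy model over a plain list of column names. This includes the no-op for names the schema does not contain. It is a hypothetical sketch: `drop_columns` is not part of PySpark. It only covers string names, matches them case-sensitively (Spark matches case-insensitively by default), and ignores Column objects.

```python
from typing import List, Sequence


def drop_columns(schema: Sequence[str], *names: str) -> List[str]:
    """Toy model of drop(*cols) called with column-name strings.

    Hypothetical helper, not a PySpark API. Like DataFrame.drop, it returns
    a new result and leaves its input unchanged.
    """
    unwanted = set(names)
    # Names missing from the schema match nothing, so they are a no-op.
    return [column for column in schema if column not in unwanted]


columns = ["age", "name"]
print(drop_columns(columns, "age"))             # ['name']
print(drop_columns(columns, "age", "missing"))  # ['name']
print(columns)                                  # ['age', 'name'], unchanged
```

The corresponding PySpark calls would be df.drop("age") and df.drop("age", "missing"). The second call drops age and silently ignores the missing name.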
 
- Returns
- DataFrame: A new DataFrame without the specified columns.
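- Notes
- When an input is a column name, it is treated literally, without further interpretation. Otherwise, drop tries to match the equivalent expression. Dropping a column by name, drop(colName), therefore has different semantics from dropping it by column expression, drop(col(colName)). For instance, drop("a.b.c") removes a column literally named a.b.c. By contrast, drop(col("a.b.c")) leaves that column in place, as Example 6 shows.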
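The Notes distinguish literal name matching from expression matching. A toy model over a flat list of column names can make the difference concrete. `drop_by_name` and `drop_by_expression` are hypothetical helpers, not PySpark APIs. The expression side is a deliberate simplification that treats every dotted string as a nested-field path.

```python
from typing import List, Sequence


def drop_by_name(schema: Sequence[str], name: str) -> List[str]:
    """Toy model of drop(colName): the string is compared literally."""
    return [column for column in schema if column != name]


def drop_by_expression(schema: Sequence[str], expr: str) -> List[str]:
    """Toy model of drop(col(expr)) over a flat list of column names.

    Simplification: every dotted string is read as a nested-field path
    (field c of field b of column a), which never equals a top-level
    column, so nothing is dropped. Real Spark may also read the first
    part as a table alias, and it supports backtick quoting, struct
    columns, and case-insensitive matching; none of that is modeled.
    """
    path = expr.split(".")
    if len(path) > 1:
        # Multi-part path: no top-level column matches, so this is a no-op.
        return list(schema)
    return [column for column in schema if column != path[0]]


schema = ["age", "name", "a.b.c"]
print(drop_by_name(schema, "a.b.c"))        # ['age', 'name']
print(drop_by_expression(schema, "a.b.c"))  # ['age', 'name', 'a.b.c']
```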
- Examples
- Example 1: Drop a column by name.
>>> df = spark.createDataFrame(
...     [(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
>>> df.drop('age').show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
- Example 2: Drop a column by Column object.
>>> df.drop(df.age).show()
+-----+
| name|
+-----+
|  Tom|
|Alice|
|  Bob|
+-----+
- Example 3: Drop the column that both DataFrames were joined on.
>>> df2 = spark.createDataFrame([(80, "Tom"), (85, "Bob")], ["height", "name"])
>>> df.join(df2, df.name == df2.name).drop('name').sort('age').show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 16|    85|
+---+------+
>>> df3 = df.join(df2)
>>> df3.show()
+---+-----+------+----+
|age| name|height|name|
+---+-----+------+----+
| 14|  Tom|    80| Tom|
| 14|  Tom|    85| Bob|
| 23|Alice|    80| Tom|
| 23|Alice|    85| Bob|
| 16|  Bob|    80| Tom|
| 16|  Bob|    85| Bob|
+---+-----+------+----+
- Example 4: Drop two columns that share the same name.
>>> df3.drop("name").show()
+---+------+
|age|height|
+---+------+
| 14|    80|
| 14|    85|
| 23|    80|
| 23|    85|
| 16|    80|
| 16|    85|
+---+------+
- Example 5: Cannot drop col('name') because the reference is ambiguous.
>>> from pyspark.sql import functions as sf
>>> df3.drop(sf.col("name")).show()
Traceback (most recent call last):
...
pyspark.errors.exceptions.captured.AnalysisException: [AMBIGUOUS_REFERENCE] Reference...
- Example 6: Cannot find a column matching the expression "a.b.c".
>>> from pyspark.sql import functions as sf
>>> df4 = df.withColumn("a.b.c", sf.lit(1))
>>> df4.show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
>>> df4.drop("a.b.c").show()
+---+-----+
|age| name|
+---+-----+
| 14|  Tom|
| 23|Alice|
| 16|  Bob|
+---+-----+
>>> df4.drop(sf.col("a.b.c")).show()
+---+-----+-----+
|age| name|a.b.c|
+---+-----+-----+
| 14|  Tom|    1|
| 23|Alice|    1|
| 16|  Bob|    1|
+---+-----+-----+
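Examples 4 and 5 show what happens after a join produces two columns called name. The name form removes both, while the expression form fails with an ambiguous reference. The sketch below models that rule on a list of column names. `drop_name`, `drop_reference`, and `AmbiguousReferenceError` are hypothetical. The toy ignores two things real Spark does: it tracks which DataFrame each Column came from, and it matches names case-insensitively by default.

```python
from typing import List, Sequence


class AmbiguousReferenceError(Exception):
    """Hypothetical stand-in for Spark's AMBIGUOUS_REFERENCE AnalysisException."""


def drop_name(schema: Sequence[str], name: str) -> List[str]:
    # Name form: every column with exactly this name is removed (Example 4).
    return [column for column in schema if column != name]


def drop_reference(schema: Sequence[str], name: str) -> List[str]:
    # Expression form: the reference must resolve to at most one column.
    matches = [i for i, column in enumerate(schema) if column == name]
    if len(matches) > 1:
        # Two or more candidates: refuse to guess, as in Example 5.
        raise AmbiguousReferenceError(f"Reference `{name}` is ambiguous")
    # One match is dropped; zero matches leave the schema unchanged in this toy.
    return [column for i, column in enumerate(schema) if i not in matches]


joined = ["age", "name", "height", "name"]
print(drop_name(joined, "name"))         # ['age', 'height']
print(drop_reference(joined, "height"))  # ['age', 'name', 'name']
```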