Which of the following code blocks returns a copy of DataFrame transactionsDf where the column storeId has been converted to string type?
A. transactionsDf.withColumn("storeId", convert("storeId", "string"))
B. transactionsDf.withColumn("storeId", col("storeId", "string"))
C. transactionsDf.withColumn("storeId", col("storeId").convert("string"))
D. transactionsDf.withColumn("storeId", col("storeId").cast("string"))
E. transactionsDf.withColumn("storeId", convert("storeId").as("string"))
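For reference, option D's cast() call is the standard way to change a column's type, and withColumn() returns a copy of the DataFrame. A minimal sketch, assuming a tiny illustrative stand-in for transactionsDf:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf, with storeId as an integer
transactionsDf = spark.createDataFrame([(1, 25), (2, 2)], ["transactionId", "storeId"])

# Column.cast() converts the column to string type
converted = transactionsDf.withColumn("storeId", col("storeId").cast("string"))
converted.printSchema()  # storeId is now of type string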
Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?
A. transactionsDf.drop(["predError", "value"])
B. transactionsDf.drop("predError", "value")
C. transactionsDf.drop(col("predError"), col("value"))
D. transactionsDf.drop(predError, value)
E. transactionsDf.drop("predError and value")
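Option B matches drop()'s variadic signature, which takes column names as separate string arguments rather than a list. A minimal sketch, assuming an illustrative stand-in DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf
transactionsDf = spark.createDataFrame(
    [(1, 3, 4), (2, 6, 7)], ["transactionId", "predError", "value"]
)

# drop() takes each column name as its own string argument, not a list
trimmed = transactionsDf.drop("predError", "value")
print(trimmed.columns)  # ['transactionId']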
Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?
A. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))
B. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-ddHH:mm:ss"))
C. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
   dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
D. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
   dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-ddHH:mm:ss"))
E. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
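Option C illustrates the working pattern: each row must be a one-element tuple so Spark infers a single string column, and to_timestamp() takes the column first with a format that matches the data. A minimal runnable sketch:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()

# Rows as one-element tuples yield a single string column named date
dfDates = spark.createDataFrame(
    [("23/01/2022 11:28:12",), ("24/01/2022 10:58:34",)], ["date"]
)

# to_timestamp(column, format): the format must match the dd/MM/yyyy HH:mm:ss layout of the data
dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
dfDates.printSchema()  # date is now of type timestamp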
Which of the following code blocks generally causes a great amount of network traffic?
A. DataFrame.select()
B. DataFrame.coalesce()
C. DataFrame.collect()
D. DataFrame.rdd.map()
E. DataFrame.count()
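collect() is the network-heavy operation here, because it transfers every row from the executors to the driver. A minimal sketch contrasting it with count():

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(100000)

# collect() ships all partitions across the network to the driver
rows = df.collect()
print(len(rows))  # 100000

# count() returns only a single number to the driver, so far less data moves
print(df.count())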
The code block displayed below contains multiple errors. The code block should remove column transactionDate from DataFrame transactionsDf and add a column transactionTimestamp in which
dates that are expressed as strings in column transactionDate of DataFrame transactionsDf are converted into unix timestamps. Find the errors.
Sample of DataFrame transactionsDf:
+-------------+---------+-----+-------+---------+----+----------------+
|transactionId|predError|value|storeId|productId|   f| transactionDate|
+-------------+---------+-----+-------+---------+----+----------------+
|            1|        3|    4|     25|        1|null|2020-04-26 15:35|
|            2|        6|    7|      2|        2|null|2020-04-13 22:01|
|            3|        3| null|     25|        3|null|2020-04-02 10:53|
+-------------+---------+-----+-------+---------+----+----------------+
Code block:
transactionsDf = transactionsDf.drop("transactionDate")
transactionsDf["transactionTimestamp"] = unix_timestamp("transactionDate", "yyyy-MM-dd")
A. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment. Operator to_unixtime() should be used instead of unix_timestamp().
B. Column transactionDate should be dropped after transactionTimestamp has been written. The withColumn operator should be used instead of the existing column assignment. Column transactionDate should be wrapped in a col() operator.
C. Column transactionDate should be wrapped in a col() operator.
D. The string indicating the date format should be adjusted. The withColumnReplaced operator should be used instead of the drop and assign pattern in the code block to replace column transactionDate with the new column transactionTimestamp.
E. Column transactionDate should be dropped after transactionTimestamp has been written. The string indicating the date format should be adjusted. The withColumn operator should be used instead of the existing column assignment.
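Applying the fixes listed in option E (write transactionTimestamp before dropping transactionDate, use withColumn instead of column assignment, and match the format string to the data) yields something like this sketch, assuming a hypothetical two-row stand-in shaped like the sample above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import unix_timestamp

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf
transactionsDf = spark.createDataFrame(
    [(1, "2020-04-26 15:35"), (2, "2020-04-13 22:01")],
    ["transactionId", "transactionDate"],
)

# Add the unix timestamp first, with a format matching the data,
# then drop the original string column
transactionsDf = (
    transactionsDf
    .withColumn("transactionTimestamp", unix_timestamp("transactionDate", "yyyy-MM-dd HH:mm"))
    .drop("transactionDate")
)
transactionsDf.show()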
Which of the following code blocks returns an exact copy of DataFrame transactionsDf that does not include rows in which column storeId has the value 25?
A. transactionsDf.remove(transactionsDf.storeId==25)
B. transactionsDf.where(transactionsDf.storeId!=25)
C. transactionsDf.filter(transactionsDf.storeId==25)
D. transactionsDf.drop(transactionsDf.storeId==25)
E. transactionsDf.select(transactionsDf.storeId!=25)
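Option B keeps every row whose storeId is not 25; where() is an alias of filter(). A minimal sketch, assuming an illustrative stand-in DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf
transactionsDf = spark.createDataFrame(
    [(1, 25), (2, 2), (3, 25)], ["transactionId", "storeId"]
)

# where() (an alias of filter()) keeps only rows matching the condition
filtered = transactionsDf.where(transactionsDf.storeId != 25)
filtered.show()  # only transactionId 2 remains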
The code block displayed below contains an error. The code block should display the schema of DataFrame transactionsDf. Find the error.
Code block:
transactionsDf.rdd.printSchema
A. There is no way to print a schema directly in Spark; since the schema can be printed easily by using print(transactionsDf.columns), that should be used instead.
B. The code block should be wrapped into a print() operation.
C. printSchema is only accessible through the Spark session, so the code block should be rewritten as spark.printSchema(transactionsDf).
D. printSchema is a method and should be written as printSchema(). It is also not callable through transactionsDf.rdd, but should be called directly from transactionsDf.
E. printSchema is not a method of transactionsDf.rdd. Instead, the schema should be printed via transactionsDf.print_schema().
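As option D describes, printSchema() is a DataFrame method that must be called with parentheses; the RDD API has no such method. A minimal sketch with a hypothetical stand-in DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25)], ["transactionId", "storeId"])

# Called directly on the DataFrame, with parentheses, not via .rdd
transactionsDf.printSchema()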
In which order should the code blocks shown below be run to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively?
1. .filter(~isnull(col('value')))
2. .count()
3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
5. .filter(col('value').isnotnull())
6. .sum(col('value'))
A. 4, 1, 2
B. 3, 1, 6
C. 3, 1, 2
D. 3, 5, 2
E. 4, 6
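A runnable sketch of the 4, 1, 2 chain (option A), assuming tiny hypothetical stand-ins for both DataFrames; note that isnotnull() in block 5 is not a valid Column method (the API spells it isNotNull()):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, isnull

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins for transactionsDf and itemsDf
transactionsDf = spark.createDataFrame([(1, 4), (2, None)], ["productId", "value"])
itemsDf = spark.createDataFrame([(1,), (2,)], ["itemId"])

# Blocks 4, 1, 2: explicit inner join, filter out null values, count the rest
n = (
    transactionsDf
    .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how="inner")
    .filter(~isnull(col("value")))
    .count()
)
print(n)  # 1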
Which of the following code blocks performs an inner join between DataFrame itemsDf and DataFrame transactionsDf, using columns itemId and transactionId as join keys, respectively?
A. itemsDf.join(transactionsDf, "inner", itemsDf.itemId == transactionsDf.transactionId)
B. itemsDf.join(transactionsDf, itemId == transactionId)
C. itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")
D. itemsDf.join(transactionsDf, "itemsDf.itemId == transactionsDf.transactionId", "inner")
E. itemsDf.join(transactionsDf, col(itemsDf.itemId) == col(transactionsDf.transactionId))
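Option C follows join()'s signature of (other, on, how), with the join expression before the join type. A minimal sketch with hypothetical stand-in DataFrames:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-ins for itemsDf and transactionsDf
itemsDf = spark.createDataFrame([(1, "a")], ["itemId", "itemName"])
transactionsDf = spark.createDataFrame([(1, 100)], ["transactionId", "value"])

# join(other, on, how): the join condition comes second, the join type third
joined = itemsDf.join(transactionsDf, itemsDf.itemId == transactionsDf.transactionId, "inner")
joined.show()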
The code block displayed below contains an error. The code block should return DataFrame transactionsDf, but with the column storeId renamed to storeNumber. Find the error.
Code block:
transactionsDf.withColumn("storeNumber", "storeId")
A. Instead of withColumn, the withColumnRenamed method should be used.
B. Arguments "storeNumber" and "storeId" each need to be wrapped in a col() operator.
C. Argument "storeId" should be the first and argument "storeNumber" should be the second argument to the withColumn method.
D. The withColumn operator should be replaced with the copyDataFrame operator.
E. Instead of withColumn, the withColumnRenamed method should be used and argument "storeId" should be the first and argument "storeNumber" should be the second argument to that method.
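Option E describes withColumnRenamed(existingName, newName), where the existing column name comes first. A minimal sketch with a hypothetical stand-in DataFrame:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
transactionsDf = spark.createDataFrame([(1, 25)], ["transactionId", "storeId"])

# withColumnRenamed(existingName, newName)
renamed = transactionsDf.withColumnRenamed("storeId", "storeNumber")
print(renamed.columns)  # ['transactionId', 'storeNumber']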