Given an input vector of features, a Random Forests model performs a classification task and ends in a tie. How does the model handle this outcome?
A. The model will be rebuilt
B. A winner is chosen at random
C. The tree that caused the tie is discarded
D. One more tree is added to the forest
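A tied vote among the trees can be resolved in several ways. The sketch below assumes one common strategy, picking uniformly at random among the tied classes. `majority_vote` is a hypothetical helper written for illustration, not the API of any particular Random Forests library.

```python
import random
from collections import Counter

def majority_vote(votes, rng=None):
    """Return the most frequent label; break ties uniformly at random (assumed strategy)."""
    rng = rng or random.Random()
    counts = Counter(votes)
    top = max(counts.values())
    # Collect every label that reached the top count; sorted for reproducibility with a seeded rng.
    tied = sorted(label for label, n in counts.items() if n == top)
    return rng.choice(tied)

# Three trees, clear winner.
print(majority_vote(["spam", "ham", "spam"]))
# Two trees, tie: the result depends on the random generator.
print(majority_vote(["spam", "ham"], random.Random(0)))
```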
A naive Bayes classifier is trained on 1600 movie reviews and then tested on 400 reviews.
Here is the resulting confusion matrix:
190 (TP) 10 (FN)
80 (FP) 120 (TN)
What are the precision, recall, and the F1-score values?
A. Precision: 0.95; Recall: 0.704; F1-score: 0.809
B. Precision: 0.613; Recall: 0.95; F1-score: 0.745
C. Precision: 0.704; Recall: 0.95; F1-score: 0.809
D. Precision: 0.95; Recall: 0.613; F1-score: 0.745
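The three metrics follow directly from the confusion matrix counts using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. A minimal sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Counts taken from the confusion matrix in the question.
tp, fn, fp, tn = 190, 10, 80, 120

precision = tp / (tp + fp)   # share of predicted positives that are correct
recall = tp / (tp + fn)      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# prints: precision=0.704 recall=0.950 f1=0.809
```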
What is an ideal use case for HDFS?
A. Storing files that are updated frequently
B. Storing files that are written once and read many times
C. Storing results between Map steps and Reduce steps
D. Storing application files in memory
What advantage does replication provide while storing a file in HDFS?
A. Data protection and scheduling flexibility
B. Elimination of requirement for a combiner process
C. Elimination of requirement for Shuffle and Sort process
D. Memory optimization and minimizing tasks to run
What are two visualization tools used for trivariate data?
A. Scatter plot matrix
B. Hexbin plot and heatmap
C. Scatter plot matrix and density plot
D. Scatter plot matrix and heatmap
What is a characteristic of the trigram language model?
A. Based on the second-order Markov process
B. Equivalent to trigram hidden Markov models
C. Uses smoothing to reduce the high dimensionality in text
D. Can be used for part-of-speech tagging
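In a trigram model each word is conditioned on the two words before it. The sketch below estimates P(w3 | w1, w2) by maximum likelihood from raw counts, without smoothing. The function name and the toy sentence are assumptions made for illustration.

```python
from collections import Counter

def trigram_probs(tokens):
    """Maximum-likelihood estimates of P(w3 | w1, w2) from a list of tokens."""
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    # Count how often each two-word context is followed by some word.
    contexts = Counter()
    for (w1, w2, _w3), n in trigrams.items():
        contexts[(w1, w2)] += n
    return {tri: n / contexts[tri[:2]] for tri, n in trigrams.items()}

tokens = "the cat sat on the cat mat".split()
probs = trigram_probs(tokens)
print(probs[("the", "cat", "sat")])  # "the cat" is followed by "sat" once out of twice
```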
What elements are needed to determine the time complexity of finding all the cliques of size k in social network analysis?
A. Eigenvector centrality and betweenness
B. Clique size and total number of nodes in the network
C. Number of edges in the network and centrality measure of the cliques
D. Clique size and betweenness centrality
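To see what drives the cost, consider a brute-force search: it tests every subset of k nodes, and there are C(n, k) such subsets for n nodes, each needing up to k(k-1)/2 edge checks. This is only a naive sketch on a made-up four-node graph, not an optimized clique algorithm.

```python
from itertools import combinations
from math import comb

def k_cliques(nodes, edges, k):
    """Brute force: return every k-node subset whose members are all pairwise connected."""
    edge_set = {frozenset(e) for e in edges}
    return [
        subset
        for subset in combinations(nodes, k)
        if all(frozenset(pair) in edge_set for pair in combinations(subset, 2))
    ]

# Hypothetical toy network.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

print(k_cliques(nodes, edges, 3))   # cliques of size 3
print(comb(len(nodes), 3))          # number of candidate subsets examined
```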
In the graph, which edge would be considered a weak tie? Refer to the exhibit.
A. C-E
B. E-F
C. B-C
D. G-I
A simulation to compare two different sales models yields different results for the same set of input variables in different runs.
What is the likely cause?
A. bit operating system was used
B. The same number of trials was used.
C. A linear congruential generator (LCG) was used for pseudo-random number generation.
D. Different seeds for the random number generator were used.
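The effect of the seed on reproducibility can be checked with a small experiment. The sketch below uses Python's standard `random.Random`; `simulate` is a hypothetical stand-in for a simulation run, averaging uniform draws rather than modeling sales.

```python
import random

def simulate(seed, trials=1000):
    """Toy simulation: mean of `trials` uniform draws from a generator with the given seed."""
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(trials)) / trials

# Same seed, same inputs: the runs agree exactly.
print(simulate(42) == simulate(42))
# Different seeds, same inputs: the runs generally differ.
print(simulate(1), simulate(2))
```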
What is a characteristic of Spark?
A. Unable to run map -> reduce execution plans
B. Supports applications written in Python, Java, and Scala
C. Less efficient at processing small files than Hadoop MapReduce
D. Supports workflows that can return to previous work steps