You have installed a cluster running HDFS and MapReduce version 2 (MRv2) on YARN. You have no dfs.hosts entries in your hdfs-site.xml configuration file. You configure a new worker node by setting fs.default.name in its configuration files to point to the NameNode on your cluster, and you start the DataNode daemon on that worker node.
What do you have to do on the cluster to allow the worker node to join, and start storing HDFS blocks?
A. Nothing; the worker node will automatically join the cluster when the DataNode daemon is started.
B. Without creating a dfs.hosts file or making any entries, run the command hadoop dfsadmin refreshHadoop on the NameNode
C. Create a dfs.hosts file on the NameNode, add the worker node's name to it, then issue the command hadoop dfsadmin -refreshNodes on the NameNode
D. Restart the NameNode
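For context, a minimal sketch of the include-file approach named in option C; the file paths and hostname below are hypothetical, not taken from the question:

    # Property to add inside <configuration> in hdfs-site.xml on the NameNode
    <property>
      <name>dfs.hosts</name>
      <value>/etc/hadoop/conf/dfs.hosts</value>
    </property>

    # Add the new worker to the include file, then have the NameNode re-read it
    echo "worker05.example.com" >> /etc/hadoop/conf/dfs.hosts
    hadoop dfsadmin -refreshNodes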
Which YARN daemon or service monitors a Container's per-application resource usage (e.g., memory, CPU)?
A. NodeManager
B. ApplicationMaster
C. ApplicationManagerService
D. ResourceManager
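For reference, the NodeManager's per-container memory enforcement is controlled in yarn-site.xml; a minimal sketch, with the values shown being the common defaults and included only for illustration:

    <property>
      <name>yarn.nodemanager.pmem-check-enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>2.1</value>
    </property>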
Which two are features of Hadoop's rack topology?
A. Configuration of rack awareness is accomplished using a configuration file. You cannot use a rack topology script.
B. Even for small clusters on a single rack, configuring rack awareness will improve performance.
C. Rack location is considered in the HDFS block placement policy
D. HDFS is rack aware but MapReduce daemons are not
E. Hadoop gives preference to intra-rack data transfer in order to conserve bandwidth
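A minimal sketch of how rack awareness is typically wired up with a topology script; the script path and subnet-to-rack mapping below are hypothetical:

    <!-- core-site.xml -->
    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>

    #!/bin/bash
    # topology.sh: print one rack path per host/IP argument passed by Hadoop
    for node in "$@"; do
      case "$node" in
        10.1.1.*) echo -n "/rack1 " ;;
        10.1.2.*) echo -n "/rack2 " ;;
        *)        echo -n "/default-rack " ;;
      esac
    done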
You are configuring a cluster running HDFS, MapReduce version 2 (MRv2) on YARN running Linux. How must you format the underlying filesystem of each DataNode?
A. They must not be formatted; HDFS will format the filesystem automatically
B. They may be formatted as any Linux filesystem
C. They must be formatted as HDFS
D. They must be formatted as either ext3 or ext4
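As a hedged illustration of option D, a data disk is formatted with a native Linux filesystem, mounted, and then referenced from dfs.datanode.data.dir; the device name and mount point are hypothetical:

    mkfs -t ext4 /dev/sdb1
    mount /dev/sdb1 /data/1

    # hdfs-site.xml on the DataNode
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/data/1/dfs/dn</value>
    </property>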
Your cluster's mapred-site.xml includes the following parameters:
And your cluster's yarn-site.xml includes the following parameters:
What is the maximum amount of virtual memory allocated for each map before YARN will kill its Container?
A. 4 GB
B. 17.2 GB
C. 24.6 GB
D. 8.2 GB
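The configuration snippets are not reproduced above, but the ceiling is computed as mapreduce.map.memory.mb multiplied by yarn.nodemanager.vmem-pmem-ratio. As a purely illustrative worked example with hypothetical values, mapreduce.map.memory.mb = 8192 and yarn.nodemanager.vmem-pmem-ratio = 2.1 would give 8192 MB x 2.1 ≈ 17.2 GB of virtual memory before the Container is killed; the actual answer depends on the values in the snippets the question refers to.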
Your cluster implements HDFS High Availability (HA). Your two NameNodes are named nn01 and nn02. What occurs when you execute the command: hdfs haadmin -failover nn01 nn02
A. nn02 becomes the standby NameNode and nn01 becomes the active NameNode
B. nn02 is fenced, and nn01 becomes the active NameNode
C. nn01 becomes the standby NameNode and nn02 becomes the active NameNode
D. nn01 is fenced, and nn02 becomes the active NameNode
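For reference, the failover subcommand transitions the first NameNode to standby and the second to active; a minimal sketch of running and then verifying the failover, using the service IDs from the question:

    hdfs haadmin -failover nn01 nn02
    hdfs haadmin -getServiceState nn01    # expected: standby
    hdfs haadmin -getServiceState nn02    # expected: active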
You have a cluster running with the Fair Scheduler enabled. There are currently no jobs running on the cluster, and you submit Job A, so that only Job A is running on the cluster. A while later, you submit Job B. Now Job A and Job B are running on the cluster at the same time. How will the Fair Scheduler handle these two jobs?
A. When Job A gets submitted, it consumes all the task slots.
B. When Job A gets submitted, it doesn't consume all the task slots.
C. When Job B gets submitted, Job A has to finish first, before Job B can be scheduled.
D. When Job B gets submitted, it will get assigned tasks, while Job A continues to run with fewer tasks.
You observe that the number of spilled records from Map tasks far exceeds the number of map output records. Your child heap size is 1 GB and your io.sort.mb value is set to 100 MB. How would you tune your io.sort.mb value to achieve the maximum memory-to-disk I/O ratio?
A. Decrease the io.sort.mb value to 0
B. Increase the io.sort.mb to 1GB
C. For a 1 GB child heap size, an io.sort.mb of 128 MB will always maximize memory-to-disk I/O
D. Tune the io.sort.mb value until you observe that the number of spilled records equals (or is as close as possible to) the number of map output records
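A minimal sketch of raising the sort buffer in mapred-site.xml; the value shown is hypothetical, and note that in MRv2 the old io.sort.mb name is deprecated in favor of mapreduce.task.io.sort.mb:

    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>256</value>
    </property>

After each change, compare the Spilled Records and Map output records job counters; they converge once the buffer is large enough to avoid extra spills.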
Which is the default scheduler in YARN?
A. Fair Scheduler
B. FIFO Scheduler
C. Capacity Scheduler
D. YARN doesn't configure a default scheduler. You must first assign an appropriate scheduler class in yarn-site.xml
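For reference, the scheduler implementation is selected in yarn-site.xml via yarn.resourcemanager.scheduler.class; a minimal sketch that explicitly selects the Fair Scheduler (distributions differ in which scheduler they ship as the default):

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>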
A slave node in your cluster has four 2TB hard drives installed (4 x 2TB). The DataNode is configured to store HDFS blocks on the disks. You set the value of the dfs.datanode.du.reserved parameter to 100GB. How does this alter HDFS block storage?
A. A maximum of 100 GB on each hard drive may be used to store HDFS blocks
B. All hard drives may be used to store HDFS blocks as long as at least 100 GB in total is available on the node
C. 100 GB on each hard drive may not be used to store HDFS blocks
D. 25 GB on each hard drive may not be used to store HDFS blocks
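For reference, dfs.datanode.du.reserved is specified in bytes and applies to each configured data volume, so a 100 GB reservation is withheld on every disk (roughly 400 GB across the four drives in this question); a minimal sketch:

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.du.reserved</name>
      <value>107374182400</value>  <!-- 100 GB reserved per volume -->
    </property>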