You are developing a proof of concept for a real-time fraud detection model. After undersampling the training set to achieve a 50% fraud rate, you train and tune a tree classifier using area under the curve (AUC) as the metric, and then calibrate the model. You need to share metrics that represent your model’s effectiveness with business stakeholders in a way that is easily interpreted. Which approach should you take?
Calculate the AUC on the holdout dataset at a classification threshold of 0.5, and report true positive rate, false positive rate, and false negative rate.
Choosing a fixed threshold of 0.5 is arbitrary; you need business direction about the cost of misclassification to define the optimal threshold, for both balanced and imbalanced classification.
Undersample the minority class to achieve a 50% fraud rate in the holdout set. Plot the confusion matrix at a classification threshold of 0.5, and report precision and recall.
The holdout dataset needs to represent real-world transactions to have a meaningful model evaluation, and you should never change its distribution.
Select all transactions in the holdout dataset. Plot the area under the receiver operating characteristic curve (AUC ROC), and report the F1 score for all available thresholds.
Classes in the holdout dataset are not balanced, so the ROC curve is not appropriate; also, neither F1 score nor ROC curve is recommended for communicating to business stakeholders. The F1 score aggregates precision and recall, but it is important to look at each metric separately to evaluate the model’s performance when the cost of misclassification is highly unbalanced between labels.
Select all transactions in the holdout dataset. Plot the precision-recall curve with associated average precision, and report the true positive rate, false positive rate, and false negative rate for all available thresholds.
The precision-recall curve is an appropriate metric for imbalanced classification when the output can be set using different thresholds. Presenting the precision-recall curve together with the mentioned rates provides business stakeholders with all the information necessary to evaluate model performance.
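For reference, a minimal scikit-learn sketch of how the precision-recall curve, average precision, and per-threshold rates could be produced, assuming a fitted, calibrated classifier clf and an untouched holdout set X_test, y_test (all names are illustrative):

import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

y_test = np.asarray(y_test)
# Scores on the full holdout set, which keeps the real-world class distribution.
y_scores = clf.predict_proba(X_test)[:, 1]

precision, recall, thresholds = precision_recall_curve(y_test, y_scores)
avg_precision = average_precision_score(y_test, y_scores)
print(f"Average precision: {avg_precision:.3f}")

# True positive, false positive, and false negative rates per threshold.
pos, neg = (y_test == 1).sum(), (y_test == 0).sum()
for t in thresholds:
    preds = (y_scores >= t).astype(int)
    tp = ((preds == 1) & (y_test == 1)).sum()
    fp = ((preds == 1) & (y_test == 0)).sum()
    fn = ((preds == 0) & (y_test == 1)).sum()
    print(f"threshold={t:.2f} TPR={tp/pos:.3f} FPR={fp/neg:.3f} FNR={fn/pos:.3f}")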

Your organization’s marketing team wants to send biweekly scheduled emails to customers that are expected to spend above a variable threshold. This is the first machine learning (ML) use case for the marketing team, and you have been tasked with the implementation. After setting up a new Google Cloud project, you use Vertex AI Workbench to develop model training and batch inference with an XGBoost model on the transactional data stored in Cloud Storage. You want to automate the end-to-end pipeline that will securely provide the predictions to the marketing team, while minimizing cost and code maintenance. What should you do?
Create a scheduled pipeline on Vertex AI Pipelines that accesses the data from Cloud Storage, uses Vertex AI to perform training and batch prediction, and outputs a file in a Cloud Storage bucket that contains a list of all customer emails and expected spending.
Vertex AI Pipelines and Cloud Storage are cost-effective and secure solutions. This option requires the fewest code interactions because the marketing team can update the pipeline and schedule parameters from the Google Cloud console.
Create a scheduled pipeline on Cloud Composer that accesses the data from Cloud Storage, copies the data to BigQuery, uses BigQuery ML to perform training and batch prediction, and outputs a table in BigQuery with customer emails and expected spending.
Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always active. In addition, using BigQuery is not the most cost-effective solution.
Create a scheduled notebook on Vertex AI Workbench that accesses the data from Cloud Storage, performs training and batch prediction on the managed notebook instance, and outputs a file in a Cloud Storage bucket that contains a list of all customer emails and expected spending.
The marketing team would have to enter the Vertex AI Workbench instance to update a pipeline parameter, which does not minimize code interactions.
Create a scheduled pipeline on Cloud Composer that accesses the data from Cloud Storage, uses Vertex AI to perform training and batch prediction, and sends an email to the marketing team’s Gmail group email with an attachment that contains an encrypted list of all customer emails and expected spending.
Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always active. Also, using email to send personally identifiable information (PII) is not a recommended approach.
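A minimal sketch of what the recommended Vertex AI Pipelines approach could look like with the Kubeflow Pipelines (KFP) SDK and the Vertex AI SDK; the component body, project, bucket paths, and cron expression are illustrative placeholders rather than a complete implementation:

from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component(base_image="python:3.10")
def train_and_predict(input_uri: str, output_uri: str):
    # Placeholder: read transactional data from Cloud Storage, train the
    # XGBoost model with Vertex AI, run batch prediction, and write the
    # customer emails and expected spending to the output location.
    pass

@dsl.pipeline(name="marketing-spend-prediction")
def pipeline(input_uri: str, output_uri: str):
    train_and_predict(input_uri=input_uri, output_uri=output_uri)

compiler.Compiler().compile(pipeline, "pipeline.yaml")

aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(
    display_name="marketing-spend-prediction",
    template_path="pipeline.yaml",
    pipeline_root="gs://my-bucket/pipeline-root",
    parameter_values={
        "input_uri": "gs://my-bucket/transactions/",
        "output_uri": "gs://my-bucket/predictions/",
    },
)
# Cron cannot express "every other week" exactly; the 1st and 15th of each
# month is used here as a biweekly approximation.
schedule = aiplatform.PipelineJobSchedule(pipeline_job=job, display_name="biweekly-run")
schedule.create(cron="0 8 1,15 * *")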

You have developed a very large network in TensorFlow Keras that is expected to train for multiple days. The model uses only built-in TensorFlow operations to perform training with high-precision arithmetic. You want to update the code to run distributed training using tf.distribute.Strategy and configure a corresponding machine instance in Compute Engine to minimize training time. What should you do?
Select an instance with an attached GPU, and gradually scale up the machine type until the optimal execution time is reached. Add MirroredStrategy to the code, and create the model in the strategy’s scope with batch size dependent on the number of replicas.
This approach is suboptimal for minimizing training time because MirroredStrategy only supports multiple GPUs on one instance, which may not be as performant as running on multiple instances.
Create an instance group with one instance with attached GPU, and gradually scale up the machine type until the optimal execution time is reached. Add TF_CONFIG and MultiWorkerMirroredStrategy to the code, create the model in the strategy’s scope, and set up data autosharding.
GPUs are the correct hardware for deep learning training with high-precision arithmetic, and distributing training across multiple instances allows maximum flexibility in fine-tuning the accelerator selection to minimize execution time. Note that one worker could still be the best setting if the overhead of synchronizing the gradients across machines is too high, in which case this approach will be equivalent to MirroredStrategy.
Create a TPU virtual machine, and gradually scale up the machine type until the optimal execution time is reached. Add TPU initialization at the start of the program, define a distributed TPUStrategy, and create the model in the strategy’s scope with batch size and training steps dependent on the number of TPUs.
TPUs are not recommended for workloads that require high-precision arithmetic, and are recommended for models that train for weeks or months.
Create a TPU node, and gradually scale up the machine type until the optimal execution time is reached. Add TPU initialization at the start of the program, define a distributed TPUStrategy, and create the model in the strategy’s scope with batch size and training steps dependent on the number of TPUs.
TPUs are not recommended for workloads that require high-precision arithmetic, and are recommended for models that train for weeks or months. Also, TPU nodes are not recommended unless required by the application.
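A minimal sketch of the TF_CONFIG and MultiWorkerMirroredStrategy changes described above; TF_CONFIG is set per Compute Engine instance (index 0 on the first worker, 1 on the second, and so on), and build_model and make_dataset stand in for the existing model and input pipeline:

import json
import os
import tensorflow as tf

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["worker0.example:12345", "worker1.example:12345"]},
    "task": {"type": "worker", "index": 0},
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

# Scale the global batch size with the number of replicas.
per_replica_batch_size = 64
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

with strategy.scope():
    model = build_model()  # placeholder for the existing Keras model
    model.compile(optimizer="adam", loss="categorical_crossentropy")

# Autoshard the input data across workers.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = tf.data.experimental.AutoShardPolicy.DATA
dataset = make_dataset(global_batch_size).with_options(options)  # placeholder

model.fit(dataset, epochs=10)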

You developed a tree model based on an extensive feature set of user behavioral data. The model has been in production for 6 months. New regulations were just introduced that require anonymizing personally identifiable information (PII), which you have identified in your feature set using the Cloud Data Loss Prevention API. You want to update your model pipeline to adhere to the new regulations while minimizing a reduction in model performance. What should you do?
Redact the features containing PII data, and train the model from scratch.
Redacting removes the features entirely, so the original relationships between records are not maintained, and the loss of information is likely to cause a drop in performance.
Mask the features containing PII data, and tune the model from the last checkpoint.
Masking does not enforce referential integrity, and a drop in model performance may happen. Also, tuning the existing model is not recommended because the model training on the original dataset may have memorized sensitive information.
Use key-based hashes to tokenize the features containing PII data, and train the model from scratch.
Hashing is an irreversible transformation that ensures anonymization and is not expected to cause a drop in model performance, because you keep the same feature set while preserving referential integrity.
Use deterministic encryption to tokenize the features containing PII data, and tune the model from the last checkpoint.
Deterministic encryption is reversible, and anonymization requires irreversibility. Also, tuning the existing model is not recommended because the model training on the original dataset may have memorized sensitive information.
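A conceptual sketch of the key-based hash tokenization described above, using HMAC-SHA256; in practice the Cloud DLP API's CryptoHashConfig transformation provides the managed equivalent. The key handling and record fields are illustrative only:

import hashlib
import hmac

SECRET_KEY = b"replace-with-a-key-stored-in-secret-manager"  # illustrative

def tokenize(value: str) -> str:
    # The same input always yields the same token, so joins and groupings
    # (referential integrity) are preserved, but the original value cannot
    # be recovered from the token.
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Example: replace the PII field before retraining the model from scratch.
record = {"user_email": "jane@example.com", "clicks_7d": 14}
record["user_email"] = tokenize(record["user_email"])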

You set up a Vertex AI Workbench instance with a TensorFlow Enterprise environment to perform exploratory data analysis for a new use case. Your training and evaluation datasets are stored in multiple partitioned CSV files in Cloud Storage. You want to use TensorFlow Data Validation (TFDV) to explore problems in your data before model tuning. You want to fix these problems as quickly as possible. What should you do?
1. Use TFDV to generate statistics, and use Pandas to infer the schema for the training dataset that has been loaded from Cloud Storage.
2. Visualize both statistics and schema, and manually fix anomalies in the dataset’s schema and values.
You also need to use the evaluation dataset for analysis. If the features do not belong to approximately the same range as the training dataset, the accuracy of the model will be affected.
1. Use TFDV to generate statistics and infer the schema for the training and evaluation datasets that have been loaded from Cloud Storage by using URI.
2. Visualize statistics for both datasets simultaneously to fix the datasets’ values, and fix the training dataset’s schema after displaying it together with anomalies in the evaluation dataset.
It takes the minimum number of steps to correctly fix problems in the data with TFDV before model tuning. This process involves installing tensorflow_data_validation, loading the training and evaluation datasets directly from Cloud Storage, and fixing schema and values for both. Note that the schema is only stored for the training set because it is expected to match at evaluation.
1. Use TFDV to generate statistics, and use Pandas to infer the schema for the training dataset that has been loaded from Cloud Storage.
2. Use TFRecordWriter to convert the training dataset into a TFRecord.
3. Visualize both statistics and schema, and manually fix anomalies in the dataset’s schema and values.
Transforming into TFRecord is an unnecessary step. Also, you need to use the evaluation dataset for analysis. If the features do not belong to approximately the same range as the training dataset, the accuracy of the model will be affected.
1. Use TFDV to generate statistics and infer the schema for the training and evaluation datasets that have been loaded with Pandas.
2. Use TFRecordWriter to convert the training and evaluation datasets into TFRecords.
3. Visualize statistics for both datasets simultaneously to fix the datasets’ values, and fix the training dataset’s schema after displaying it together with anomalies in the evaluation dataset.
Transforming into TFRecord is an unnecessary step.
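A minimal sketch of the TFDV workflow described above, loading the partitioned CSVs directly from Cloud Storage by URI (bucket paths are placeholders):

import tensorflow_data_validation as tfdv

train_stats = tfdv.generate_statistics_from_csv(
    data_location="gs://my-bucket/data/train/*.csv")
eval_stats = tfdv.generate_statistics_from_csv(
    data_location="gs://my-bucket/data/eval/*.csv")

# Infer the schema from the training data only; the evaluation data is
# expected to conform to it.
schema = tfdv.infer_schema(statistics=train_stats)

# Visualize both datasets side by side to spot range and distribution issues.
tfdv.visualize_statistics(
    lhs_statistics=eval_stats, rhs_statistics=train_stats,
    lhs_name="EVAL", rhs_name="TRAIN")

# Display anomalies in the evaluation data relative to the training schema.
anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
tfdv.display_anomalies(anomalies)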

You have developed a simple feedforward network on a very wide dataset. You trained the model with mini-batch gradient descent and L1 regularization. During training, you noticed the loss steadily decreasing before moving back to the top at a very sharp angle and starting to oscillate. You want to fix this behavior with minimal changes to the model. What should you do?
Shuffle the data before training, and iteratively adjust the batch size until the loss improves.
Divergence due to repetitive behavior in the data typically shows a loss that starts oscillating after some steps but does not jump back to the top.
Explore the feature set to remove NaNs and clip any noisy outliers. Shuffle the data before retraining.
A large increase in loss is typically caused by anomalous values in the input data that cause NaN traps or exploding gradients.
Switch from L1 to L2 regularization, and iteratively adjust the L2 penalty until the loss improves.
L2 is not clearly a better solution than L1 regularization for wide models. L1 helps with sparsity, and L2 helps with collinearity.
Adjust the learning rate to exponentially decay with a larger decrease at the step where the loss jumped, and iteratively adjust the initial learning rate until the loss improves.
A learning rate schedule that is not tuned typically shows a loss that starts oscillating after some steps but does not jump back to the top.
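A minimal pandas sketch of the NaN removal and outlier clipping described above; the file name, percentile bounds, and shuffle seed are illustrative:

import pandas as pd

df = pd.read_csv("training_data.csv")

# Drop rows with NaNs that would otherwise propagate through the loss.
df = df.dropna()

# Clip noisy outliers, for example to the 1st/99th percentiles per feature.
for col in df.select_dtypes("number").columns:
    lower, upper = df[col].quantile([0.01, 0.99])
    df[col] = df[col].clip(lower, upper)

# Shuffle before retraining.
df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)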

You trained a neural network on a small normalized wide dataset. The model performs well without overfitting, but you want to improve how the model pipeline processes the features because they are not all expected to be relevant for the prediction. You want to implement changes that minimize model complexity while maintaining or improving the model’s offline performance. What should you do?
Keep the original feature set, and add L1 regularization to the loss function.
Although the approach lets you reduce RAM requirements by pushing the weights for meaningless features to 0, regularization tends to cause the training error to increase. Consequently, the model performance is expected to decrease.
Use principal component analysis (PCA), and select the first n components that explain 99% of the variance.
PCA is an unsupervised approach, and it is a valid method of feature selection only if the most important variables are the ones that also have the most variation. This is usually not true, and disregarding the last few components is likely to decrease model performance.
Perform correlation analysis. Remove features that are highly correlated to one another and features that are not correlated to the target.
Removing irrelevant features reduces model complexity and is expected to boost performance by removing noise.
Ensure that categorical features are one-hot encoded and that continuous variables are binned, and create feature crosses for a subset of relevant features.
This approach can make the model converge faster but it increases model RAM requirements, and it is not expected to boost model performance because neural networks inherently learn feature crosses.
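A minimal pandas sketch of the correlation analysis described above; the 0.9 and 0.05 cutoffs and the column names are illustrative assumptions:

import pandas as pd

df = pd.read_csv("training_data.csv")
target = "label"
features = df.drop(columns=[target])

# Drop features that are barely correlated with the target.
target_corr = features.corrwith(df[target]).abs()
weak = target_corr[target_corr < 0.05].index.tolist()

# Drop one feature from every highly correlated pair.
corr = features.drop(columns=weak).corr().abs()
redundant = set()
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if a not in redundant and b not in redundant and corr.loc[a, b] > 0.9:
            redundant.add(b)

selected = [c for c in features.columns if c not in weak and c not in redundant]
print("Selected features:", selected)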

You trained a model in a Vertex AI Workbench notebook that has good validation RMSE. You defined 20 parameters with the associated search spaces that you plan to use for model tuning. You want to use a tuning approach that maximizes tuning job speed. You also want to optimize cost, reproducibility, model performance, and scalability where possible if they do not affect speed. What should you do?
Set up a cell to run a hyperparameter tuning job using Vertex AI Vizier with val_rmse specified as the metric in the study configuration.
Vertex AI Vizier should be used for systems that do not have a known objective function or are too costly to evaluate using the objective function. Neither applies to the specified use case. Vizier requires sequential trials and does not optimize for cost or tuning time.
Using a dedicated Python library such as Hyperopt or Optuna, configure a cell to run a local hyperparameter tuning job with Bayesian optimization.
Bayesian optimization can converge in fewer iterations than the other algorithms but not necessarily in a faster time because trials are dependent and thus require sequentiality. Also, running tuning locally does not optimize for reproducibility and scalability.
Refactor the notebook into a parametrized and dockerized Python script, and push it to Container Registry. Use the UI to set up a hyperparameter tuning job in Vertex AI. Use the created image and include Grid Search as an algorithm.
Grid Search is a brute-force approach and is not feasible to fully parallelize. Because you need to try all hyperparameter combinations, which is an exponential number of trials with respect to the number of hyperparameters, Grid Search is inefficient for high-dimensional search spaces in terms of time, cost, and computing power.
Refactor the notebook into a parametrized and dockerized Python script, and push it to Container Registry. Use the command line to set up a hyperparameter tuning job in Vertex AI. Use the created image and include Random Search as an algorithm where maximum trial count is equal to parallel trial count.
Random Search can limit the search iterations on time and parallelize all trials so that the execution time of the tuning job corresponds to the longest training produced by your hyperparameter combination. This approach also optimizes for the other mentioned metrics.
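The same job can be configured from the command line with gcloud ai hp-tuning-jobs create; for illustration, a hedged Vertex AI Python SDK sketch is shown below, assuming the dockerized training script is already in Container Registry (image URI, metric name, search space, and trial counts are placeholders):

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket/staging")

custom_job = aiplatform.CustomJob(
    display_name="trainer",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-8"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
    }],
)

tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="rmse-random-search",
    custom_job=custom_job,
    metric_spec={"val_rmse": "minimize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "max_depth": hpt.IntegerParameterSpec(min=3, max=12, scale="linear"),
        # ...remaining parameters and their search spaces...
    },
    max_trial_count=50,
    parallel_trial_count=50,  # run every trial in parallel
    search_algorithm="random",
)
tuning_job.run()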

You trained a deep model for a regression task. The model predicts the expected sale price for a house based on features that are not guaranteed to be independent. You want to evaluate your model by defining a baseline approach and selecting an evaluation metric for comparison that detects high variance in the model. What should you do?
Use a heuristic that predicts the mean value as the baseline, and compare the trained model’s mean absolute error against the baseline.
Always predicting the mean value is not expected to be a strong baseline; house prices could assume a wide range of values. Also, mean absolute error is not the best metric to detect variance because it gives the same weight to all errors.
Use a linear model trained on the most predictive features as the baseline, and compare the trained model’s root mean squared error against the baseline.
A linear model is not expected to perform well with multicollinearity. Also, root mean squared error does not penalize high variance as much as mean squared error because the root operation reduces the importance of higher values.
Determine the maximum acceptable mean absolute percentage error (MAPE) as the baseline, and compare the model’s MAPE against the baseline.
While defining a threshold for acceptable performance is a good practice for blessing models, a baseline should aim to test statistically a model’s ability to learn by comparing it to a less complex data-driven approach. Also, this approach does not detect high variance in the model.
Use a simple neural network with one fully connected hidden layer as the baseline, and compare the trained model’s mean squared error against the baseline.
A one-layer neural network can handle collinearity and is a good baseline. Mean squared error is a good metric because it gives more weight to errors with larger absolute values than to errors with smaller absolute values.

You designed a 5-billion-parameter language model in TensorFlow Keras that used autotuned tf.data to load the data in memory. You created a distributed training job in Vertex AI with tf.distribute.MirroredStrategy, and set the large_model_v100 machine for the primary instance. The training job fails with the following error:
“The replica 0 ran out of memory with a non-zero status of 9.”
You want to fix this error without vertically increasing the memory of the replicas. What should you do?
Keep MirroredStrategy. Increase the number of attached V100 accelerators until the memory error is resolved.
MirroredStrategy is a data-parallel approach. This approach is not expected to fix the error because the memory issues in the primary replica are caused by the size of the model itself.
Switch to ParameterServerStrategy, and add a parameter server worker pool with large_model_v100 instance type.
The parameter server alleviates some workload from the primary replica by coordinating the shared model state between the workers, but it still requires the whole model to be shared with workers. This approach is not expected to fix the error because the memory issues in the primary replica are caused by the size of the model itself.
Switch to tf.distribute.MultiWorkerMirroredStrategy with Reduction Server. Increase the number of workers until the memory error is resolved.
MultiWorkerMirroredStrategy is a data-parallel approach. This approach is not expected to fix the error because the memory issues in the primary replica are caused by the size of the model itself. Reduction Server increases throughput and reduces latency of communication, but it does not help with memory issues.
Switch to a custom distribution strategy that uses TF_CONFIG to equally split model layers between workers. Increase the number of workers until the memory error is resolved.
This is an example of a model-parallel approach that splits the model between workers. You can use DTensors to implement this. This approach is expected to fix the error because the memory issues in the primary replica are caused by the size of the model itself.

You need to develop an online model prediction service that accesses pre-computed near-real-time features and returns a customer churn probability value. The features are saved in BigQuery and updated hourly using a scheduled query. You want this service to be low latency and scalable and require minimal maintenance. What should you do?
1. Configure a Cloud Function that exports features from BigQuery to Memorystore.
2. Use Memorystore to perform feature lookup. Deploy the model as a custom prediction endpoint in Vertex AI, and enable automatic scaling.
This approach creates a fully managed autoscalable service that minimizes maintenance while providing low latency with the use of Memorystore.
1. Configure a Cloud Function that exports features from BigQuery to Memorystore.
2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Memorystore and performs inference with an in-memory model.
Feature lookup and model inference can be performed in Cloud Functions, and using Google Kubernetes Engine increases maintenance.
1. Configure a Cloud Function that exports features from BigQuery to Vertex AI Feature Store.
2. Use the online service API from Vertex AI Feature Store to perform feature lookup. Deploy the model as a custom prediction endpoint in Vertex AI, and enable automatic scaling.
Vertex AI Feature Store is not as low-latency as Memorystore.
1. Configure a Cloud Function that exports features from BigQuery to Vertex AI Feature Store.
2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Vertex AI Feature Store’s online serving API and performs inference with an in-memory model.
Feature lookup and model inference can be performed in Cloud Functions, and using Google Kubernetes Engine increases maintenance. Also, Vertex AI Feature Store is not as low-latency as Memorystore.
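A minimal sketch of the Memorystore lookup plus Vertex AI endpoint prediction path described above; the Redis host, key layout, endpoint resource name, and instance format are illustrative assumptions that depend on the deployed model:

import redis
from google.cloud import aiplatform

# Memorystore for Redis instance that the Cloud Function refreshes from the
# hourly BigQuery export.
cache = redis.Redis(host="10.0.0.3", port=6379)

aiplatform.init(project="my-project", location="us-central1")
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/1234567890")

def churn_probability(customer_id: str) -> float:
    # Low-latency lookup of the pre-computed features.
    features = cache.hgetall(f"customer:{customer_id}")
    instance = {k.decode(): float(v) for k, v in features.items()}
    prediction = endpoint.predict(instances=[instance])
    return prediction.predictions[0]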

You are logged into the Vertex AI Pipeline UI and noticed that an automated production TensorFlow training pipeline finished three hours earlier than a typical run. You do not have access to production data for security reasons, but you have verified that no alert was logged in any of the ML system’s monitoring systems and that the pipeline code has not been updated recently. You want to debug the pipeline as quickly as possible so you can determine whether to deploy the trained model. What should you do?
Navigate to Vertex AI Pipelines, and open Vertex AI TensorBoard. Check whether the training regime and metrics converge.
TensorBoard provides a compact and complete overview of training metrics such as loss and accuracy over time. If the training converges with the model’s expected accuracy, the model can be deployed.
Access the Pipeline run analysis pane from Vertex AI Pipelines, and check whether the input configuration and pipeline steps have the expected values.
Checking input configuration is a good test, but it is not sufficient to ensure that model performance is acceptable. You can access logs and outputs for each pipeline step to review model performance, but it would involve more steps than using TensorBoard.
Determine the trained model’s location from the pipeline’s metadata in Vertex ML Metadata, and compare the trained model’s size to the previous model.
Model size is a good indicator of health but does not provide a complete overview to make sure that the model can be safely deployed. Note that the pipeline’s metadata can also be accessed directly from Vertex AI Pipelines.
Request access to production systems. Get the training data’s location from the pipeline’s metadata in Vertex ML Metadata, and compare data volumes of the current run to the previous run.
Data is the most probable cause of this behavior, but it is not the only possible cause. Also, access requests could take a long time and are not the most secure option. Note that the pipeline’s metadata can also be accessed directly from Vertex AI Pipelines.

You recently developed a custom ML model that was trained in Vertex AI on a post-processed training dataset stored in BigQuery. You used a Cloud Run container to deploy the prediction service. The service performs feature lookup and pre-processing and sends a prediction request to a model endpoint in Vertex AI. You want to configure a comprehensive monitoring solution for training-serving skew that requires minimal maintenance. What should you do?
Create a Model Monitoring job for the Vertex AI endpoint that uses the training data in BigQuery to perform training-serving skew detection and uses email to send alerts. When an alert is received, use the console to diagnose the issue.
Vertex AI Model Monitoring is a fully managed solution for monitoring training-serving skew that, by definition, requires minimal maintenance. Using the console for diagnostics is recommended for a comprehensive monitoring solution because there could be multiple causes for the skew that require manual review.
Update the model hosted in Vertex AI to enable request-response logging. Create a Data Studio dashboard that compares training data and logged data for potential training-serving skew and uses email to send a daily scheduled report.
This solution does not minimize maintenance. It involves multiple custom components that require additional updates for any schema change.
Create a Model Monitoring job for the Vertex AI endpoint that uses the training data in BigQuery to perform training-serving skew detection and uses Cloud Logging to send alerts. Set up a Cloud Function to initiate model retraining that is triggered when an alert is logged.
A model retrain does not necessarily fix skew. For example, differences in pre-processing logic between training and prediction can also cause skew.
Update the model hosted in Vertex AI to enable request-response logging. Schedule a daily Dataflow Flex Template job that uses TensorFlow Data Validation to detect training-serving skew and uses Cloud Logging to send alerts. Set up a Cloud Function to initiate model retraining that is triggered when an alert is logged.
This solution does not minimize maintenance. It involves multiple components that require additional updates for any schema change. Also, a model retrain does not necessarily fix skew. For example, differences in pre-processing logic between training and prediction can also cause skew.

You have a historical data set of the sale price of 10,000 houses and the 10 most important features resulting from principal component analysis (PCA). You need to develop a model that predicts whether a house will sell at one of the following equally distributed price ranges: 200-300k, 300-400k, 400-500k, 500-600k, or 600-700k. You want to use the simplest algorithmic and evaluative approach. What should you do?
Define a one-vs-one classification task where each price range is a categorical label. Use F1 score as the metric.
This approach is more complex than the multi-class approach suggested in response B. F1 score is not especially useful with equally distributed labels, and one-vs-one classification trains a separate classifier for every pair of classes, which is unnecessary complexity when only one label can be correct.
Define a multi-class classification task where each price range is a categorical label. Use accuracy as the metric.
The use case is an ordinal classification task which is most simply solved using multi-class classification. Accuracy as a metric is the best match for a use case with discrete and balanced labels.
Define a regression task where the label is the sale price represented as an integer. Use mean absolute error as the metric.
Regression is not the recommended approach when solving an ordinal classification task with a small number of discrete values. This specific regression approach adds complexity in comparison to the regression approach suggested in response D because it uses the exact sale price to predict a range. Finally, the mean absolute error would not be the recommended metric because it gives the same penalty for errors of any magnitude.
Define a regression task where the label is the average of the price range that corresponds to the house sale price represented as an integer. Use root mean squared error as the metric.
Regression is not the recommended approach when solving an ordinal classification task with a small number of discrete values. This specific regression approach would be recommended in comparison to the regression approach suggested in response C because it uses a less complex label and a recommended metric to minimize variance and bias.

You downloaded a TensorFlow language model pre-trained on a proprietary dataset by another company, and you tuned the model with Vertex AI Training by replacing the last layer with a custom dense layer. The model achieves the expected offline accuracy; however, it exceeds the required online prediction latency by 20ms. You want to optimize the model to reduce latency while minimizing the offline performance drop before deploying the model to production. What should you do?
Apply post-training quantization on the tuned model, and serve the quantized model.
Post-training quantization is the recommended option for reducing model latency when re-training is not possible. Post-training quantization can minimally decrease model performance.
Use quantization-aware training to tune the pre-trained model on your dataset, and serve the quantized model.
Tuning the whole model on the custom dataset only will cause a drop in offline performance.
Use pruning to tune the pre-trained model on your dataset, and serve the pruned model after stripping it of training variables.
Tuning the whole model on the custom dataset only will cause a drop in offline performance. Also, pruning helps in compressing model size, but it is expected to provide a smaller latency improvement than quantization.
Use clustering to tune the pre-trained model on your dataset, and serve the clustered model after stripping it of training variables.
Tuning the whole model on the custom dataset only will cause a drop in offline performance. Also, clustering helps in compressing model size, but it does not reduce latency.
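A minimal sketch of post-training quantization as described above; the TensorFlow Lite converter flow is the canonical example, with the SavedModel path illustrative (serving stack details may differ for an online endpoint):

import tensorflow as tf

# Load the tuned SavedModel and apply post-training dynamic-range quantization.
converter = tf.lite.TFLiteConverter.from_saved_model("tuned_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_model)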

You developed a model for a classification task where the minority class appears in 10% of the data set. You ran the training on the original imbalanced data set and have checked the resulting model performance. The confusion matrix indicates that the model did not learn the minority class. You want to improve the model performance while minimizing run time and keeping the predictions calibrated. What should you do?
Update the weights of the classification function to penalize misclassifications of the minority class.
This approach does not guarantee calibrated predictions and does not improve training run time.
Tune the classification threshold, and calibrate the model with isotonic regression on the validation set.
This approach increases run time by adding threshold tuning and calibration on top of model training.
Upsample the minority class in the training set, and update the weight of the upsampled class by the same sampling factor.
Upsampling increases training run time by providing more data samples during training.
Downsample the majority class in the training set, and update the weight of the downsampled class by the same sampling factor.
Downsampling with upweighting improves performance on the minority class while speeding up convergence and keeping the predictions calibrated.
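A minimal sketch of downsampling the majority class and upweighting it by the same factor; the factor of 10 reflects the roughly 10% minority share, and the column names are illustrative:

import numpy as np
import pandas as pd

df = pd.read_csv("training_data.csv")
minority = df[df["label"] == 1]
majority = df[df["label"] == 0]

# Downsample the majority class by a factor of 10.
factor = 10
majority_down = majority.sample(frac=1 / factor, random_state=42)
train = pd.concat([minority, majority_down]).sample(frac=1.0, random_state=42)

# Upweight the downsampled class by the same factor so that the predicted
# probabilities stay calibrated.
sample_weight = np.where(train["label"] == 0, float(factor), 1.0)

# e.g. model.fit(train[feature_cols], train["label"], sample_weight=sample_weight)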

You have a dataset that is split into training, validation, and test sets. All the sets have similar distributions. You have sub-selected the most relevant features and trained a neural network in TensorFlow. TensorBoard plots show the training loss oscillating around 0.9, with the validation loss higher than the training loss by 0.3. You want to update the training regime to maximize the convergence of both losses and reduce overfitting. What should you do?
Decrease the learning rate to fix the validation loss, and increase the number of training epochs to improve the convergence of both losses.
Changing the learning rate does not reduce overfitting. Increasing the number of training epochs is not expected to improve the losses significantly.
Decrease the learning rate to fix the validation loss, and increase the number and dimension of the layers in the network to improve the convergence of both losses.
Changing the learning rate does not reduce overfitting.
Introduce L1 regularization to fix the validation loss, and increase the learning rate and the number of training epochs to improve the convergence of both losses.
Increasing the number of training epochs is not expected to improve the losses significantly, and increasing the learning rate could also make the model training unstable. L1 regularization could be used to stabilize the learning, but it is not expected to be particularly helpful because only the most relevant features have been used for training.
Introduce L2 regularization to fix the validation loss, and increase the number and dimension of the layers in the network to improve the convergence of both losses.
L2 regularization prevents overfitting. Increasing the model’s complexity boosts the predictive ability of the model, which is expected to optimize loss convergence when underfitting.
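A minimal Keras sketch of the L2 regularization option above, adding the penalty while increasing the number and width of layers; the penalty, layer sizes, and loss are illustrative and depend on the task:

import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")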

You recently used Vertex AI Prediction to deploy a custom-trained model in production. The automated re-training pipeline made available a new model version that passed all unit and infrastructure tests. You want to define a rollout strategy for the new model version that guarantees an optimal user experience with zero downtime. What should you do?
Release the new model version in the same Vertex AI endpoint. Use traffic splitting in Vertex AI Prediction to route a small random subset of requests to the new version and, if the new version is successful, gradually route the remaining traffic to it.
Canary deployments may affect user experience, even if on a small subset of users.
Release the new model version in a new Vertex AI endpoint. Update the application to send all requests to both Vertex AI endpoints, and log the predictions from the new endpoint. If the new version is successful, route all traffic to the new application.
Shadow deployments minimize the risk of affecting user experience while ensuring zero downtime.
Deploy the current model version with an Istio resource in Google Kubernetes Engine, and route production traffic to it. Deploy the new model version, and use Istio to route a small random subset of traffic to it. If the new version is successful, gradually route the remaining traffic to it.
Canary deployments may affect user experience, even if on a small subset of users. This approach is a less managed alternative to response A and could cause downtime when moving between services.
Install Seldon Core and deploy an Istio resource in Google Kubernetes Engine. Deploy the current model version and the new model version using the multi-armed bandit algorithm in Seldon to dynamically route requests between the two versions before eventually routing all traffic over to the best-performing version.
The multi-armed bandit approach may affect user experience, even if on a small subset of users. This approach could cause downtime when moving between services.

You trained a model for sentiment analysis in TensorFlow Keras, saved it in SavedModel format, and deployed it with Vertex AI Predictions as a custom container. You selected a random sentence from the test set, and used a REST API call to send a prediction request. The service returned the error:
“Could not find matching concrete function to call loaded from the SavedModel. Got: Tensor("inputs:0", shape=(None,), dtype=string). Expected: TensorSpec(shape=(None, None), dtype=tf.int64, name='inputs')”.
You want to update the model’s code and fix the error while following Google-recommended best practices. What should you do?
Combine all preprocessing steps in a function, and call the function on the string input before requesting the model’s prediction on the processed input.
Duplicating the preprocessing adds unnecessary dependencies between the training and serving code and could cause training-serving skew.
Combine all preprocessing steps in a function, and update the default serving signature to accept a string input wrapped into the preprocessing function call.
This approach efficiently updates the model while ensuring no training-serving skew.
Create a custom layer that performs all preprocessing steps, and update the Keras model to accept a string input followed by the custom preprocessing layer.
This approach adds unnecessary complexity. Because you update the model directly, you will need to re-train the model.
Combine all preprocessing steps in a function, and update the Keras model to accept a string input followed by a Lambda layer wrapping the preprocessing function.
This approach adds unnecessary complexity. Because you update the model directly, you will need to re-train the model. Note that using Lambda layers over custom layers is recommended for simple operations or quick experimentation only.
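A minimal sketch of updating the default serving signature so the SavedModel accepts a raw string and runs the preprocessing before the model, as described above; tokenize_to_ids stands in for the existing preprocessing logic, and the paths are illustrative:

import tensorflow as tf

model = tf.keras.models.load_model("sentiment_model")  # existing tuned model

def preprocess(texts):
    # Placeholder: tokenize and map the raw strings to the int64 id matrix
    # of shape (batch, sequence_length) that the model expects.
    return tokenize_to_ids(texts)

@tf.function(input_signature=[tf.TensorSpec(shape=[None], dtype=tf.string, name="inputs")])
def serving_fn(inputs):
    return {"outputs": model(preprocess(inputs))}

tf.saved_model.save(model, "sentiment_model_serving",
                    signatures={"serving_default": serving_fn})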

You used Vertex AI Workbench user-managed notebooks to develop a TensorFlow model. The model pipeline accesses data from Cloud Storage, performs feature engineering and training locally, and outputs the trained model in Vertex AI Model Registry. The end-to-end pipeline takes 10 hours on the attached optimized instance type. You want to introduce model and data lineage for automated re-training runs for this pipeline only while minimizing the cost to run the pipeline. What should you do?
1. Use the Vertex AI SDK to create an experiment for the pipeline runs, and save metadata throughout the pipeline.
2. Configure a scheduled recurring execution for the notebook.
3. Access data and model metadata in Vertex ML Metadata.
A managed solution does not minimize running costs, and Vertex ML Metadata is more managed than Cloud Storage.
1. Use the Vertex AI SDK to create an experiment, launch a custom training job in Vertex training service with the same instance type configuration as the notebook, and save metadata throughout the pipeline.
2. Configure a scheduled recurring execution for the notebook.
3. Access data and model metadata in Vertex ML Metadata.
A managed solution does not minimize running costs, and this approach introduces Vertex training service with Vertex ML Metadata, which are both managed services.
1. Create a Cloud Storage bucket to store metadata.
2. Write a function that saves data and model metadata by using TensorFlow ML Metadata in one time-stamped subfolder per pipeline run.
3. Configure a scheduled recurring execution for the notebook.
4. Access data and model metadata in Cloud Storage.
This approach minimizes running costs by being self-managed, and it is recommended only for simple use cases such as deploying a single pipeline. When optimizing for maintenance and development costs, scaling to more than one pipeline, or performing experimentation, using Vertex ML Metadata and Vertex AI Pipelines is recommended.
1. Refactor the pipeline code into a TensorFlow Extended (TFX) pipeline.
2. Load the TFX pipeline in Vertex AI Pipelines, and configure the pipeline to use the same instance type configuration as the notebook.
3. Use Cloud Scheduler to configure a recurring execution for the pipeline.
4. Access data and model metadata in Vertex AI Pipelines.
A managed solution does not minimize running costs, and this approach introduces Vertex AI Pipelines, which is a fully managed service.
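A simplified sketch of the self-managed metadata step described above, writing one lineage record per pipeline run to a time-stamped Cloud Storage subfolder; it uses a plain JSON record as a stand-in for the TensorFlow ML Metadata (MLMD) store the option mentions, and the bucket and fields are illustrative:

import json
from datetime import datetime, timezone
from google.cloud import storage

def save_run_metadata(bucket_name, data_uri, model_uri, metrics):
    run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    record = {
        "run_id": run_id,
        "training_data": data_uri,
        "model_artifact": model_uri,
        "metrics": metrics,
    }
    blob = storage.Client().bucket(bucket_name).blob(f"pipeline-metadata/{run_id}/run.json")
    blob.upload_from_string(json.dumps(record, indent=2), content_type="application/json")

# Called at the end of each scheduled notebook execution, for example:
# save_run_metadata("my-metadata-bucket", "gs://data/2024-06-01/",
#                   "gs://models/run-123/", {"rmse": 4.2})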

You work for a manufacturing company that owns a high-value machine which has several machine settings and multiple sensors. A history of the machine’s hourly sensor readings and known failure event data are stored in BigQuery. You need to predict if the machine will fail within the next 3 days in order to schedule maintenance before the machine fails. Which data preparation and model training steps should you take?
Data preparation: Daily max value feature engineering with DataPrep; Model training: AutoML classification with BQML
Daily max value feature engineering is not appropriate; with noisy hourly sensor readings, a rolling average captures the trend toward failure better.
Data preparation: Daily min value feature engineering with DataPrep; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to True
Daily min value feature engineering is not appropriate; with noisy hourly sensor readings, a rolling average captures the trend toward failure better.
Data preparation: Rolling average feature engineering with DataPrep; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to False
With AUTO_CLASS_WEIGHTS set to False, model training does not balance the class labels for an unbalanced data set.
Data preparation: Rolling average feature engineering with DataPrep; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to True
Considering the noise and fluctuations in the hourly sensor data, a rolling (moving) average is more appropriate than daily min/max values for capturing the trend toward failure.
BQML lets you create and run machine learning models using standard SQL queries in BigQuery. The AUTO_CLASS_WEIGHTS=TRUE option balances the class labels in the training data; by default the training data is not weighted, so with imbalanced labels the model would learn to favor the most common class.
This option is correct because it combines a moving average of the sensor data with class balancing through BQML's AUTO_CLASS_WEIGHTS parameter.
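A hedged sketch of the BQML training statement submitted through the BigQuery Python client; the dataset, table, and feature columns are illustrative:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
CREATE OR REPLACE MODEL `maintenance.failure_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  auto_class_weights = TRUE,  -- balance the rare failure label
  input_label_cols = ['failed_within_3_days']
) AS
SELECT
  machine_id,
  avg_temperature_24h,  -- rolling averages engineered upstream
  avg_vibration_24h,
  failed_within_3_days
FROM `maintenance.sensor_features`
"""
client.query(query).result()  # waits for the training job to finish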

You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company’s mobile app. You have reviewed old chat logs and tagged each conversation for intent based on each customer’s stated intention for contacting customer service. About 70% of customer inquiries are simple requests that are solved within 10 intents. The remaining 30% of inquiries require much longer and more complicated requests. Which intents should you automate first?
Automate a blend of the shortest and longest intents to be representative of all intents.
You should not automate the higher value requests.
Automate the more complicated requests first because those require more of the agents’ time.
Live agents are better suited to handle these complicated requests.
Automate the 10 intents that cover 70% of the requests so that live agents can handle the more complicated requests.
It enables a machine to handle the most simple requests and gives the live agents more opportunity to handle higher value requests.
Automate intents in places where common words such as “payment” only appear once to avoid confusing the software.
Dialogflow can handle the same word in multiple intents.

You work for a maintenance company and have built and trained a deep learning model that identifies defects based on thermal images of underground electric cables. Your dataset contains 10,000 images, 100 of which contain visible defects. How should you evaluate the performance of the model on a test dataset?
Calculate the Area Under the Curve (AUC) value.
It is scale-invariant. AUC measures how well predictions are ranked, rather than their absolute values. AUC is also classification-threshold invariant. It measures the quality of the model’s predictions irrespective of what classification threshold is chosen.
Calculate the number of true positive results predicted by the model.
Calculating the number of true positives without considering false positives can lead to misleading results. For instance, the model could classify nearly every image as a defect. This would result in many true positives, but the model would in fact be a very poor discriminator.
Calculate the fraction of images predicted by the model to have a visible defect.
Merely calculating the fraction of images that contain defects doesn’t indicate whether your model is accurate or not.
Calculate the Cosine Similarity to compare the model’s performance on the test dataset to the model’s performance on the training dataset.
This metric is more commonly used in distance-based models (e.g., K Nearest Neighbors). This isn’t an appropriate metric for checking the performance of an image classification model.

You are an ML engineer at a media company. You need to build an ML model to analyze video content frame by frame, identify objects, and alert users if there is inappropriate content. Which Google Cloud products should you use to build this project?
Pub/Sub, Cloud Functions, and Vision API
There is no tool for alerting and notifying.
Pub/Sub, Cloud IoT, Dataflow, Vision API, and Cloud Logging
Vision API is designed for analyzing images, not for processing video content frame by frame.
Pub/Sub, Cloud Functions, Video Intelligence API, and Cloud Logging
Video Intelligence API can find inappropriate components and other components satisfy the requirements of real-time processing and notification.
Pub/Sub, Cloud Functions, AutoML, and Cloud Logging
AutoML is for cases where you want to customize a model by combining Google's models with your own data; the pre-trained Video Intelligence API already covers this use case.

You need to write a generic test to verify whether Deep Neural Network (DNN) models automatically released by your team have a sufficient number of parameters to learn the task for which they were built. What should you do?
Train the model for a few iterations, and check for NaN values.
The test does not check that the model has enough parameters to learn the task.
Train the model for a few iterations, and verify that the loss is constant.
The loss should decrease if you have enough parameters to learn the task.
Train a simple linear model, and determine if the DNN model outperforms it.
Outperforming the linear model does not guarantee that the model has enough parameters to learn tasks with non-linear data representations. The option also doesn’t quantify a metric to give an indication of how well the model performed.
Train the model with no regularization, and verify that the loss function is close to zero.
The test can check that the model has enough parameters to memorize the task.
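A minimal sketch of such a generic test with Keras; an unregularized model should drive the loss close to zero on a small batch if it has enough parameters to memorize the task (the loss, tolerance, and epoch count are illustrative and depend on the model's task):

import tensorflow as tf

def test_model_can_memorize(build_model, x, y, epochs=200, tol=1e-2):
    # build_model must return the released DNN without regularization or dropout.
    model = build_model()
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
    x_small, y_small = x[:32], y[:32]
    history = model.fit(x_small, y_small, epochs=epochs, verbose=0)
    final_loss = history.history["loss"][-1]
    assert final_loss < tol, f"Model failed to memorize a small batch: loss={final_loss:.4f}"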

You work for a textile manufacturer and have been asked to build a model to detect and classify fabric defects. You trained a machine learning model with high recall based on high resolution images taken at the end of the production line. You want quality control inspectors to gain trust in your model. Which technique should you use to understand the rationale of your classifier?
Use K-fold cross validation to understand how the model performs on different test datasets.
K-fold cross validation offers no explanation on the predictions made by the model.
Use the Integrated Gradients method to efficiently compute feature attributions for each predicted image.
Integrated Gradients computes per-pixel feature attributions, identifying which pixels of the input image led to the classification.
Use PCA (Principal Component Analysis) to reduce the original feature set to a smaller set of easily understood features.
PCA simplifies higher dimensional datasets but offers no added benefit to the scenario.
Use k-means clustering to group similar images together, and calculate the Davies-Bouldin index to evaluate the separation between clusters.
Clustering images does not provide any insight into why the classification model made the predictions that it did.

You work for a large retailer. You want to use ML to forecast future sales leveraging 10 years of historical sales data. The historical data is stored in Cloud Storage in Avro format. You want to rapidly experiment with all the available data. How should you build and train your model for the sales forecast?
Load data into BigQuery and use the ARIMA model type on BigQuery ML.
BigQuery ML is designed for fast and rapid experimentation and it is possible to use federated queries to read data directly from Cloud Storage. Moreover, ARIMA is considered one of the best in class for time series forecasting.
Convert the data into CSV format and create a regression model on AutoML.
AutoML is not ideal for fast iteration and rapid experimentation. Even if it does not require data cleanup and hyperparameter tuning, it takes at least one hour to create a model.
Convert the data into TFRecords and create an RNN model on TensorFlow on Vertex AI Workbench.
In order to build a custom TensorFlow model, you would still need to do data cleanup and hyperparameter tuning.
Convert and refactor the data into CSV format and use the built-in XGBoost algorithm on Vertex AI custom training.
Using Vertex AI custom training requires preprocessing your data in a particular CSV structure and it is not ideal for fast iteration, as training times can take a long time because it cannot be distributed on multiple machines.

You need to build an object detection model for a small startup company to identify if and where the company’s logo appears in an image. You were given a large repository of images, some with logos and some without. These images are not yet labelled. You need to label these pictures, and then train and deploy the model. What should you do?
Use Google Cloud’s Data Labelling Service to label your data. Use AutoML Object Detection to train and deploy the model.
This will allow you to easily create a request for a labelling task and deploy a high-performance model.
Use Vision API to detect and identify logos in pictures and use it as a label. Use Vertex AI to build and train a convolutional neural network.
Vision API is not guaranteed to work with any company logos, and in the statement it explicitly mentions a small startup, which will further decrease the chance of success.
Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use Vertex AI to build and train a convolutional neural network.
The task of manually labelling the data is time consuming and should be avoided if possible.
Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use Vertex AI to build and train a real time object detection model.
The task of labelling object detection data is very tedious, and real-time object detection is designed for detecting objects in videos rather than in images.

You work for a gaming company that develops and manages a popular massively multiplayer online (MMO) game. The game’s environment is open-ended, and a large number of positions and moves can be taken by a player. Your team has developed an ML model with TensorFlow that predicts the next move of each player. Edge deployment is not possible, but low-latency serving is required. How should you configure the deployment?
Use a Cloud TPU to optimize model training speed.
Use Vertex AI Endpoint with an NVIDIA GPU.
Use Vertex AI Endpoint with a high-CPU machine type to get a batch prediction for the players.
Use Vertex AI Endpoint with a high-memory machine type to get a batch prediction for the players.

Your team is using a TensorFlow Inception-v3 CNN model pretrained on ImageNet for an image classification prediction challenge on 10,000 images. You will use Vertex AI to perform the model training. What TensorFlow distribution strategy and Vertex AI custom training job configuration should you use to train the model and optimize for wall-clock time?
Default Strategy; Custom tier with a single master node and four v100 GPUs.
Default Strategy does not distribute training across multiple devices.
One Device Strategy; Custom tier with a single master node and four v100 GPUs.
One Device Strategy does not distribute training across multiple devices.
One Device Strategy; Custom tier with a single master node and eight v100 GPUs.
One Device Strategy does not distribute training across multiple devices.
MirroredStrategy; Custom tier with a single master node and four v100 GPUs.
This is the only strategy among the options that can perform distributed training, albeit with only a single copy of the variables on the CPU host.

You work on a team where the process for deploying a model into production starts with data scientists training different versions of models in a Kubeflow pipeline. The workflow then stores the new model artifact into the corresponding Cloud Storage bucket. You need to build the next steps of the pipeline after the submitted model is ready to be tested and deployed in production on Vertex AI. How should you configure the architecture before deploying the model to production?
Deploy model in test environment -> Evaluate and test model -> Create a new Vertex AI model version
The model can be validated after it is deployed to the test environment, and the release version is established before the model is deployed in production.
Validate model -> Deploy model in test environment -> Create a new Vertex AI model version
The model cannot be validated before being deployed to the test environment.
Create a new Vertex AI model version -> Evaluate and test model -> Deploy model in test environment
The model version is being set up for the release candidate before the model is validated. Moreover, the model cannot be validated before being deployed to the test environment.
Create a new Vertex AI model version - > Deploy model in test environment -> Validate model
The model version is being set up for the release candidate before the model is validated.

AutoML, Vertex AI Workbench, and TensorFlow align to which stage of the data-to-AI workflow?
Ingestion and process
Analytics
Storage
Machine learning

Compute Engine, Google Kubernetes Engine, App Engine, and Cloud Functions represent which type of services?
Database and storage
Networking
Compute
Machine learning

Which data storage class is best for storing data that needs to be accessed less than once a year, such as online backups and disaster recovery?
Standard storage
Coldline storage
Nearline storage
Archive storage

Which Google hardware innovation tailors architecture to meet the computation needs on a domain, such as the matrix multiplication in machine learning?
CPUs (central processing units)
TPUs (Tensor Processing Units)
GPUs (graphic processing units)
DPUs (data processing units)

Pub/Sub, Dataflow, Dataproc, and Cloud Data Fusion align to which stage of the data-to-AI workflow?
Ingestion and process
Analytics
Storage
Machine learning

Cloud Storage, Cloud Bigtable, Cloud SQL, Cloud Spanner, and Firestore represent which type of services?
Machine learning
Database and storage
Networking
Compute

Due to several data types and sources, big data often has many data dimensions. This can introduce data inconsistencies and uncertainties. Which type of challenge might this present to data engineers?
Volume
Veracity
Velocity
Variety

Which Google Cloud product acts as an execution engine to process and implement data processing pipelines?
Looker
Dataflow
Looker Studio
Apache Beam

Select the correct streaming data workflow.
Visualize the data, process the data, and ingest the streaming data.
Process the data, visualize the data, and ingest the data.
Ingest the streaming data, visualize the data, and process the data.
Ingest the streaming data, process the data, and visualize the results.

When you build scalable and reliable pipelines, data often needs to be processed in near-real time, as soon as it reaches the system. Which type of challenge might this present to data engineers?
Volume
Veracity
Velocity
Variety

Which Google Cloud product is a distributed messaging service that is designed to ingest messages from multiple device streams such as gaming events, IoT devices, and application streams?
Pub/Sub
Apache Beam
Looker Studio
Looker

In a supervised machine learning model, what provides historical data that can be used to predict future data?
Data points
Labels
Features
Examples

Which BigQuery feature leverages geography data types and standard SQL geography functions to analyze a data set?
Building machine learning models
Ad hoc analysis
Building business intelligence dashboards
Geospatial analysis

BigQuery is a fully managed data warehouse. What does “fully managed” refer to?
BigQuery manages the cost for you.
BigQuery manages the data quality for you.
BigQuery manages the data source for you.
BigQuery manages the underlying structure for you.

Which two services does BigQuery provide?
Application services and storage
Storage and compute
Storage and analytics
Application services and analytics

You want to use machine learning to identify whether an email is spam. Which should you use?
Supervised learning, logistic regression
Unsupervised learning, cluster analysis
Unsupervised learning, dimensionality reduction
Supervised learning, linear regression

You want to use machine learning to group random photos into similar groups. Which should you use?
Supervised learning, logistic regression
Unsupervised learning, cluster analysis
Unsupervised learning, dimensionality reduction
Supervised learning, linear regression

Which pattern describes source data that is moved into a BigQuery table in a single operation?
Spot load
Batch load
Generated data
Streaming

Data has been loaded into BigQuery, and the features have been selected and preprocessed. What should happen next when you use BigQuery ML to develop a machine learning model?
Evaluate the performance of the trained ML model.
Use the ML model to make predictions.
Classify labels to train on historical data.
Create the ML model inside BigQuery.

You work for a video production company and want to use machine learning to categorize event footage, but don’t want to train your own ML model. Which option can help you get started?
Custom training
Pre-built APIs
AutoML
BigQuery ML

Which Google Cloud product lets users create, deploy, and manage machine learning models in one unified platform?
Vertex AI
TensorFlow
AI Platform
Document AI

You work for a global hotel chain that has recently loaded some guest data into BigQuery. You have experience writing SQL and want to leverage machine learning to help predict guest trends for the next few months. Which option is best?
Custom training
Pre-built APIs
AutoML
BigQuery ML

Which code-based solution offered with Vertex AI gives data scientists full control over the development environment and process?
AI Solutions
Custom training
AI Platform
AutoML

Your company has a lot of data, and you want to train your own machine model to see what insights ML can provide. Due to resource constraints, you require a codeless solution. Which option is best?
Custom training
Pre-built APIs
AutoML
BigQuery ML

Which stage of the machine learning workflow includes model evaluation?
Model training
Model serving
Data preparation

Which Vertex AI tool automates, monitors, and governs machine learning systems by orchestrating the workflow in a serverless manner?
Vertex AI console
Vertex AI Feature Store
Vertex AI Pipelines
Vertex AI Workbench

A hospital uses Google’s machine learning technology to help pre-diagnose cancer by feeding historical patient medical data to the model. The goal is to identify as many potential cases as possible. Which metric should the model focus on?
Recall
Confusion matrix
Feature importance
Precision

Which stage of the machine learning workflow includes feature engineering?
Model training
Model serving
Data preparation

A farm uses Google’s machine learning technology to detect defective apples in their crop, such as those that are irregular in size or have scratches. The goal is to identify only the apples that are actually bad so that no good apples are wasted. Which metric should the model focus on?
Recall
Confusion matrix
Feature importance
Precision

Select the correct machine learning workflow.
Data preparation, model serving, model training
Data preparation, model training, model serving
Model serving, data preparation, model training
Model training, data preparation, model serving

What would you use to replace user input with machine learning?
Neural networks.
Labeled data.
Pre-trained models.
All of the options.

Which of the following is not part of the ML training phase?
Evaluating the models
Creating the models
Connecting Neural Networks
Data management

Which of the following are best practices for Data preparation?
Avoid target leakage
Partially correct.
Provide a time signal
Partially correct.
Avoid training-serving skew
Partially correct.
All of the options.

Which of the following refers to the type of data used in ML models?
Unlabeled data
Partially correct.
Flagged data
Labeled data
Partially correct.
Both Labeled & Unlabeled data

What’s the most efficient way to transcribe speech?
You can collect audio data, train it and predict with it.
Use a dictionary website for a partial transcription, then use ML to fill in what’s missing.
You can use a speech API.
All of the options.

Which of the following are facets that differentiate deep learning networks from other multilayer networks?
More complex ways of connecting layers
Partially correct.
Automatic feature extraction
Partially correct.
All of the options.
Cambrian explosion of computing power to train
Partially correct.

Which of the following statements is incorrect?
Machine learning performs some core and numerical tasks
Machine learning doesn't have unit tests of its own.
None of the options are correct.
Machine learning doesn't serve that task in a website.

Which of the following statements is true about ML systems?
It generates a lot of value for the organization, for customers and for end users.
Partially correct.
Almost every single one has a team of people reviewing the algorithms, reviewing their responses and doing random sub-samples and it generates a lot of value for the organization, for customers and for end users.
None of the options are correct.
Almost every single one has a team of people reviewing the algorithms, reviewing their responses and doing random sub-samples.
Partially correct.

Which of the following networks is used in identifying faces, objects, and traffic signs?
Convolutional Neural Networks
Recurrent Neural Networks
None of the options are correct.
Deep Neural Networks

Vertex AI is flexible. You choose your training method. _____________ lets you create a training application optimized for your targeted outcome. You have complete control over training application functionality; you can target any objective, use any algorithm, develop your own loss functions or metrics, or do any other customization.
Containerized training
AutoML
Custom training
Custom training and AutoML

What is a managed dataset in Vertex AI?
Data loaded into Python - whether it be from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.
Data loaded into AutoML Tables - whether it be from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.
Data loaded into Vertex AI - whether it be from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.
Data loaded into a Pandas Dataframe - whether it be from Google Cloud Storage or BigQuery. This means, for example, that it can be linked to a model.

Typically, ML practitioners train models using different architectures, input data sets, hyperparameters, and hardware. What architectural type would you use for cyber-security, pattern recognition, self-driving cars, and reinforcement learning?
GANs or Generative Adversarial Networks
RNNs or Recurrent Neural Networks
Sorting/Clustering
CNNs or Convolutional Neural Networks

The way you deploy a TensorFlow model is different from how you deploy a PyTorch model, and even TensorFlow models might differ based on whether they were created using AutoML or by means of code. True or False: In the unified set of APIs that Vertex AI provides, you can treat all these models in the same way.
False
True

Which Vertex AI service lets you access data, process data in a Dataproc cluster, train a model, share your results, and more, all without leaving the JupyterLab interface?
Models
Datasets
Workbench
Pipelines

Moving from experimentation to production requires packaging, deploying and monitoring your model - which can give you confidence that your model is making useful predictions in production. Monitoring measures key model performance metrics and includes:
TPU drift, RNN performance, CPU outliers and data quality.
Architectural drift, TPU performance, zone outliers and RNNs.
Model drift, model performance, model outliers and data quality.
Architectural drift, TPU hyperparameter performance, zone outliers and RNNs and CNNS.

In Machine learning development, which phase identifies your use case?
Evaluating the model
Experimenting
Preparing training data
Framing the problem

Vertex AI Workbench provides two Jupyter notebook-based options for your data science workflow. __________________are Deep Learning VM Images instances that are heavily customizable and are therefore ideal for users who need a lot of control over their environment.
User-managed notebook instances
Managed notebook instances
Unmanaged notebooks and user-defined notebooks
Managed notebooks and already created notebooks

Vertex AI Workbench provides two Jupyter notebook-based options for your data science workflow. __________________ are Google-managed environments with integrations and features that help you set up and work in an end-to-end notebook-based production environment.
Managed notebook instances
User-managed notebook instances
Unmanaged notebooks and user-defined notebooks
Managed notebooks and already created notebooks

Which statement is correct regarding Vertex AI Workbench Notebooks?
Both options are pre-packaged with JupyterLab and have a pre-installed suite of deep learning packages, including support for the TensorFlow and PyTorch frameworks.
Partially correct.
Both options support GPU accelerators and the ability to sync with a GitHub repository.
Partially correct.
Both options are protected by Google Cloud authentication and authorization.
Partially correct.
All of the options.

True or False. In a Vertex AI Workbench Jupyter Notebook, you can access your data without leaving the JupyterLab interface.
True
False

Where can you find the Cloud Storage and BigQuery extension to browse data?
Left side-bar
Top menu-bar
Bottom
In the notebook

For users who have specific networking and security needs, ______ can be the best option. You can use VPC Service Controls to set up a ______ within a service perimeter and implement other built-in networking and security features. You can also configure user-managed notebook instances manually to satisfy some specific networking and security needs.
User-managed notebook instances
Managed notebook instances
Unmanaged notebooks and user-defined notebooks
Managed notebooks and already created notebooks

Which of the following statements is correct for Explainable AI?
It helps you better understand your model's data.
It offers feature attributions to provide insights into why models generate predictions.
It details the importance of one feature that a model uses as input to make predictions.
It supports only pre-trained models based on tabular and image data.

Your dataset is considered small, less than 5,000 rows and around 10MB. You are not using AutoML but a Jupyter Notebook instance. Which of the following is a Best Practice for Training a model with a small dataset?
For small datasets, train the model using the Vertex AI training service.
For small datasets, train the model within the notebook instance.
For small datasets, train the model within the notebook instance, the Vertex AI training service, and the containerized training service.
For small datasets, train the model within the notebook instance and use the Vertex AI training service.

True or False: Use BigQuery to process tabular data and use Dataflow to process unstructured data.
False
True

The data used to train a model can originate from any number of systems, for example, logs from an online service system, images from a local device, or documents scraped from the web. Which of the following is a Best Practice for Preparing and Storing unstructured data such as images, audio, and video?
In BigQuery
In Cloud Storage
In Cloud SQL
In Bigtable

Which approach is followed to achieve a better performance across subgroups?
Evaluation metrics
None of the options are correct.
Equality of opportunity
Confusion matrix

Human biases lead to bias in machine learning models. Unconscious biases exist in our data and exist in two forms. What are the two forms of unconscious biases in data?
There are the human biases that exist in data because data found in “data silos” has existing biases with regard to properties like gender, race, and sexual orientation. We can also run into human biases which arise as part of our data collection and labeling procedures.
All of the options.
There are the human biases that exist in data because data found in “the world” has existing biases with regard to properties like gender, race, and sexual orientation. For example, there may be reporting bias by our subjects because they only choose to reveal certain aspects about themselves or their opinions. We can also run into human biases which arise as part of our data collection and labeling procedures.
First, there is human bias as a result of reporting, data collection, and labeling. Second, there is human bias as a result of data visualization and analysis.

One of the key tools to help in understanding inclusion and how to introduce inclusion across different kinds of groups across your data is by understanding the __________________________.
Evaluation regression matrix
Equality of opportunity matrix
Confusion matrix
Sigmoid matrix

The confusion matrix helps which of the following?
Evaluating performance in machine learning
Partially correct.
None of the options are correct.
Understanding inclusion and how to introduce inclusion across different subgroups within your data
Partially correct.
Both of the options are correct.

Datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. The key here is to utilize visualizations that help unlock nuances and insights in large datasets. Which tool would be most appropriate?
Firebase
Pandas
SQL
Facets

The impact of biases in collecting data and labeling data affects the entire machine learning pipeline. The biases in the original data are going to be reflected downstream in our models and consequently are going to result in potentially biased outcomes. You need to create a checklist for situations where you should watch out for bias-related issues. What questions should this checklist include?
Does your use case or product use data that is likely to be highly correlated with any personal characteristics (for example, zip code or other geospatial data is often correlated with socioeconomic status and/or income; image/video data can reveal information about race, gender, and age)?
Partially correct.
All of the options.
Does your use case or product specifically use any of the following data: biometrics, race, skin color, religion, sexual orientation, socioeconomic status, income, country, location, health, language, or dialect?
Partially correct.
Could your use case or product negatively affect individuals’ economic or other important life opportunities?
Partially correct.

What is it called when the label says something doesn't exist, but the model says it exists?
False positive
None of the options are correct.
False negative
True positive

Which of the following is an example of a “false negative”?
When the label says something exists and the model doesn’t predict it—that’s a false negative. So, in the face detection example in this lesson, the model says that there is no face in the image—when the image’s label says there *is* a face.
The label says there is no face, and the model finds no face.
The label says there is no face, but the model finds a face. Perhaps there is a statue in the image and the model falsely identifies it as a face.
The label says there is a face, and the model finds a face.

What are the features of low data quality?
Duplicated data
Partially correct.
Unreliable info
Partially correct.
Incomplete data
Partially correct.
All of the options.

Exploratory Data Analysis is primarily performed using the following methods:
Both Univariate and Bivariate
Univariate
Partially correct.
Bivariate
Partially correct.
None of the options

Which of the following is not a component of Exploratory Data Analysis?
Statistical Analysis and Clustering
Accounting and Summarizing
Anomaly Detection
Hyperparameter tuning

What are the objectives of exploratory data analysis?
Uncover a parsimonious model, one which explains the data with a minimum number of predictor variables.
Partially correct.
Check for missing data and other mistakes.
Partially correct.
Gain maximum insight into the data set and its underlying structure.
Partially correct.
All of the options.

Which of the following are categories of data quality tools?
Both ‘Cleaning tools’ and ‘Monitoring tools’
Cleaning tools
Partially correct.
Monitoring tools
Partially correct.
None of the options

Why is regularization important in logistic regression?
Finds errors in the algorithm
Avoids overfitting
Keeps training time down by regulating the time allowed
Encourages the use of large weights

Which model would you use if your problem required a discrete number of values or classes?
Supervised Model
Regression Model
Unsupervised Model
Classification Model

What is the most essential metric a regression model uses?
Both ‘Mean squared error as their loss function’ & ‘Cross entropy’
Mean squared error as their loss function
Cross entropy
None of the options

Which of the following machine learning models have labels, or in other words, the correct answers to whatever it is that we want to learn to predict?
Reinforcement Model
Unsupervised Model
Supervised Model
None of the options

To predict the continuous value of our label, which of the following algorithms is used?
Unsupervised
Classification
Regression
None of the options

Which of the following are stages of the Machine Learning workflow that can be managed with Vertex AI?
Train an ML model on your data.
Partially correct.
Create a dataset and upload data.
Partially correct.
All of the options.
Deploy your trained model to an endpoint for serving predictions.
Partially correct.

What is the main benefit of using an automated Machine Learning workflow?
It makes the model run faster.
It makes the model perform better.
It reduces the time it takes to develop trained models and assess their performance.
It deploys the model into production.

What does the Feature Importance attribution in Vertex AI display?
How much each feature impacts the model, expressed as a ratio
How much each feature impacts the model, expressed as a percentage
How much each feature impacts the model, expressed as a decimal
How much each feature impacts the model, expressed as a ranked list

MAE, MAPE, RMSE, RMSLE and R2 are all available as test examples in the Evaluate section of Vertex AI and are common examples of what type of metric?
Linear Regression Metrics
Forecasting Regression Metrics
Decision Trees Progression Metrics
Clustering Regression Metrics

For a user who can use SQL, has little Machine Learning experience and wants a ‘Low-Code’ solution, which Machine Learning framework should they use?
BigQuery ML
Scikit-Learn
Python
AutoML

If the business case is to predict fraud detection, which is the correct Objective to choose in Vertex AI?
Forecasting
Clustering
Segmentation
Regression/Classification

What is the default setting in AutoML Tables for the data split in model evaluation?
80% Training, 15% Validation, 5% Testing
70% Training, 20% Validation, 10% Testing
80% Training 10% Validation, 10% Testing
80% Training, 5% Validation, 15% Testing

If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?
Tabular
Image
Video
Text

Which of the following metrics can be used to find a suitable balance between precision and recall in a model?
ROC AUC
PR AUC
F1 Score
Log Loss

For Classification or Regression problems with decision trees, which of the following models is most relevant?
XGBoost
AutoML Tables
Wide and Deep NNs
Linear Regression

Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?
DNN Classifier (TensorFlow)
AutoML Tables
XGBoost
Logistic Regression
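To make the answer above concrete, here is a minimal sketch of training a binary logistic regression model with BigQuery ML from Python. The project, dataset, table, and column names are hypothetical, and it assumes the google-cloud-bigquery client library with application-default credentials.

from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials and a default project

# Hypothetical project/dataset/table/column names, for illustration only.
create_model_sql = """
CREATE OR REPLACE MODEL `my_project.my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT churned, tenure_months, monthly_charges
FROM `my_project.my_dataset.customers`
"""
client.query(create_model_sql).result()  # blocks until the training query finishes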

What are the 3 key steps for creating a Recommendation System with BigQuery ML?
Prepare training data in BigQuery, specify the model options in BigQuery ML, export the predictions to Google Analytics
Import training data to BigQuery, train a recommendation system with BigQuery ML, tune the hyperparameters
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
Prepare training data in BigQuery, select a recommendation system from BigQuery ML, deploy and test the model

Which of the following are advantages of BigQuery ML when compared to Python based ML frameworks?
All of the options.
BigQuery ML automates multiple steps in the ML workflow
Partially correct.
BigQuery ML custom models can be created without the use of multiple tools
Partially correct.
Moving and formatting large amounts of data takes longer with Python based models compared to model training in BigQuery
Partially correct.

Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?
Time Series Anomaly Detection
Recommendation - Matrix Factorization
Time Series Forecasting
K-Means Clustering

Which of the following loss functions is used for classification problems?
MSE
Both MSE & Cross entropy
Cross entropy
None of the options are correct.
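As a quick illustration of the answer above, a sketch of compiling a Keras binary classifier with a cross-entropy loss (a regression model would typically use mean squared error instead); the layer sizes here are arbitrary.

import tensorflow as tf

# Binary classifier: cross entropy is the classification loss; "mse" would be
# the usual choice for a regression model instead.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])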

Which of the following gradient descent methods computes the gradient using the entire dataset?
Batch gradient descent
Mini-batch gradient descent
Gradient descent
None of the options are correct.

Which of the following are benefits of Performance metrics over loss functions?
Performance metrics are easier to understand.
Partially correct.
Performance metrics are easier to understand and are directly connected to business goals.
Performance metrics are directly connected to business goals.
Partially correct.
None of the options are correct.

For the formula used to model the relationship i.e. y = mx + b, what does ‘m’ stand for?
It captures the amount of change we've observed in our label in response to a small change in our feature.
It refers to a bias term which can be used for regression and it captures the amount of change we've observed in our label in response to a small change in our feature.
It refers to a bias term which can be used for regression.
None of the options are correct.

What are the basic steps in an ML workflow (or process)?
Collect data
Partially correct.
Perform statistical analysis and initial visualization
Partially correct.
Check for anomalies, missing data and clean the data
Partially correct.
All of the options.

Which of the following allows you to split the dataset based upon a field in your data?
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
BUCKETIZE, an open-source hashing algorithm that is implemented in BigQuery SQL.
ML_FEATURE FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
None of the options are correct.

Which of the following actions can you perform on your model when it is trained and validated?
You can write it once, and only once, against the independent test dataset.
You can write it once, and only once against the dependent test dataset.
You can write it multiple times against the independent test dataset.
You can write it multiple times against the dependent test dataset.

Which of the following allows you to create repeatable samples of your data?
Use the last few digits of a hash function on the field that you're using to split or bucketize your data.
Use the first few digits of a hash function on the field that you're using to split or bucketize your data.
Use the first few digits or the last few digits of a hash function on the field that you're using to split or bucketize your data.
None of the options are correct.
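Tying the two questions above together, a rough sketch of a repeatable 80/20 split using FARM_FINGERPRINT in BigQuery, driven from Python; the table and column names are hypothetical, and it assumes the google-cloud-bigquery client with pandas installed.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and split column. Because FARM_FINGERPRINT is a deterministic
# hash, the same rows always land in the same split, making the sample repeatable.
train_sql = """
SELECT *
FROM `my_project.my_dataset.trips`
WHERE MOD(ABS(FARM_FINGERPRINT(CAST(pickup_date AS STRING))), 10) < 8   -- ~80% training
"""
eval_sql = """
SELECT *
FROM `my_project.my_dataset.trips`
WHERE MOD(ABS(FARM_FINGERPRINT(CAST(pickup_date AS STRING))), 10) >= 8  -- ~20% held out
"""
train_df = client.query(train_sql).to_dataframe()
eval_df = client.query(eval_sql).to_dataframe()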

How do you decide when to stop training a model?
When your loss metrics start to increase
When your loss metrics start to decrease
When your loss metrics start to both increase and decrease
None of the options are correct

Which is the best way to assess the quality of a model?
Observing how well a model performs against a new dataset that it hasn't seen before.
Observing how well a model performs against an existing known dataset.
Observing how well a model performs against a new dataset that it hasn't seen before and observing how well a model performs against an existing known dataset.
None of the options are correct.

How does TensorFlow represent numeric computations?
Using a Directed Acyclic Graph (or DAG)
None of the options are correct
Both Using a Directed Acyclic Graph (or DAG) and Flow chart
Flow chart

Which are useful components when building custom Neural Network models?
tf.losses
Partially correct.
All of the options.
tf.optimizers
Partially correct.
tf.metrics
Partially correct.

Which API is used to build performant, complex input pipelines from simple, re-usable pieces that will feed your model's training or evaluation loops?
tf.Tensor
All of the options.
tf.device
tf.data.Dataset
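To illustrate the answer above, a minimal tf.data.Dataset input pipeline built from simple, reusable pieces; the in-memory random data is just a stand-in for real features and labels.

import tensorflow as tf

features = tf.random.uniform((1000, 4))
labels = tf.random.uniform((1000,), maxval=2, dtype=tf.int32)

# Chain small, reusable transformations into one input pipeline.
dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)

for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)   # (32, 4) (32,)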

What operations can be performed on tensors?
They can be reshaped
Partially correct.
None of the options are correct.
They can be both reshaped and sliced
They can be sliced
Partially correct.

Which of the following is true when we compute a loss gradient?
TensorFlow records all operations executed inside the context of a tf.GradientTape onto a tape.
Partially correct.
All of the options.
The computed gradient of a recorded computation will be used in reverse mode differentiation.
Partially correct.
It uses tape and the gradients associated with each recorded operation to compute the gradients.
Partially correct.
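A short sketch of the points above: operations inside a tf.GradientTape context are recorded onto the tape, and the recorded computation is replayed in reverse mode to produce the gradients. The toy linear model is arbitrary.

import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(0.5)
x = tf.constant([1.0, 2.0, 3.0])
y_true = tf.constant([3.0, 5.0, 7.0])

with tf.GradientTape() as tape:                        # operations are recorded onto the tape
    y_pred = w * x + b
    loss = tf.reduce_mean(tf.square(y_true - y_pred))

# Reverse-mode differentiation uses the tape to compute gradients of the loss.
dw, db = tape.gradient(loss, [w, b])
print(dw.numpy(), db.numpy())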

Which of the following statements is true of TensorFlow?
TensorFlow is a scalable and single-platform programming interface for implementing and running machine learning algorithms, including convenience wrappers for deep learning.
TensorFlow is a scalable and multi-platform programming interface for implementing and running machine learning algorithms, including convenience wrappers for deep learning.
Although able to run on other processing platforms, TensorFlow 2.0 is not yet able to run on Graphical Processing Units (or GPUs).
Although able to run on other processing platforms, TensorFlow 2.0 is not yet able to run on Tensor Processing Units (or TPUs).

What are distinct ways to create a dataset?
A data transformation constructs a dataset from one or more tf.data.Dataset objects.
Partially correct.
A data source constructs a Dataset from data stored in memory or in one or more files and a data transformation constructs a dataset from one or more tf.data.Dataset objects.
A data source constructs a Dataset from data stored in memory or in one or more files.
Partially correct.
None of the options are correct.

What is the use of tf.keras.layers.TextVectorization?
It turns continuous numerical features into bucket data with discrete ranges.
It turns raw strings into an encoded representation that can be read by an Embedding layer or Dense layer.
It performs feature-wise normalization of input features.
It turns string categorical values into encoded representations that can be read by an Embedding layer or Dense layer.
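A minimal sketch of the answer above: adapt() builds the vocabulary from a data sample, and the layer then turns raw strings into integer encodings that an Embedding or Dense layer can consume. The example sentences are arbitrary.

import tensorflow as tf

texts = tf.constant(["the movie was great", "the movie was terrible"])

vectorizer = tf.keras.layers.TextVectorization(max_tokens=1000, output_sequence_length=6)
vectorizer.adapt(texts)            # learn the vocabulary from a data sample

print(vectorizer(texts))           # integer-encoded sequences, padded to length 6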

Which of the following is true about embedding?
Embedding is a handy adapter that allows a network to incorporate sparse or categorical data.
Partially correct.
The number of embeddings is the hyperparameter to your machine learning model.
Partially correct.
An embedding is a weighted sum of the feature crossed values.
Partially correct.
All of the options.

Which is true regarding feature columns?
Feature columns describe how the model should use raw output data from your TPUs.
Feature columns describe how the model should use raw input data from your features dictionary.
Feature columns describe how the model should use raw output data from your features dictionary.
Feature columns describe how the model should use a graph to plot a line.

When should you avoid using the Keras function adapt()?
When using TextVectorization while training on a TPU pod
When using StringLookup while training on multiple machines via ParameterServerStrategy
When working with lookup layers with very large vocabularies
When working with lookup layers with very small vocabularies

Which of the following is a part of Keras preprocessing layers?
Image preprocessing
Partially correct.
Numerical features preprocessing
Partially correct.
Image data augmentation
Partially correct.
All of the options.

Which of the following layers is non-trainable?
Hashing
Partially correct.
Normalization
Partially correct.
Discretization
Partially correct.
StringLookup
Partially correct.
All of the options.

Which of the following is not a part of Categorical features preprocessing?
tf.keras.layers.Hashing
tf.keras.layers.IntegerLookup
tf.keras.layers.CategoryEncoding
tf.keras.layers.Discretization

Select the correct statement regarding the Keras Functional API.
The Keras Functional API does not provide a more flexible way for defining models.
Unlike the Keras Sequential API, we do not have to provide the shape of the input to the model.
Unlike the Keras Sequential API, we have to provide the shape of the input to the model.
None of the options are correct.

The Keras Functional API can be characterized by having:
Multiple inputs and outputs and models with non-shared layers.
Multiple inputs and outputs and models with shared layers.
Single inputs and outputs and models with shared layers.
None of the options are correct.
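To make the two Functional API questions above concrete, a small sketch with two declared inputs, one shared Dense layer, and two outputs; the shapes and layer sizes are arbitrary.

import tensorflow as tf

# Input shapes must be declared; the Dense layer below is shared by both branches.
input_a = tf.keras.Input(shape=(8,), name="features_a")
input_b = tf.keras.Input(shape=(8,), name="features_b")

shared = tf.keras.layers.Dense(16, activation="relu")
hidden_a = shared(input_a)
hidden_b = shared(input_b)

merged = tf.keras.layers.concatenate([hidden_a, hidden_b])
out_main = tf.keras.layers.Dense(1, name="main")(merged)
out_aux = tf.keras.layers.Dense(1, name="aux")(hidden_a)

model = tf.keras.Model(inputs=[input_a, input_b], outputs=[out_main, out_aux])
model.summary()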

What is the significance of the Fit method while training a Keras model?
Defines the validation steps
Defines the number of steps per epochs
Defines the number of epochs
Defines the batch size

The predict function in the tf.keras API returns what?
Both numpy array(s) of predictions & input_samples of predictions
Numpy array(s) of predictions
Input_samples of predictions
None of the options are correct.

During the training process, each additional layer in your network can successively reduce signal vs. noise. How can we fix this?
Use sigmoid or tanh activation functions.
Use non-saturating, linear activation functions.
Use non-saturating, nonlinear activation functions such as ReLUs.
None of the options are correct.

True or False: Non-linearity helps in training your model at a much faster rate and with more accuracy, without the loss of important information.
True
False

How does Adam (optimization algorithm) help in compiling the Keras model?
Both by updating network weights iteratively based on training data and by diagonal rescaling of the gradients
By updating network weights iteratively based on training data
Partially correct.
By diagonal rescaling of the gradients
Partially correct.
None of the options are correct.

How does regularization help build generalizable models?
By adding dropout layers to our neural networks and by using image processing APIs to find out accuracy
By adding dropout layers to our neural networks
By using image processing APIs to find out accuracy
None of the options are correct.

The L2 regularization provides which of the following?
It adds a sum of the squared parameter weights term to the loss function.
It subtracts a sum of the squared parameter weights term from the loss function.
It multiplies the loss function by a sum of the squared parameter weights term.
None of the options are correct.
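A brief sketch of the two regularization ideas above: an L2 penalty adds the sum of squared weights (scaled by a factor, 0.01 here) to the loss, and a Dropout layer randomly zeroes activations during training. Layer sizes are arbitrary.

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,),
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),  # L2 penalty on weights
    tf.keras.layers.Dropout(0.3),                                              # dropout for generalization
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")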

When sending training jobs to Vertex AI, it is common to split most of the logic into a _________ and a ___________ file.
task.py, model.py
task.xml, model.xml
task.json, model.json
task.avro, model.avro

When you package up a TensorFlow model as a Python package, which file must every folder contain so that it is treated as a Python module?
model.py
an __init__.py
tmodel.json
tmodel.avro

To make your code compatible with Vertex AI, there are three basic steps that must be completed in a specific order. Choose the answer that best describes those steps.
First, upload data to Google Cloud Storage. Then submit your training job with gcloud to train on Vertex AI. Next, move code into a trainer Python package.
First, move code into a trainer Python package. Next, upload data to Google Cloud Storage. Then submit your training job with gcloud to train on Vertex AI.
First, download data from Google Cloud Storage. Then submit your training job with gcloud to train on Vertex AI. Next, move code into a trainer Python package.
First, upload data to Google Cloud Storage. Next, move code into a trainer Python package. Then submit your training job with gcloud to train on Vertex AI.

You can use either pre-built containers or custom containers to run training jobs. Both containers require you specify settings that Vertex AI needs to run your training code, including __________, ____________, and ________.
Source distribution name, job name, worker pool
Cloud storage bucket name, display-name, worker-pool-spec
Region, source distribution, custom URI
Region, display-name, worker-pool-spec
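As a rough equivalent of those gcloud settings in the Vertex AI Python SDK (google-cloud-aiplatform), a hedged sketch of submitting a custom training job on a pre-built container; the project, bucket, display name, and container image URI are hypothetical examples.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",        # region
                staging_bucket="gs://my-bucket")

job = aiplatform.CustomTrainingJob(
    display_name="taxifare-training",                                 # display name
    script_path="trainer/task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",  # example pre-built container
)
job.run(machine_type="n1-standard-4", replica_count=1)                # worker pool settings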

Which file is the entry point to your code that Vertex AI will start, and contains details such as how to parse command-line arguments and where to write model outputs?
model.py
task.py
tmodel.json
tmodel.avro
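A minimal, hypothetical task.py to illustrate the answer above: it parses command-line arguments and hands off to the training logic in model.py (the train_and_evaluate helper shown here is an assumed name, not a required API).

# trainer/task.py -- hypothetical minimal entry point
import argparse

from trainer import model   # assumes the core training logic lives in trainer/model.py


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--num_epochs", type=int, default=5)
    parser.add_argument("--job-dir", dest="job_dir", default="gs://my-bucket/output",
                        help="where to write model outputs")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    model.train_and_evaluate(args)   # model.py owns the actual training loop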

Where are the features registered?
Feature registry
Online Store
Feature Monitoring
Offline Store

Which of the following is an instance of an entity type?
Feature
Online Store
Featurestore
Entity

What is one definition of a feature in machine learning?
A value that you receive from a model as an output
A method of feature store
A place to store any data
A value that is passed as input to a model

Vertex AI Feature Store provides a centralized repository for organizing, storing, and serving ML features. Using a central featurestore enables an organization to efficiently share, discover, and re-use ML features at scale, which can increase the velocity of developing and deploying new ML applications. What are the key challenges that Vertex AI Feature Store solves?
Mitigate data storage silos, which occurs when you might have built and managed separate solutions for storage and the consumption of feature values.
Detect drift, as a result of significant changes to your feature data distribution over time.
Partially correct.
All of the options.
Mitigate training-serving skew, which occurs when the feature data distribution that you use in production differs from the feature data distribution that was used to train your model.
Partially correct.

Which of the following is the process of importing feature values computed by your feature engineering jobs into a featurestore?
Feature store
Feature Monitoring
Feature ingestion
Feature serving

What are the two methods feature store offers for serving features?
Online serving and Offline serving
Batch serving and Online serving
Batch serving and Stream serving
Offline serving and Stream serving

In what form can raw data be used inside ML models?
None of the options are correct.
After turning your raw data into a useful feature matrix
After turning your raw data into useful feature vectors
After turning your raw data into multidimensional vectors

Which of the following statements is true about preprocessing?
None of the options are correct.
Preprocessing without the context of Cloud ML allows you to do it at scale.
Preprocessing within the context of Cloud ML allows you to do it at scale.
Both options are correct.

A good feature has which of the following characteristics?
All of the options.
It should be known at prediction time.
Partially correct.
It should be related to the objective.
Partially correct.
It should be numeric with meaningful magnitude.
Partially correct.

Which of the following are the requirements to build an effective machine learning model?
All of the options.
It should find good features.
Partially correct.
It should scale to a large dataset.
Partially correct.
It should be able to preprocess with Vertex AI Platform.
Partially correct.

Which of the following statements is true?
None of the options are correct.
Different problems in the same domain may need different features.
Same problems in the same domain may need different features.
Different problems in different domains may need the same features.

Which of the following statements are true regarding the ML.BUCKETIZE function?
ML.BUCKETIZE is a pre-processing function that creates buckets by returning a STRING as the bucket name after numerical_expression is split into buckets by array_split_points.
Partially correct.
None of the options are correct.
Both options are correct.
It bucketizes a continuous numerical feature into a string feature with bucket names as the value.
Partially correct.

True or False:
Feature Engineering is often one of the most valuable tasks a data scientist can do to improve model performance, for three main reasons:
1. You can isolate and highlight key information, which helps your algorithms "focus" on what’s important.
2. You can bring in your own domain expertise.
3. Once you understand the "vocabulary" of feature engineering, you can bring in other people’s domain expertise.
True
False

What is one-hot encoding?
One-hot encoding is a process by which categorical variables are converted into a form that could be provided to neural networks to do a better job in prediction.
One-hot encoding is a process by which only the hottest numeric variable is retained for use by the neural network.
One-hot encoding is a process by which numeric variables are converted into a categorical form that could be provided to neural networks to do a better job in prediction.
One-hot encoding is a process by which numeric variables are converted into a form that could be provided to neural networks to do a better job in prediction.
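A small sketch of one-hot encoding a categorical feature with a Keras preprocessing layer; the vocabulary and values are arbitrary (index 0 is reserved for out-of-vocabulary strings).

import tensorflow as tf

colors = tf.constant(["red", "green", "blue", "green"])

# Map each category to an index and emit the one-hot vector directly.
lookup = tf.keras.layers.StringLookup(vocabulary=["red", "green", "blue"],
                                      output_mode="one_hot")
print(lookup(colors))   # shape (4, 4): 3 vocabulary entries + 1 OOV slot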

What is a feature cross?
A feature cross is a synthetic feature formed by adding (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually.
None of the options are correct.
A feature cross is a synthetic feature formed by dividing (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually.
A feature cross is a synthetic feature formed by multiplying (crossing) two or more features. Crossing combinations of features can provide predictive abilities beyond what those features can provide individually.
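To illustrate the answer above, one simple way to build a feature cross by hand: join two categorical values into a single synthetic feature, then hash it into a fixed number of buckets so the model can learn a weight per (day, hour) combination. The feature names and bucket count are arbitrary.

import tensorflow as tf

day = tf.constant(["Mon", "Tue"])
hour = tf.constant(["08", "17"])

crossed_strings = tf.strings.join([day, hour], separator="_")   # e.g. "Mon_08"
hashing = tf.keras.layers.Hashing(num_bins=24 * 7)              # one bucket per crossed category
bucket_ids = hashing(crossed_strings)
one_hot = tf.one_hot(bucket_ids, depth=24 * 7)                  # optional one-hot of the cross
print(bucket_ids.numpy(), one_hot.shape)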

Which of the following is true about Feature Cross?
It is a process of combining features into a single feature.
Partially correct.
None of the options are correct.
Both options are correct.
Feature Cross enables a model to learn separate weights for each combination of features.
Partially correct.

What do you use the tf.keras.layers.Discretization function for?
To compute the hash buckets needed to one-hot encode categorical values
None of the options are correct.
To discretize floating point values into a smaller number of categorical bins
To count the number of unique buckets the input values fall into

What is the significance of ML.FEATURE_CROSS?
ML.FEATURE_CROSS generates a STRUCT feature with all combinations of crossed categorical features except for 1-degree items.
None of the options are correct.
ML.FEATURE_CROSS generates a STRUCT feature with all combinations of crossed categorical features including 1-degree items.
ML.FEATURE_CROSS generates a STRUCT feature with few combinations of crossed categorical features except for 1-degree items.

Which of the following statements are true regarding the ML.EVALUATE function?
The ML.EVALUATE function can be used with linear regression, logistic regression, k-means, matrix factorization, and ARIMA-based time series models.
Partially correct.
All of the options.
You can use the ML.EVALUATE function to evaluate model metrics.
Partially correct.
The ML.EVALUATE function evaluates the predicted values against the actual data.
Partially correct.

True or False:
A ParDo acts on all items at once (like a Map in MapReduce).
True
False. A ParDo acts on one item at a time (like a Map in MapReduce)

To run a pipeline you need something called a ______________.
pipeline
runner
Apache Beam
executor
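A minimal Apache Beam pipeline in Python tying the ParDo and runner questions together: the DirectRunner executes locally, while DataflowRunner would run the same code on Cloud Dataflow. The sample data is arbitrary.

import apache_beam as beam

# DirectRunner executes locally; swap in "DataflowRunner" (plus Google Cloud options)
# to execute the same pipeline on Cloud Dataflow.
with beam.Pipeline(runner="DirectRunner") as pipeline:
    (
        pipeline
        | "Create" >> beam.Create([1, 2, 3, 4, 5])
        | "Square" >> beam.Map(lambda x: x * x)          # Map/ParDo acts on one element at a time
        | "KeepEven" >> beam.Filter(lambda x: x % 2 == 0)
        | "Print" >> beam.Map(print)
    )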

What is the purpose of a Cloud Dataflow connector, such as .apply(TextIO.write().to(“gs://…”));?
Connectors allow you to authenticate your pipeline as specific users who may have greater access to datasets.
Connectors allow you to output the results of a pipeline to a specific data sink like Bigtable, Google Cloud Storage, flat file, BigQuery, and more.
Connectors allow you to chain multiple data-processing steps together automatically so they process in parallel.

Which of these accurately describes the relationship between Apache Beam and Cloud Dataflow?
Cloud Dataflow is the proprietary version of the Apache Beam API and the two are not compatible.
Cloud Dataflow is the API for data pipeline building in Java or Python and Apache Beam is the implementation and execution framework.
They are the same.

True or False:
The Filter method can be carried out in parallel and autoscaled by the execution framework:
True: Anything in Map or FlatMap can be parallelized by the Beam execution framework.
False: Anything in Map or FlatMap can be parallelized by the Beam execution framework.

Your development team is about to execute this code block. What is your team about to do?
We are preparing a staging area in Google Cloud Storage for the output of our Cloud Dataflow pipeline and will be submitting our BigQuery job with a later command.
We are compiling our Cloud Dataflow pipeline written in Java and are submitting it to the cloud for execution. Notice that we are calling mvn compile and passing in --runner=DataflowRunner.
We are compiling our Cloud Dataflow pipeline written in Python and are loading the outputs of the executed pipeline inside of Google Cloud Storage (gs://)

What is one key advantage of preprocessing your features using Apache Beam?
Apache Beam code is often harder to maintain and run at scale than BigQuery preprocessing pipelines.
The same code you use to preprocess features in training and evaluation can also be used in serving.
Apache Beam transformations are written in Standard SQL which is scalable and easy to author.

In the __________ layers, the lines are colored by the __________ of the connections between neurons. Blue shows a _________ weight, which means the network is using that _________ of the neuron as given. An orange line shows that the network is assigning a __________ weight.
Hidden, weights, positive, output, negative
Output, weights, negative, hidden, positive
Weights, hidden, negative, output, positive
Hidden, weights, negative, output, positive

True or False:
We can create many different kinds of feature crosses.
For example:
• [A X B]: a feature cross formed by multiplying the values of two features.
• [A x B x C x D x E]: a feature cross formed by multiplying the values of five features.
• [A x A]: a feature cross formed by squaring a single feature.
False
True

True or False:
In TensorFlow Playground, the data points (represented by small circles) are initially colored orange or blue, which correspond to zero and negative one.
False
The answer is positive one to negative one.
True

True or False:
In TensorFlow Playground, orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.
False
True

Why might you create an embedding of a feature cross?
To identify similar sets of inputs for clustering
Partially correct.
All of the options.
To reuse weights learned in one problem in another problem
Partially correct.
To create a lower-dimensional representation of the input space
Partially correct.

True or False:
In TensorFlow Playground, in the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.
False
True

What is TensorFlow Transform a hybrid of?
Apache Beam and TensorFlow
Both options are correct.
Dataflow and TensorFlow
None of the options are correct.

What does tf.Transform do during the training and serving phase?
Provides a TensorFlow graph for preprocessing
Provides a transformation polynomial to train the data
Provides computation over the entire dataset, including on both internal and external data sources
None of the options are correct.

True or False:
One of the goals of tf.Transform is to provide a TensorFlow graph for preprocessing that can be incorporated into the serving graph (and, optionally, the training graph).
True
False

The ______________ _______________ is the most important concept of tf.Transform. The ______________ _______________ is a logical description of a transformation of the dataset. The ______________ _______________ accepts and returns a dictionary of tensors, where a tensor means Tensor or 2D SparseTensor.
Preprocessing function
Preprocessing method
Preprocessing variable
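A minimal sketch of a tf.Transform preprocessing function, per the description above: it accepts and returns a dictionary of tensors, and analyzers such as scale_to_0_1 are computed over the entire dataset. The feature names are hypothetical.

import tensorflow_transform as tft


def preprocessing_fn(inputs):
    # inputs and the return value are both dictionaries of tensors.
    return {
        "fare_scaled": tft.scale_to_0_1(inputs["fare"]),
        "payment_type_id": tft.compute_and_apply_vocabulary(inputs["payment_type"]),
    }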

If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?
Training operationalization
Training formalization
Experimentation & prototyping
Training implementation

What is the correct process that data scientists use to develop the models on an experimentation platform?
Problem definition > Data selection > Data exploration > Model prototyping > Feature engineering > Model validation
Problem definition > Data exploration > Data selection > Feature engineering > Model prototyping > Model validation
Problem definition > Data selection > Data exploration > Model prototyping > Model validation > Feature engineering
Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation

Which two activities are involved in ML development?
Experimentation and version control
Partially correct.
Training formalization and training operationalization
Partially correct.
Experimentation and training operationalization
Version control and training operationalization
Partially correct.

Which process covers algorithm selection, model training, hyperparameter tuning, and model evaluation in the Experimentation and Prototyping activity?
Model validation
Data exploration
Feature engineering
Model prototyping

Which of the following is correct for Online serving?
Online serving is for high-latency data retrieval of small batches of data for real-time processing.
Online serving is for high throughput and serving large volumes of data for offline processing.
Online serving is for low-latency data retrieval of small batches of data for real-time processing.
Online serving is for low throughput and serving large volumes of data for offline processing.

Which of the following is not a part of Google’s enterprise data management and governance tool?
Data Catalog
Dataplex
Feature Store
Analytics Catalog

Which Data processing option can be used for transforming large unstructured data in Google Cloud?
Dataflow
Beam proc
Hadoop proc
Apache prep

Which of the following statements is not a feature of Analytics Hub?
You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery.
Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
Analytics Hub efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost.
There are three roles in Analytics Hub - A Data Publisher, Exchange Administrator, and a Data Subscriber.

What does the Aggregation Values contain in any feature?
The min, zeros, and Std.dev values for each feature
The min, median, and max values for each feature
The min, median, and Std.dev values for each feature
The count, median, and max values for each feature

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging. What can happen if the value is too large?
Training may take a long time.
A large learning rate value may result in the model learning a sub-optimal set of weights too fast or an unstable training process.
If the learning rate value is too large, then the model will converge.
The model will not train.

The learning rate is a hyperparameter that controls how much to change the model in response to the estimated error each time the model weights are updated. Choosing the learning rate is challenging. What can happen if the value is too small?
Training may take a long time.
Smaller learning rates require fewer training epochs given the smaller changes made to the weights each update.
If the learning rate value is too small, then the model will diverge.
The model will train more quickly.

Which of the following is true?
Larger batch sizes require smaller learning rates.
Smaller batch sizes require larger learning rates.
Smaller batch sizes require smaller learning rates.
Larger batch sizes require larger learning rates.

What is "data parallelism” in distributed training?
Run the same model & computation on every device, but train each of them using the same training samples.
Run different models & computation on a single device, but train each of them using different training samples.
Run different models & computation on every device, but train each of them using only one training sample.
Run the same model & computation on every device, but train each of them using different training samples.
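A short sketch of data parallelism with synchronous training in TensorFlow: MirroredStrategy replicates the same model on every visible device, and each replica trains on a different slice of each batch. The model itself is arbitrary.

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU (falls back to CPU)

with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
# model.fit(...) would then split each global batch across the replicas.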

Model complexity often refers to the number of features or terms included in a given predictive model. What happens when the complexity of the model increases?
Model is more likely to overfit.
Partially correct.
All of the options.
Model will not figure out general relationships in the data.
Partially correct.
Model performance on a test set is going to be poor.
Partially correct.

The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______
1.0 and 3.0.
< 0.0 and > 1.00.
0.0 and 1.0.
> 0.0 and < 1.00.

Which of the following can make a huge difference in model quality?
Increasing the training time.
Setting hyperparameters to their optimal values for a given dataset.
Decreasing the number of epochs.
Increasing the learning rate.

Which of the following algorithms is useful, if you want to specify a quantity of trials that is greater than the number of points in the feasible space?
Manual Search
Bayesian Optimization
Random Search
Grid Search

Which of the following is a black-box optimization service?
Early stopping
Vertex Vizier
AutoML
Manual Search

Black box optimization algorithms find the best operating parameters for any system whose ______________?
number of iterations is limited to train a model for validation.
execution time is less.
performance can be measured as a function of adjustable parameters.
iterations to get to the optimal set of hyperparameter values are less.

Bayesian optimization takes into account past evaluations when choosing the hyperparameter set to evaluate next. By choosing its parameter combinations in an informed way, it enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores. Therefore it _____________________.
All of the options.
requires less iterations to get to the optimal set of hyperparameter values.
Partially correct.
limits the number of times a model needs to be trained for validation.
Partially correct.
enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores.
Partially correct.

Which statements are correct for serving predictions using Pre-built containers?
All of the options.
Vertex AI provides Docker container images that you run as pre-built containers for serving predictions.
Pre-built containers provide HTTP prediction servers that you can use to serve prediction using minimal configurations.
Pre-built containers are organized by Machine learning framework and framework version.

Which of the following statements is invalid for a data source file in batch prediction?
You must use a regional BigQuery dataset.
The first line of the data source CSV file must contain the name of the columns.
If the Cloud Storage bucket is in a different project than where you use Vertex AI, you must provide the Storage Object Creator role to the Vertex AI service account in that project.
BigQuery data source tables must be no larger than 100 GB.

What are the features of Vertex AI model monitoring?
All of the options.
Drift in data quality
Partially correct.
Skew in training vs. serving data
Partially correct.
Feature Attribution and UI visualizations
Partially correct.

For which of the following is the baseline the statistical distribution of the feature's values seen in production in the recent past?
Skew detection
Categorical features
Numerical features
Drift detection

Which statement is correct regarding the maximum size for a CSV file during batch prediction?
Each data source file must include multiple files, up to a maximum amount of 50 GB.
The data source file must be no larger than 100 GB.
Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
The data source file must be no larger than 50 GB. You cannot include multiple files.

What should be done if the source table is in a different project?
You should provide the BigQuery Data Viewer role to the Vertex AI service account in your project.
You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.
You should provide the BigQuery Data Viewer role to the Vertex AI service account in that project.
You should provide the BigQuery Data Editor role to the Vertex AI service account in your project.

How can you define the pipeline's workflow as a graph?
By using the outputs of a component as an input of another component
Use the previous pipeline's output as an input for the current pipeline.
By using predictive input for each component.
By using different inputs for each component.

What can you use to compile the pipeline?
kfp.Compiler
kfp.v2.compiler
kfp.v2.compiler.Compiler
compiler.Compiler

Which package is used to define and interact with pipelines and components?
kfp.components
kfp.compiler
kfp.containers
kfp.dsl package

What can you use to create a pipeline run on Vertex AI Pipelines?
kfp.v2.compiler.Compiler
Pipeline root path
Service account
Vertex AI Python client
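Pulling the last few pipeline questions together, a hedged sketch using the KFP v2 SDK and the Vertex AI Python client: components are wired into a graph by passing outputs as inputs, kfp.v2.compiler.Compiler compiles the pipeline, and aiplatform.PipelineJob creates the run on Vertex AI Pipelines. The project, bucket, and service account names are hypothetical.

from google.cloud import aiplatform
from kfp.v2 import compiler, dsl


@dsl.component
def say_hello(text: str) -> str:
    print(text)
    return text


@dsl.pipeline(name="hello-pipeline", pipeline_root="gs://my-bucket/pipeline-root")
def my_pipeline(text: str = "hello"):
    say_hello(text=text)   # a component's outputs could feed the next component's inputs


# Compile the pipeline definition to a JSON spec...
compiler.Compiler().compile(pipeline_func=my_pipeline, package_path="my_pipeline.json")

# ...then create a pipeline run on Vertex AI Pipelines.
aiplatform.init(project="my-project", location="us-central1")
job = aiplatform.PipelineJob(display_name="hello-pipeline", template_path="my_pipeline.json")
job.run(service_account="pipelines-sa@my-project.iam.gserviceaccount.com")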

Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?
A Google Cloud Storage bucket that acts as an output for AutoML, custom training jobs, and serialized training jobs.
A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.
A Google Cloud database that acts as an output for both AutoML and custom training jobs.
A Google Cloud database that acts as an input for both AutoML and custom training jobs.

When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________for that feature.
Transmutation
Translation
Duplication
Transformation

Match the three types of data ingest with an appropriate source of training data.
Streaming (BigQuery), structured batch (Pub/Sub), unstructured batch (Cloud Storage)
You wouldn't ingest streaming data from BigQuery, although you could stream to it. Pub/Sub is a poor place to store your batch data, although you might use it to replay events.
Streaming batch (Dataflow), structured batch (BigQuery), stochastic (App Engine)
These are just made up terms.
Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
On Google Cloud, the three types of data ingestion map to three different products. If you are ingesting streaming data, use Pub/Sub. If you are ingesting structured data directly into your ML model, use BigQuery, and if you are transforming data from training so that you can train on it later, read from Cloud Storage.

Which type of training do you use if your data set doesn’t change over time?
Dynamic training
Real-time training
Static training
Online training

Which type of logging should be enabled in the online prediction that logs the stderr and stdout streams from your prediction nodes to Cloud Logging and can be useful for debugging?
Request-response logging
Container logging
Access logging
Cloud logging

What is the responsibility of model evaluation and validation components?
To ensure that the models are not good before moving them into a staging environment.
To ensure that the models are good after moving them into a production/staging environment.
To ensure that the models are good before moving them into a production/staging environment.
To ensure that the models are not good after moving them into a staging environment.

In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type.
False
True

What percent of system code does the ML model account for?
25%
50%
5%
90%

Which of the following tools help software users manage dependency issues?
Monolithic programs
Polylithic programs
Modular programs
Maven, Gradle, and Pip

Which component identifies anomalies in training and serving data and can automatically create a schema by examining the data?
Data validation
Data identifier
Data ingestion
Data transform

Which of the following models are susceptible to a feedback loop? Check all that apply.
A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).
Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend these same books in the future.
A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
A house's location, size, or number of bedrooms cannot be quickly changed in response to price forecasts, which makes a feedback loop unlikely. However, there is potentially a correlation between size and number of bedrooms (larger homes are likely to have more rooms) that may need to be analyzed.
A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.
There is no feedback loop here, because model predictions don't have any impact on the photo database. However, versioning of the input data is a concern here, because these monthly updates could potentially have unforeseen effects on the model.
An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
If the model does not publish its forecast until after the polls have closed, its predictions cannot affect voter behavior.
A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.
A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
The model's rankings may drive additional interest to top-rated schools, increasing the number of applications they receive. If these schools continue to admit the same number of students, selectivity will increase (the percentage of students admitted will go down). This will boost these schools' rankings, which will further increase prospective student interest, and so on…

What is training skew caused by?
The Cloud Storage you load your data from in the training environment is physically closer than the Cloud Storage you load your data from in the production environment.
The distance of where the data is stored to the processing device does not impact prediction performance.
Starting and stopping of the processing when training the model.
Starting and stopping the processing makes no difference to the training.
Your development and production environments are different, or different code is used in the training environment than in the development environment.
Different versions may cause predictions to be significantly slower or consume more memory in the training environment than in the development environment. Different code may result in different performance.
The prediction environment is slower than the training environment.
Training may take longer in development than in production, but the training is the same.

Suppose you are building an ML-based system to predict the likelihood that a customer will leave a positive review. The user interface that customers leave reviews on changed a few months ago, but you don't know about this. Which of these is a potential consequence of mismanaging this data dependency?
Change in ability of model to be part of a streaming ingest
Your model structure doesn't change just because it's easier or harder to leave reviews.
Losses in prediction quality
For example, a review might be easier to write, so your prediction of whether someone will leave a review (whether good or bad) is too low, because the model was trained on reviews that resulted from the older, harder-to-use user interface.
Change in model serving signature
Your model structure doesn't change just because it's easier or harder to leave reviews.

Gradual drift is used for which of the following?
An old concept that incrementally changes to a new concept over a period of time
An old concept that may reoccur after some time
A new concept that occurs within a short time
A new concept that rapidly replaces an old one over a short period of time

What is the shift in the actual relationship between the model inputs and the output called?
Prediction drift
Label drift
Data drift
Concept drift

If each of your examples is large in terms of size and requires parsing, and your model is relatively simple and shallow, your model is likely to be:
I/O bound, so you should look for ways to store data more efficiently and ways to parallelize the reads.
Your ML training will be I/O bound if the number of inputs is large or heterogeneous (requires parsing) or if the model is so small that the compute requirements are trivial. This also tends to be the case if the input data is on a storage system with low throughput. If you are I/O bound, look at storing the data more efficiently, storing the data on a storage system with higher throughput, or parallelizing the reads. Although it is not ideal, you might consider reducing the batch size so that you are reading less data in each step.
CPU-bound, so you should use GPUs or TPUs.
This doesn't sound like computational power is your limiting factor.
Latency-bound, so you should use faster hardware
Review I/O-bound, CPU-bound and memory-bound models.

Which of the following indicates that ML training is CPU bound?
If you are running a model on accelerated hardware.
If I/O is complex, but the model involves lots of complex/expensive computations.
If you are running a model on powered hardware.
If I/O is simple, but the model involves lots of complex/expensive computations.

What does high-performance machine learning determine?
Training a model
Time taken to train a model
Reliability of a model
Deploying a model

For the fastest I/O performance in TensorFlow…
Prefetch the data
dataset.prefetch decouples the time data is produced from the time it is consumed. It prefetches the data into a buffer in parallel with the training step. This means that we have input data for the next training step before the current one is completed.
Read TF records into your model.
dataset = tf.data.TFRecordDataset(...). TFRecords are set up for fast, efficient batch reads, without the overhead of having to parse the data in Python.
Read in parallel threads.
dataset = tf.data.TFRecordDataset(files, num_parallel_reads=40). When you're dealing with a large dataset sharded across Cloud Storage, you can speed things up by reading multiple files in parallel to increase the effective throughput. You can use this feature with a single option to the TFRecordDataset constructor called num_parallel_reads.
Optimize TensorFlow performance using the Profiler.
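
Putting those options together, here is a minimal tf.data sketch that reads sharded TFRecord files in parallel and prefetches batches; the bucket path and batch size are hypothetical.

    import tensorflow as tf

    files = tf.io.gfile.glob("gs://my-bucket/train-*.tfrecord")  # hypothetical shards

    dataset = (
        tf.data.TFRecordDataset(files, num_parallel_reads=40)  # read shards in parallel
        .batch(128)
        .prefetch(tf.data.AUTOTUNE)  # overlap input production with the training step
    )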

Which of the following is a correct property of TensorFlow Lite?
Increased code footprint
Lower precision arithmetic
Higher precision arithmetic
Quantization
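
As a hedged sketch of how quantization (and hence lower-precision arithmetic and a smaller footprint) is typically enabled, the snippet below runs post-training quantization with the TensorFlow Lite converter; the tiny Keras model is only a placeholder.

    import tensorflow as tf

    model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(4,))])  # placeholder model

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables post-training quantization
    tflite_model = converter.convert()

    with open("model.tflite", "wb") as f:
        f.write(tflite_model)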

To copy the input data into TensorFlow, which of the following syntaxes is correct?
inferenceInterface.feed(floatValues, 1, inputSize, inputSize, 3);
inferenceInterface.feed(inputName, floatValues, 1, inputSize, 3);
inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);
inferenceInterface.feed(inputName, floatValues, 1, inputSize; inputSize, 3);

A key principle behind Kubeflow is portability so that you can:
Move your model from on-premises to Google Cloud.
Portability is at the container level, and you can move to any environment that offers Kubernetes.
Convert your model from CUDA to XLA.
Migrate your model from TensorFlow to PyTorch.

Which of these are reasons that you may not be able to perform machine learning solely on Google Cloud? Check all that apply.
You are tied to on-premises or multi-cloud infrastructure due to business reasons.
TensorFlow is not supported on Google Cloud.
Of course Google Cloud supports TensorFlow.
You need to run inference on the edge.

How does OCR (optical character recognition) transform images into an electronic form?
OCR analyzes the color of the letters and numbers to turn the scanned image into text.
OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
OCR uses a magnetic ink for the letters and numbers to turn the scanned image into text.
OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text.
OCR examines the text of a document and translates the characters into code that can be used for data processing.
OCR analyzes the size of the letters and numbers to turn the scanned image into text.
OCR analyzes the patterns of light and dark that make up the letters and numbers to turn the scanned image into text. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.

Which pre-built ML API is used for language translations?
Natural Language Processing API
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
Speech API
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
Vision API
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
Translation API
Translation API is built on parallel texts from language translations. It translates texts into more than one hundred languages.

What are the possible consequences for an ML model being trained with high resolution photos with high color depth?
It will increase the input size for an ML model but will reduce the training time.
If the ML model is trained with high resolution photos with high color depth, the input size will increase with longer training time for an ML model. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
Performance issues such as insufficient computing power will not occur.
If the ML model is trained with high resolution photos with high color depth, performance issues such as insufficient computing power will occur. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
It will increase the input size with longer training time for an ML model.
If the ML model is trained with high resolution photos with high color depth, there will be performance issues and an increase in input size with longer training time for the ML model.
It may lead to performance issues like insufficient computing power.
If the ML model is trained with high resolution photos with high color depth, there will be performance issues and an increase in input size with longer training time for the ML model.

How does instance segmentation help in classifying the images?
It identifies which objects are present in an image by outputting the class labels and class probabilities of objects present in that image.
Identifying which objects are present in an image by outputting the class labels and class probabilities of the objects present in that image is performed by object recognition. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
It partitions an image into multiple regions and segments all pixels in the image into different categories. Then it labels each pixel in the image, including the background and different colors.
Partitioning an image into multiple regions and segmenting all pixels in the image into different categories is done by semantic segmentation. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
It identifies the boundaries of an object and labels pixels with different colors.
Instance segmentation identifies the boundaries of an object and labels pixels with different colors. The exact outline of the object within an image is provided by image segmentation.
It assigns a class label to an image and creates a bounding box around a single object in an image.
Creation of a bounding box around a single object in an image is used in image classification with localization. Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.

What does the Vision API do?
It only extracts the edges from an image by identifying the boundaries of objects within an image.
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
It compares the features of images, which may be different in orientation, perspective, lighting, size, and color.
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.
It assigns labels to images and quickly classifies them into millions of predefined categories. It detects objects and faces, reads printed and handwritten text, and builds valuable metadata into the image catalog.
The API identifies labels within a video instead of images.
Review the module “Introduction to Computer Vision and Pre-built ML Models with Vision API”.

What method do you use to create and train a model with minimal technical effort to quickly prototype models and explore new datasets before investing in development?
Managed dataset
Managed dataset manages your datasets with training applications and models. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Endpoints
Endpoints promises to improve privacy and reduce latency for online prediction tasks by eliminating the need for data to go through any public networks before making it back into VPCs. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
AutoML
AutoML lets you create and train a model with minimal technical effort. Even if you want the flexibility of a custom training application, you can use AutoML to quickly prototype models and explore new datasets before investing in development.
Unmanaged dataset
Review the module “Vertex AI and AutoML Vision on Vertex AI”.

Which AutoML model type analyzes your video data and returns a list of shots and segments where objects are detected?
Image classification model
An image classification model analyzes image data and returns a list of content categories that apply to the image. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Video object tracking model
A video object tracking model analyzes video data and returns a list of shots and segments where certain objects were detected. For example, if it analyzes video data from a soccer game, it can identify and track the ball.
Video classification model
A video classification model analyzes video data to classify shots and segments or detect and track multiple objects. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Video action recognition model
A video action recognition model analyzes video data and returns a list of categorized actions with the moments the actions occurred. Review the module “Vertex AI and AutoML Vision on Vertex AI”.

What is true about batch prediction?
Batch prediction is optimized to minimize the latency of serving predictions.
Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Predictions are returned in the response message.
Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Batch prediction is a synchronous, or real-time, prediction, which means that it quickly returns a prediction.
Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Batch prediction is useful for making several prediction requests at the same time and is optimized to handle a high volume of instances in a job.
Requesting a batch prediction is an asynchronous request, which means that the model waits until it processes all of the prediction requests before returning a response in JSON files in Cloud Storage buckets.

What prediction method do you use for synchronous or real-time prediction that quickly returns a prediction but only accepts one prediction request per API call?
Online prediction
Vertex AI online prediction is optimized to run your data through hosted models with as little latency as possible.
AutoML prediction
Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Batch prediction
Batch prediction is useful for making several prediction requests at the same time. Requesting a batch prediction is an asynchronous request. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Online and batch prediction
Review the module “Vertex AI and AutoML Vision on Vertex AI”.
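
For reference, a rough sketch of both prediction modes with the google-cloud-aiplatform SDK as I understand it; the project, resource IDs, and Cloud Storage paths are hypothetical placeholders, not values from this document.

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")  # hypothetical

    # Online (synchronous) prediction: one request per API call, low latency.
    endpoint = aiplatform.Endpoint("1234567890")  # hypothetical endpoint ID
    response = endpoint.predict(instances=[{"feature_a": 1.0, "feature_b": 2.0}])

    # Batch (asynchronous) prediction: many instances per job, results written to Cloud Storage.
    model = aiplatform.Model("9876543210")  # hypothetical model ID
    batch_job = model.batch_predict(
        job_display_name="nightly-scoring",
        gcs_source="gs://my-bucket/batch-input.jsonl",
        gcs_destination_prefix="gs://my-bucket/batch-output/",
    )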

What does Vertex AI offer to achieve your ML goals?
Fast experimentation, accelerated deployment, and simplified model management
Fast experimentation, decelerated deployment, and simplified model management
Vertex AI offers fast experimentation, accelerated deployment, and simplified model management to achieve your ML goals. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Slow experimentation, accelerated deployment, and simplified model management
Vertex AI offers fast experimentation, accelerated deployment, and simplified model management to achieve your ML goals. Review the module “Vertex AI and AutoML Vision on Vertex AI”.
Slow experimentation, accelerated training, and simplified model management
Vertex AI offers fast experimentation, accelerated deployment, and simplified model management to achieve your ML goals. Review the module “Vertex AI and AutoML Vision on Vertex AI”.

When is the dropout technique used?
Dropout is a technique used to prevent a model from underfitting.
Dropout is a technique used to prevent a model from overfitting. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
Dropout is a technique used to prevent a model from overfitting.
Dropout is a regularization technique that prevents neural networks from overfitting. During training, dropout randomly discards a portion of the neurons to avoid overfitting.
Dropout is a technique used to remove a small percentage of weights at each iteration. So weights will never be equal to zero.
Dropout is a technique used to prevent a model from overfitting. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
Dropout is a feature added between the layers of the neural network, and it continuously takes the output from the previous layer and normalizes it before sending it to the next layer.
Dropout is a technique used to prevent a model from overfitting. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.

How does the batch normalization work?
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 0.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1.
Batch normalization normalizes the outputs using a mean equal to 0 and a standard deviation equal to 1 (μ = 0, σ = 1).
Batch normalization applies a transformation that maintains the mean output close to 1 and the output standard deviation close to 0.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
Batch normalization applies a transformation that maintains the mean output close to 1 and the output standard deviation close to 1.
Batch normalization applies a transformation that maintains the mean output close to 0 and the output standard deviation close to 1. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.

When can a sequential model be used?
A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and multiple output tensors.
A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
A sequential model is appropriate for a plain stack of layers where each layer has multiple input tensors and one output tensor.
A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
A sequential model is appropriate for a plain stack of layers where each layer has multiple input tensors and multiple output tensors.
A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
A sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor.
A sequential model is not appropriate when the model has multiple inputs or multiple outputs.
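
A minimal Keras sketch tying the last few questions together: a Sequential stack (one input tensor and one output tensor per layer) that uses BatchNormalization and Dropout; the layer sizes are arbitrary.

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(20,)),
        layers.BatchNormalization(),  # keeps activations near mean 0, standard deviation 1
        layers.Dropout(0.3),          # randomly discards 30% of units during training to curb overfitting
        layers.Dense(1, activation="sigmoid"),
    ])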

What function can be used for a model to do prediction?
model.compile()
The model.compile() function configures the model with losses and metrics. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
model.fit()
The model.fit() function trains the model for a fixed number of epochs on the training data; it does not perform prediction. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
model.evaluate()
The model.evaluate() function predicts the output for the given input and then computes the metrics function specified in the model. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
model.predict()
The model.predict() function is used for a model to do prediction.

What does the loss function do?
The loss function computes the updated model based on the data being observed.
The loss function measures how accurate the model is during training. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
The loss function groups layers into an object with training and inference features.
The loss function measures how accurate the model is during training. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
The loss function measures how accurate the model is during training.
The loss function is a measure of how accurately your prediction model predicts the expected outcome (or value).
The loss function computes the average value of the cost function over all the training samples.
The loss function measures how accurate the model is during training. Review the module “Custom Training with Linear, Neural Network and Deep Neural Network models”.
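
A small self-contained Keras sketch showing compile() (where the loss is configured), fit(), evaluate(), and predict() in order; the synthetic data and layer sizes are placeholders.

    import numpy as np
    import tensorflow as tf

    x_train = np.random.rand(200, 20).astype("float32")      # hypothetical data
    y_train = np.random.randint(0, 2, size=(200, 1))

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=2, verbose=0)          # training, guided by the loss
    loss, acc = model.evaluate(x_train, y_train, verbose=0)   # loss and metrics on given data
    probs = model.predict(x_train[:5])                        # predictions for new inputs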

What kind of padding methods are available in Keras?
Same padding and valid padding
Keras has two padding methods available: one is called “same” and the other is called “valid.” In general, when small, square, and odd-numbered sizes are used for the kernels, the differences are not very meaningful. Also, Keras provides built-in support for padding.
Causal padding and valid padding
Please refer to the module “Convolutional Neural Networks”.
Causal padding
Please refer to the module “Convolutional Neural Networks”.
Same padding and causal padding
Please refer to the module “Convolutional Neural Networks”.

What does the max-pooling operation do in a convolutional neural network?
It returns the maximum value out of all the input data values passed to a kernel.
A pooling layer that relies on max-pooling does not require any weights, because the operation only cares about the largest of the input values evaluated by the kernel. This means that during training none of the parameters of the pooling layer need to change.
It returns the average value out of all the input data values passed to a kernel.
Please refer to the module “Convolutional Neural Networks”.
It returns the minimum value out of all the input data values passed to a kernel.
Please refer to the module “Convolutional Neural Networks”.
It calculates the ratio for each patch of the feature map.
Please refer to the module “Convolutional Neural Networks”.

Which factor does not affect the accuracy of the deep neural network, or DNN?
Inadequate data
Please refer to the module “Convolutional Neural Networks”.
Transfer function
Please refer to the module “Convolutional Neural Networks”.
Network architecture
Please refer to the module “Convolutional Neural Networks”.
Pixel randomization
The accuracy of the deep neural network, or DNN, is not affected by the pixel randomization because the data is not structured hierarchically.

What is true about strides?
Stride refers to the number of pixels by which the input matrix slides over the filter matrix.
Please refer to the module “Convolutional Neural Networks”.
Larger strides will produce a larger feature map.
Please refer to the module “Convolutional Neural Networks”.
Strides are the size of the step by which the filter slides across the input image.
Using a larger step will skip input pixels and produce fewer output values.
Using a stride with a value greater than 1 will reduce the shape produced by the convolutional layer.
The size of the output will be divided along every dimension by the size of the stride step.

What is a kernel in a convolutional neural network?
Kernels are the size of the step by which the filter slides across the input image.
Please refer to the module “Convolutional Neural Networks”.
Kernels are the building blocks of CNNs because they are used to extract the right and relevant features from the input data using the convolution operation.
A kernel is essentially a filter that is used to extract relevant features from the input images.
A kernel is a parameter that depends on the number of channels in the input image.
Please refer to the module “Convolutional Neural Networks”.
A kernel is the area of an image in which a convolutional neural network processes.
Please refer to the module “Convolutional Neural Networks”.

What is convolution?
Convolution is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data for which the true values are known.
Please refer to the module “Convolutional Neural Networks”.
Convolution is a function that is used to reduce the spatial size of a representation and increase the number of parameters and amount of computation in a network.
Please refer to the module “Convolutional Neural Networks”.
Convolution is a parameter that helps to maintain the same size across the input and the output.
Please refer to the module “Convolutional Neural Networks”.
Convolution is the process of "sliding" a kernel across an image.
A convolution is the mathematical combination of two functions to produce a third function.

Which CNN model parameter helps to maintain the same size across the input and the output of the convolutional layer?
Kernel size
The kernel size is the size or dimension of each filter and can be a single number, like 3 for a 3x3 filter, or a pair like (3,5) for a rectangular 3x5 filter. It doesn't help to maintain the same size across the input and the output of the convolutional layer. Please refer to the module “Convolutional Neural Networks”.
Input channels
The input channels parameter depends on the number of channels in the input image. For example, for an input image of 256x256x3, the input channels are 3. It doesn't help to maintain the same size across the input and the output of the convolutional layer. Please refer to the module “Convolutional Neural Networks”.
Padding
Padding extends the area of an image in which a convolutional neural network processes. The approach adds a border around the input values in the original image. Therefore, it helps you maintain the same size across the input and the output of the convolutional layer.
Strides
Strides are the size of the step by which the filter slides across the input image. The default step size is 1 pixel in both directions. Using a larger step will skip input pixels and produce fewer output values. It doesn't help to maintain the same size across the input and the output of the convolutional layer. Please refer to the module “Convolutional Neural Networks”.

How many learnable parameters does a pooling layer have?
One
Please refer to the module “Convolutional Neural Networks”.
Zero
The pooling layer doesn't have learnable parameters because it only calculates a specific number. Thus, the number of parameters in this layer is zero.
Four
Please refer to the module “Convolutional Neural Networks”.
Two
Please refer to the module “Convolutional Neural Networks”.
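
A short Keras sketch illustrating kernel size, strides, padding, and a (parameter-free) pooling layer in one small CNN; the input shape and filter counts are arbitrary, and model.summary() prints each layer's output shape and parameter count.

    import tensorflow as tf
    from tensorflow.keras import layers

    model = tf.keras.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(32, kernel_size=3, strides=1, padding="same", activation="relu"),   # "same" keeps 28x28
        layers.MaxPooling2D(pool_size=2),   # halves the spatial size; adds zero learnable parameters
        layers.Conv2D(64, kernel_size=3, strides=2, padding="valid", activation="relu"),  # stride 2 shrinks the output
        layers.Flatten(),
        layers.Dense(10, activation="softmax"),
    ])
    model.summary()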

What is data augmentation?
Data augmentation is the amount of pixels added to an image when it is being processed by the kernel of a CNN.
Please refer to the module “Dealing with Image Data”.
Data augmentation is a set of techniques that enhance the size and quality of training datasets with the goal of creating more accurate ML models that generalize better.
Data augmentation improves the model's resilience and accuracy by creating more data.
Data augmentation is a technique where randomly selected neurons are ignored during training.
Please refer to the module “Dealing with Image Data”.
Data augmentation is the grouping together of resources for the purposes of maximizing advantage or minimizing risk to the users.
Please refer to the module “Dealing with Image Data”.
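
A minimal sketch of data augmentation with Keras preprocessing layers; the specific transforms and factors are arbitrary choices.

    import tensorflow as tf
    from tensorflow.keras import layers

    augmentation = tf.keras.Sequential([
        layers.RandomFlip("horizontal"),  # mirror images left/right
        layers.RandomRotation(0.1),       # rotate by up to 10% of a full turn
        layers.RandomZoom(0.1),           # zoom in or out by up to 10%
    ])
    # Apply only at training time, e.g. augmented = augmentation(images, training=True)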

When you compute the total number of parameters in the network, what proportion comes from each type of layer?
A large number of parameters comes from the dense layers at the end, and the convolutional layers contain far fewer parameters.
To compute the number of parameters in a convolutional layer, multiply the number of weights per filter (kernel height × kernel width × input channels) by the number of filters, and then add one bias term per filter. The stride affects only the output size, not the parameter count.
The number of parameters from dense layers and convolutional layers has the same proportion.
Please refer to the module “Dealing with Image Data”.
A small number of parameters comes from the dense layers at the end, and the convolutional layers also contain far fewer parameters.
Please refer to the module “Dealing with Image Data”.
A large number of parameters comes from the dense layers at the end, along with the convolutional layers.
Please refer to the module “Dealing with Image Data”.
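
A quick arithmetic check with hypothetical layer sizes: a convolutional layer's parameter count depends only on kernel size, input channels, and filter count, while a dense layer's count grows with the full input width, which is why the dense layers at the end usually dominate.

    # Conv layer: (kernel_h * kernel_w * input_channels + 1 bias) * n_filters
    conv_params = (3 * 3 * 3 + 1) * 32      # 3x3 kernels, 3 input channels, 32 filters -> 896
    # Dense layer: (inputs + 1 bias) * units
    dense_params = (1024 + 1) * 256         # 1,024 inputs, 256 units -> 262,400
    print(conv_params, dense_params)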

What is negative transfer learning in computer vision?
When labeled data for a specific target task is scarce, the target performance is enhanced.
Please refer to the module “Dealing with Image Data”.
When labeled data for a specific target task is abundant, target performance is not degraded.
Please refer to the module “Dealing with Image Data”.
When knowledge is transferred from a less related source, the target performance is not degraded.
Please refer to the module “Dealing with Image Data”.
When knowledge is transferred from a less related source, the target performance might be degraded.
When labeled data is scarce for a specific target task, the target performance is affected.

How does transfer learning deal with the data scarcity problem?
Transfer learning boosts the need for data by initializing the parameters with better values.
Please refer to the module “Dealing with Image Data”.
Transfer learning transfers knowledge across tasks so, instead of creating more data, transfer learning decreases the need for data by initializing the parameters with better values.
Transfer learning uses knowledge acquired for one task to solve related tasks.
Transfer learning takes the pre-existing samples and changes them in some way to create new samples and also increase the number of training samples, and is typically used with image data.
Please refer to the module “Dealing with Image Data”.
Transfer learning increases the number of parameters to deal with data scarcity.
Please refer to the module “Dealing with Image Data”.

How does preprocessing help to improve the quality of the image?
Preprocessing increases unwanted noise and controls the quality of the image.
Please refer to the module “Dealing with Image Data”.
Preprocessing improves the image data by inducing missing values, noisy data, and other inconsistencies before executing it to the algorithm.
Please refer to the module “Dealing with Image Data”.
Preprocessing increases unwanted distortions and enhances the required features that are essential for the application.
Please refer to the module “Dealing with Image Data”.
Preprocessing suppresses unwanted distortions and enhances the required features that are essential for the application.
Before raw images can be fed into an image model, they usually have to be preprocessed. These preprocessing operations can include resizing, converting between color spaces, cropping, flipping, rotating and transposing for shape transformation or image adjustments, segmentation, and compression for quality enhancement.

What are the options to create a processor for Document AI?
Choose an existing processor created for general purposes.
Choose an existing processor created for a specialized task.
All of the options.
Create a custom processor and build it on your own.

What is NOT an application of NLP?
Text classification
Machine translation
Image recognition
Interactive conversation

What are the three major components that the Dialogflow API helps to identify in a conversation?
Intent (the topic), entity (the details), and context (the flow of the conversation).
Questions, answers, and feedback
End-user, Dialogflow, and fulfillment
Time, location, and participants

What are the three options provided by Google Cloud to develop an NLP project?
Pre-built APIs, AutoML, and custom training
BigQuery, Dataflow, and Looker
Dialogflow API, Contact Center AI, and Cloud Healthcare API
Dataflow, Dialogflow API, and Google Data Studio

What are the NLP tasks solved by AutoML?
Text classification
All of the options.
Entity extraction
Sentiment analysis

Vertex AI provides two solutions to build an NLP project. Which of the following is correct about these two solutions?
AutoML, which is a no-code solution, and custom training, which is a code-based solution
Document AI and the Dialogflow API
AutoML, which is a code-based solution, and custom training, which is a no-code solution
CCAI, which stands for Contact Center AI, and Document AI

What are the major stages of an end-to-end workflow to build an NLP project with Vertex AI?
Data preparation, model training, and model serving
Model deployment, model monitoring, and model serving
Dataset upload, feature engineering, and model training
Model training, model evaluation, and model deployment

Which of the following is NOT a major step of feature engineering in NLP?
Text representation
Tokenization
Preprocessing
Model testing

What is the difference between continuous bag-of-words (CBOW) and skip-gram, the two primary techniques of word2vec?
CBOW uses the next word to predict previous words, whereas skip-gram uses previous words to predict the next word.
CBOW uses a center word to predict surrounding words, whereas skip-gram uses surrounding words to predict a center word.
CBOW uses previous words to predict the next word, whereas skip-gram uses the next word to predict previous words.
CBOW uses surrounding words to predict a center word, whereas skip-gram uses a center word to predict surrounding words.

Which of the following is correct about one-hot encoding when you represent text with basic vectorization?
One-hot encoding divides a sentence to character-level.
One-hot encoding encodes the word to a vector where one corresponds to its position in the vocabulary and zeros to the rest.
One-hot encoding converts a sentence to a dense vector that retains the meaning of the sentence.
One-hot encoding encodes the word to the frequency it occurs in a sentence.

What are the benefits of using word embedding (such as word2vec) compared to basic vectorization (such as one-hot encoding) when you convert text to vectors?
All of the options.
Compared to basic vectorization, which converts text to sparse vectors, word embedding converts text to dense vectors.
You can use pre-trained word-embedding to represent text.
Compared to basic vectorization, which converts text to vectors without semantic meaning, word embeddings represent words in a vector space where the distance between them indicates semantic similarity and difference.
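
A tiny sketch contrasting the two representations; the three-word vocabulary and embedding dimension are hypothetical.

    import tensorflow as tf

    vocab = ["the", "cat", "sat"]            # hypothetical vocabulary
    token_ids = tf.constant([0, 1, 2])

    one_hot = tf.one_hot(token_ids, depth=len(vocab))                         # sparse vectors, no semantics
    embedding = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=4)
    dense_vectors = embedding(token_ids)                                      # dense, learnable vectors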

What are the major gates in a standard LSTM (long short-term memory) cell?
A standard LSTM cell includes two gates: the input gate to input information and the output gate to output information.
A standard LSTM cell includes two gates: the remember gate to remember relevant information and the forget gate to forget irrelevant information.
A standard LSTM cell includes three gates: the input gate to input information, the hidden gate to remember information, and the output gate to output information.
A standard LSTM cell includes three gates: the forget gate to forget irrelevant information, the input gate to remember relevant information, and the update gate to update new information.

What is the key feature to enable a “memory” of an RNN (recurrent neural network)?
An RNN uses a mechanism called hidden state to carry the previous information to the next learning iteration.
An RNN has a single lambda layer.
An RNN has multiple hidden layers.
An RNN has one hidden layer.

What is the coding in Keras to build the hidden layer of a GRU (gated recurrent unit) model?
gru_model = build_gru_model(embed_dim=EMBED_DIM)
GRU(units)
Lambda(lambda x: tf.reduce_mean(x, axis=1))
Dense(N_CLASSES, activation="softmax")
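
A minimal sketch of a GRU-based text classifier in Keras, assembled from pieces like the options above; the vocabulary size, embedding dimension, unit count, and class count are hypothetical.

    import tensorflow as tf
    from tensorflow.keras import layers

    VOCAB_SIZE, EMBED_DIM, UNITS, N_CLASSES = 10000, 64, 32, 3  # hypothetical sizes

    gru_model = tf.keras.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.GRU(UNITS),                             # the GRU hidden layer
        layers.Dense(N_CLASSES, activation="softmax"),
    ])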

What is the major improvement of BERT (Bidirectional Encoder Representations) compared to transformers?
BERT considers the order of the words in a sentence, whereas a transformer doesn’t.
BERT doesn’t consider the order of the words in a sentence, whereas a transformer does.
BERT is a sequence-to-one model, whereas a transformer is a sequence-to-sequence model.
BERT is a sequence-to-sequence model, whereas a transformer is a sequence-to-one model.

Which of the following is correct about large language models?
Large in large language models refers to both huge training datasets and many parameters.
All of the options.
Transformers and BERT are examples of large language models.
Large language models can be pre-trained for general purposes and then fine-tuned for specific tasks.

What is the problem that an encoder-decoder mainly solves?
Sequence-to-one problems such as email spam detection, where you use a sequence of text to predict whether an email is spam
None of the above.
Sequence-to-sequence problems such as machine translation where you translate sentences to another language
One-to-sequence problems such as image captioning, where you generate a few sentences based on one image

What are some ways you can address the cold-start problem that can occur for new users of a collaborative filter recommendation system?
Rely on a content-based method instead for new users.
Content-based systems require that we either base our recommendations solely on the properties of the items, by looking for similar items for example, or that we have representations of our users in the same embedding space of our items. For a trivial example, by asking users which genres they prefer, we could make content-based recommendations using representations of items with genres as features.
Ask the user for some basic preferences.
With only a few preferences, we could classify users into different personas we've derived across our user-base and base our recommendations on the preferences of this entire group.
Give up and ask new users to make their own recommendations.
Power users may be willing to help but their capacity won't scale with your service.
Ask the new user's friends to recommend items they think would be relevant.
Our friends are not always the best product recommenders, as anyone who has ever gotten a weird gift knows. Additionally, it means your service would have little value to users who don't already have friends on the site, which might very likely be the case when the product launches in a new country.

Suppose you want to build a collaborative filter to suggest new hiking trails for users. The problem is you don't have any good explicit user ratings for trails. What feature might be useful for creating an implicit measure of a user's rating for a trail instead?
The length of the trail.
The length of the trail is constant across all users so it can't be a measure of an individual user's rating of a particular trail.
The number of times the user hiked that trail
The decision to hike a trail can be reasonably interpreted as an implicit measure of user preference.
The distance of the trail to the user's home.
Distance to the trail is better captured using a knowledge-based recommendation system.
The number of times all users hiked that trail.
The number of times all users hiked a trail would be a useful measure of objective or consensus quality, but it would not capture an individual user's preference.

What are some potential techniques to determine how similar two items are?
Plot the two items in the embedding space and visually inspect how close they are on a graph.
Visual inspection is great for quick decisions but is not rigorous enough or scalable for anything formal.
Measure the cosine similarity between the two items in an embedding space.
Compare the norms (which are directionless) of the two items in an embedding space to see if they are similar.
Because norms are directionless scalars, two vectors can have identical norms and sit in completely different parts of the vector space.
Count how many features the two items have in common.
Compute the inner product between the two items in an embedding space.

When building a content-based recommender system, it's important to express both your items and users using the same embedding space (that is, the same dimensions and features).
True.
We calculate our recommendation using the product of the user embedding and the item embedding, so it's critical that the dimensions of these two embeddings are the same.
False
Think again about the computation we perform to generate predictions for a given user: we multiply a user-feature vector by an item-feature vector to get a prediction. In order for this multiplication to make sense, the shape of the vectors and the meaning of each dimension need to be the same.

ALS and WALS create embedding tables for both users and items. Because these are held in memory, it's important to plan for their size. How big would you expect the embedding table for the users to be?
Proportional to the number of users squared.
The embedding table has a number of rows equal to the number of users and a number of columns equal to the embedding size, which is a hyperparameter. While the embedding size could theoretically be the same as the number of users, in practice this number is never that big.
Proportional to k, the number of dimensions in your embedding space.
Proportional to the number of users.
Proportional to the number of users multiplied by the number of items.
The original sparse rating matrix has these dimensions, not the embeddings.

You want to create a hybrid recommendation system to suggest music for new users on your music streaming app that just launched. New users are asked to rate a few bands they like. You have reliable data for artist name, song name, album name, etc. Each song is labeled for genre at a coarse level (rock, pop, etc.). Which component of your recommendation system will likely perform the best?
The collaborative filtering component.
The performance of the collaborative filtering component is determined by the density of the ratings matrix. Because our app just launched, and because explicit feedback is so rare, our collaborative filtering recommendations will likely be poor.
The knowledge-based component.
Because the user has entered some basic preferences, and because we have reliable metadata for each song, we can recommend songs with the same metadata or allow users to find such songs on their own.
The content-based component.
The performance of the content-based component is determined by the quality of the representations of the content. In this case, we know very little about each song. Our genre labels are coarse and we have no representations of the raw audio itself.

In which of the following use cases is it recommended to go with a contextual bandit system?
Training two agents to cooperate with each other to win a multi-agent strategy game.
Forecasting demand for various products in a supermarket in a given time horizon.
Training a robot to walk.
Tailoring the results of a search engine to a specific user.

You would like to train an agent to drive a car. The action space consists of the following variables: the acceleration (between 0 and 300), the angular degree of turn or tilt (between 0 and 180 degrees), and the direction (either forward or reverse). Select the three algorithms which are appropriate.
Deep Q Networks
Proximal Policy Optimization (PPO)
Deep deterministic policy gradient (DDPG)
REINFORCE

Which of the following would make for suitable good value functions?
Scenario: You have a tennis video game. The reward is the negated value of the final score.
Scenario: You want to train an agent to win a race. The reward is the total time taken to run the race.
Scenario: You have a movie recommender system. The reward is the count of clicks.
Scenario: You want to train an agent to win a race. The reward is the negative value of the total time taken to run the race.

In which scenarios is reinforcement learning preferable over supervised learning?
When you have optimization or control problems with scarce data points and trial and error is impossible.
When you have predictive modeling problems with an offline static dataset.
When you have optimization or control problems where simulation trial and error is possible.
When you have predictive modeling with scarce data and a differentiable metric to be optimized.

Which of the following is not a motivating rationale to use replay buffers?
For achieving data efficiency.
For de-correlating experience trajectories.
For repeating rare experiences.
To keep the model policy well aligned with the newest experience

Which of the following steps is part of continuous integration and delivery (CI/CD) but not continuous training (CT)?
Measuring the model
Retraining the model
Building the model
Monitoring the model

Which of the following characteristics of delivering an ML model is considered as a characteristic of maturity level 0?
Manual, script-driven, and interactive process
Feature store integration
Pipeline continuous integration
Source control automation

What is the process of monitoring, measuring, retraining, and serving ML models automatically and continuously to adapt to changes in the data before they’re redeployed?
Continuous training
Continuous deployment
Continuous integration
Continuous delivery

What is the important aspect of MLOps which differs from DevOps?
MLOps constantly monitors, retrains, and serves the model.
MLOps focuses on a single software package or service.
MLOps deploys code and moves to another task.
MLOps tests and validates only the code and components.

What is the MLOps life cycle iterative process that retrains your production models with the new data?
Predictive serving
Continuous delivery
Continuous training
ML development

What component of an ML pipeline is responsible for deploying the model to any edge devices?
Analyze and transform
Upload model and deploy endpoint
Upload and track
Evaluate

How does end-to-end MLOps help ML practitioners with the machine learning life cycle?
End-to-end MLOps helps ML practitioners efficiently and responsibly manage, monitor, govern, and explain ML projects throughout the entire development lifecycle.
End-to-end MLOps lets ML practitioners only perform exploratory data analysis (EDA) and prototyping.
End-to-end MLOps lets ML practitioners only monitor ML models.
End-to-end MLOps lets ML practitioners only train and tune ML models.

Suppose you want to develop a supervised machine learning model to predict whether a given email is "spam" or "not spam." Which of the following statements are true?
Emails not marked as "spam" or "not spam" are unlabeled examples.
Because our label consists of the values "spam" and "not spam", any email not yet marked as spam or not spam is an unlabeled example.
Words in the subject header will make good labels.
Words in the subject header might make excellent features, but they won't make good labels.
We'll use unlabeled examples to train the model.
We'll use labeled examples to train the model. We can then run the trained model against unlabeled examples to infer whether the unlabeled email messages are spam or not spam.
The labels applied to some examples might be unreliable.
Definitely. It's important to check how reliable your data is. The labels for this dataset probably come from email users who mark particular email messages as spam. Since most users do not mark every suspicious email message as spam, we may have trouble knowing whether an email is spam. Furthermore, spammers could intentionally poison our model by providing faulty labels.

Suppose an online shoe store wants to create a supervised ML model that will provide personalized shoe recommendations to users. That is, the model will recommend certain pairs of shoes to Marty and different pairs of shoes to Janet. The system will use past user behavior data to generate training data. Which of the following statements are true?
"Shoe size" is a useful feature.
"Shoe size" is a quantifiable signal that likely has a strong impact on whether the user will like the recommended shoes. For example, if Marty wears size 9, the model shouldn't recommend size 7 shoes.
"Shoe beauty" is a useful feature.
Good features are concrete and quantifiable. Beauty is too vague a concept to serve as a useful feature. Beauty is probably a blend of certain concrete features, such as style and color. Style and color would each be better features than beauty.
"The user clicked on the shoe's description" is a useful label.
Users probably only want to read more about those shoes that they like. Clicks by users is, therefore, an observable, quantifiable metric that could serve as a good training label. Since our training data derives from past user behavior, our labels need to derive from objective behaviors like clicks that strongly correlate with user preferences.
"Shoes that a user adores" is a useful label.
Adoration is not an observable, quantifiable metric. The best we can do is search for observable proxy metrics for adoration.

[Two plots: the left shows 10 points with a line running through 6 of them and the remaining 4 points 1 unit off the line; the right shows 10 points with a line running through 8 of them and the remaining 2 points 2 units off the line.]
Which of the two data sets shown in the preceding plots has the higher Mean Squared Error (MSE)?
The dataset on the left.
The six examples on the line incur a total loss of 0. The four examples not on the line are not very far off the line, so even squaring their offset still yields a low value: $$ MSE = \frac{0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 1^2 + 0^2 + 0^2} {10} = 0.4$$
The dataset on the right.
The eight examples on the line incur a total loss of 0. However, although only two points lay off the line, both of those points are twice as far off the line as the outlier points in the left figure. Squared loss amplifies those differences, so an offset of two incurs a loss four times as great as an offset of one. $$ MSE = \frac{0^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2} {10} = 0.8$$
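
A quick NumPy check of the two MSE values, using the point offsets described above.

    import numpy as np

    left_offsets = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0])    # four points 1 unit off the line
    right_offsets = np.array([0, 0, 0, 2, 0, 0, 0, 2, 0, 0])   # two points 2 units off the line

    print(np.mean(left_offsets ** 2))    # 0.4
    print(np.mean(right_offsets ** 2))   # 0.8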

When performing gradient descent on a large data set, which of the following batch sizes will likely be more efficient?
The full batch.
Computing the gradient from a full batch is inefficient. That is, the gradient can usually be computed far more efficiently (and just as accurately) from a smaller batch than from a vastly bigger full batch.
A small batch or even a batch of one example (SGD).
Amazingly enough, performing gradient descent on a small batch or even a batch of one example is usually more efficient than the full batch. After all, finding the gradient of one example is far cheaper than finding the gradient of millions of examples. To ensure a good representative sample, the algorithm scoops up another random small batch (or batch of one) on every iteration.

We looked at a process of using a test set and a training set to drive iterations of model development. On each iteration, we'd train on the training data and evaluate on the test data, using the evaluation results on test data to guide choices of and changes to various model hyperparameters like learning rate and features. Is there anything wrong with this approach?
Totally fine, we're training on training data and evaluating on separate, held-out test data.
Actually, there's a subtle issue here. Think about what might happen if we did many, many iterations of this form.
Doing many rounds of this procedure might cause us to implicitly fit to the peculiarities of our specific test set.
Yes indeed! The more often we evaluate on a given test set, the more we are at risk for implicitly overfitting to that one test set.
This is computationally inefficient. We should just pick a default set of hyperparameters and live with them to save resources.
Although these sorts of iterations are expensive, they are a critical part of model development. Hyperparameter settings can make an enormous difference in model quality, and we should always budget some amount of time and computational resources to ensure we're getting the best quality we can.

Different cities in California have markedly different housing prices. Suppose you must create a model to predict housing prices. Which of the following sets of features or feature crosses could learn city-specific relationships between roomsPerPerson and housing price?
Three separate binned features: [binned latitude], [binned longitude], [binned roomsPerPerson]
Binning is good because it enables the model to learn nonlinear relationships within a single feature. However, a city exists in more than one dimension, so learning city-specific relationships requires crossing latitude and longitude.
One feature cross: [latitude X longitude X roomsPerPerson]
In this example, crossing real-valued features is not a good idea. Crossing the real value of, say, latitude with roomsPerPerson enables a 10% change in one feature (say, latitude) to be equivalent to a 10% change in the other feature (say, roomsPerPerson).
One feature cross: [binned latitude X binned longitude X binned roomsPerPerson]
Crossing binned latitude with binned longitude enables the model to learn city-specific effects of roomsPerPerson. Binning prevents a change in latitude from producing the same result as a change in longitude. Depending on the granularity of the bins, this feature cross could learn city-specific or neighborhood-specific or even block-specific effects.
Two feature crosses: [binned latitude X binned roomsPerPerson] and [binned longitude X binned roomsPerPerson]
Binning is a good idea; however, a city is the conjunction of latitude and longitude, so separate feature crosses prevent the model from learning city-specific prices.
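
A minimal pandas sketch (synthetic coordinates, hypothetical bin counts) of building the winning cross, [binned latitude X binned longitude X binned roomsPerPerson]:

```python
# Minimal sketch: bin each real-valued feature, then cross the bins into one categorical feature.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "latitude": rng.uniform(32.5, 42.0, 1_000),
    "longitude": rng.uniform(-124.5, -114.0, 1_000),
    "roomsPerPerson": rng.uniform(0.5, 3.0, 1_000),
})

lat_bin = pd.cut(df["latitude"], bins=20, labels=False)
lon_bin = pd.cut(df["longitude"], bins=20, labels=False)
rpp_bin = pd.cut(df["roomsPerPerson"], bins=5, labels=False)

df["lat_x_lon_x_rpp"] = (
    lat_bin.astype(str) + "_" + lon_bin.astype(str) + "_" + rpp_bin.astype(str)
)
# One-hot encode the crossed feature before feeding it to a linear model.
crossed = pd.get_dummies(df["lat_x_lon_x_rpp"])
print(crossed.shape)
```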

Imagine a linear model with 100 input features:
  • 10 are highly informative.
  • 90 are non-informative.
  • Assume that all features have values between -1 and 1.
    Which of the following statements are true?
    L2 regularization will encourage many of the non-informative weights to be nearly (but not exactly) 0.0.
    Yes, L2 regularization encourages weights to be near 0.0, but not exactly 0.0.
    L2 regularization will encourage most of the non-informative weights to be exactly 0.0.
    L2 regularization does not tend to force weights to exactly 0.0. L2 regularization penalizes larger weights more than smaller weights. As a weight gets close to 0.0, L2 "pushes" less forcefully toward 0.0.
    L2 regularization may cause the model to learn a moderate weight for some non-informative features.
    Surprisingly, this can happen when a non-informative feature happens to be correlated with the label. In this case, the model incorrectly gives such non-informative features some of the "credit" that should have gone to informative features.
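
A minimal scikit-learn sketch (synthetic data) of the claim above: under L2 (ridge) regularization, the 90 non-informative weights become small but rarely land on exactly 0.0.

```python
# Minimal sketch: ridge regression on 100 features, only 10 of which are informative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=2_000, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)
coef = np.abs(Ridge(alpha=10.0).fit(X, y).coef_)

print("weights that are exactly 0.0:", int(np.sum(coef == 0.0)))   # typically 0
print("ten largest weights:", np.round(np.sort(coef)[-10:], 1))    # the informative features
print("largest of the remaining 90:", round(float(np.sort(coef)[-11]), 3))  # small, not 0.0
```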

    Imagine a linear model with two strongly correlated features; that is, these two features are nearly identical copies of one another but one feature contains a small amount of random noise. If we train this model with L2 regularization, what will happen to the weights for these two features?
    Both features will have roughly equal, moderate weights.
    L2 regularization will force the features towards roughly equivalent weights that are approximately half of what they would have been had only one of the two features been in the model.
    One feature will have a large weight; the other will have a weight of almost 0.0.
    L2 regularization penalizes large weights more than small weights. So, even if one weight started to drop faster than the other, L2 regularization would tend to force the bigger weight to drop more quickly than the smaller weight.
    One feature will have a large weight; the other will have a weight of exactly 0.0.
    L2 regularization rarely forces weights to exactly 0.0. By contrast, L1 regularization does force weights to exactly 0.0.
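
A minimal scikit-learn sketch (synthetic data) of this behavior: with two nearly identical features, ridge regression splits the weight roughly evenly.

```python
# Minimal sketch: L2 spreads the weight across two near-duplicate features.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=5_000)
X = np.column_stack([x, x + rng.normal(scale=0.001, size=5_000)])  # near-duplicate copy
y = 3.0 * x + rng.normal(scale=0.1, size=5_000)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # approximately [1.5, 1.5]
```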

    In which of the following scenarios would a high accuracy value suggest that the ML model is doing a good job?
    A deadly, but curable, medical condition afflicts .01% of the population. An ML model uses symptoms as features and predicts this affliction with an accuracy of 99.99%.
    Accuracy is a poor metric here. After all, even a "dumb" model that always predicts "not sick" would still be 99.99% accurate. Mistakenly predicting "not sick" for a person who actually is sick could be deadly.
    An expensive robotic chicken crosses a very busy road a thousand times per day. An ML model evaluates traffic patterns and predicts when this chicken can safely cross the street with an accuracy of 99.99%.
    A 99.99% accuracy value on a very busy road strongly suggests that the ML model is far better than chance. In some settings, however, the cost of making even a small number of mistakes is still too high. 99.99% accuracy means that the expensive chicken will need to be replaced, on average, every 10 days. (The chicken might also cause extensive damage to cars that it hits.)
    In the game of roulette, a ball is dropped on a spinning wheel and eventually lands in one of 38 slots. Using visual features (the spin of the ball, the position of the wheel when the ball was dropped, the height of the ball over the wheel), an ML model can predict the slot that the ball will land in with an accuracy of 4%.
    This ML model is making predictions far better than chance; a random guess would be correct 1/38 of the time—yielding an accuracy of 2.6%. Although the model's accuracy is "only" 4%, the benefits of success far outweigh the disadvantages of failure.

    Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to precision?
    Definitely increase.
    Raising the classification threshold typically increases precision; however, precision is not guaranteed to increase monotonically as we raise the threshold.
    Probably increase.
    In general, raising the classification threshold reduces false positives, thus raising precision.
    Probably decrease.
    In general, raising the classification threshold reduces false positives, thus raising precision.
    Definitely decrease.
    In general, raising the classification threshold reduces false positives, thus raising precision.

    Consider a classification model that separates email into two categories: "spam" or "not spam." If you raise the classification threshold, what will happen to recall?
    Always increase.
    Raising the classification threshold will cause both of the following:
    • The number of true positives will decrease or stay the same.
    • The number of false negatives will increase or stay the same.
    Thus, recall will never increase.
    Always decrease or stay the same.
    Raising our classification threshold will cause the number of true positives to decrease or stay the same and will cause the number of false negatives to increase or stay the same. Thus, recall will either stay constant or decrease.
    Always stay constant.
    Raising our classification threshold will cause the number of true positives to decrease or stay the same and will cause the number of false negatives to increase or stay the same. Thus, recall will either stay constant or decrease.
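
A minimal scikit-learn sketch (synthetic scores) that sweeps the threshold and prints precision and recall, illustrating the two answers above:

```python
# Minimal sketch: precision generally rises with the threshold; recall only falls or stays flat.
import numpy as np
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)
scores = np.clip(0.30 * y_true + rng.normal(0.35, 0.20, size=10_000), 0.0, 1.0)

for threshold in (0.3, 0.5, 0.7):
    y_pred = (scores >= threshold).astype(int)
    print(f"threshold={threshold}  "
          f"precision={precision_score(y_true, y_pred):.3f}  "
          f"recall={recall_score(y_true, y_pred):.3f}")
```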

    Consider two models—A and B—that each evaluate the same dataset. Which one of the following statements is true?
    If Model A has better precision than model B, then model A is better.
    While better precision is good, it might be coming at the expense of a large reduction in recall. In general, we need to look at both precision and recall together, or summary metrics like AUC which we'll talk about next.
    If model A has better recall than model B, then model A is better.
    While better recall is good, it might be coming at the expense of a large reduction in precision. In general, we need to look at both precision and recall together, or summary metrics like AUC, which we'll talk about next.
    If model A has better precision and better recall than model B, then model A is probably better.
    In general, a model that outperforms another model on both precision and recall is likely the better model. Obviously, we'll need to make sure that comparison is being done at a precision / recall point that is useful in practice for this to be meaningful. For example, suppose our spam detection model needs to have at least 90% precision to be useful and avoid unnecessary false alarms. In this case, comparing one model at {20% precision, 99% recall} to another at {15% precision, 98% recall} is not particularly instructive, as neither model meets the 90% precision requirement. But with that caveat in mind, this is a good way to think about comparing models when using precision and recall.

    Which of the following ROC curves produce AUC values greater than 0.5?
An ROC curve with a vertical line running from (0,0) to (0,1), and a horizontal line from (0,1) to (1,1). The TP rate is 1.0 for all FP rates.

    This is the best possible ROC curve, as it ranks all positives above all negatives. It has an AUC of 1.0.

    In practice, if you have a "perfect" classifier with an AUC of 1.0, you should be suspicious, as it likely indicates a bug in your model. For example, you may have overfit to your training data, or the label data may be replicated in one of your features.

An ROC curve with a horizontal line running from (0,0) to (1,0), and a vertical line from (1,0) to (1,1). The FP rate is 1.0 for all TP rates.
    This is the worst possible ROC curve; it ranks all negatives above all positives, and has an AUC of 0.0. If you were to reverse every prediction (flip negatives to positives and positives to negatives), you'd actually have a perfect classifier!
    An ROC curve with one diagonal line running from (0,0) to (1,1). TP and FP rates increase linearly at the same rate.
    This ROC curve has an AUC of 0.5, meaning it ranks a random positive example higher than a random negative example 50% of the time. As such, the corresponding classification model is basically worthless, as its predictive ability is no better than random guessing.
    An ROC curve that arcs up and right from (0,0) to (1,1). TP rate increases at a faster rate than FP rate.
    This ROC curve has an AUC between 0.5 and 1.0, meaning it ranks a random positive example higher than a random negative example more than 50% of the time. Real-world binary classification AUC values generally fall into this range.
    An ROC curve that arcs right and up from (0,0) to (1,1). FP rate increases at a faster rate than TP rate.
    This ROC curve has an AUC between 0 and 0.5, meaning it ranks a random positive example higher than a random negative example less than 50% of the time. The corresponding model actually performs worse than random guessing! If you see an ROC curve like this, it likely indicates there's a bug in your data.
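
A minimal scikit-learn sketch (synthetic scores) of computing an ROC curve and its AUC for a model in the realistic 0.5-to-1.0 range:

```python
# Minimal sketch: compute the ROC curve and AUC for a toy scorer.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)
scores = 0.4 * y_true + rng.normal(0.3, 0.25, size=10_000)  # imperfect ranking

fpr, tpr, thresholds = roc_curve(y_true, scores)
print("AUC:", round(roc_auc_score(y_true, scores), 3))  # above 0.5 and below 1.0
```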

    How would multiplying all of the predictions from a given model by 2.0 (for example, if the model predicts 0.4, we multiply by 2.0 to get a prediction of 0.8) change the model's performance as measured by AUC?
    No change. AUC only cares about relative prediction scores.
    Yes, AUC is based on the relative predictions, so any transformation of the predictions that preserves the relative ranking has no effect on AUC. This is clearly not the case for other metrics such as squared error, log loss, or prediction bias.
    It would make AUC terrible, since the prediction values are now way off.
    Interestingly enough, even though the prediction values are different (and likely farther from the truth), multiplying them all by 2.0 would keep the relative ordering of prediction values the same. Since AUC only cares about relative rankings, it is not impacted by any simple scaling of the predictions.
    It would make AUC better, because the prediction values are all farther apart.
The amount of spread between predictions does not actually impact AUC. Even if the prediction score for a randomly drawn true positive is only a tiny epsilon greater than that of a randomly drawn negative, AUC still counts that pair as a success.
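
A minimal scikit-learn sketch (synthetic predictions) of this rank invariance:

```python
# Minimal sketch: multiplying every prediction by 2.0 leaves AUC unchanged,
# because only the relative ordering of the scores matters.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)
preds = np.clip(0.3 * y_true + rng.uniform(0.1, 0.6, size=10_000), 0.0, 1.0)

print(roc_auc_score(y_true, preds))        # some value above 0.5
print(roc_auc_score(y_true, preds * 2.0))  # identical value
```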

    Imagine a linear model with 100 input features:
  • 10 are highly informative.
  • 90 are non-informative.
  • Assume that all features have values between -1 and 1.
    Which of the following statements are true?
    L1 regularization will encourage many of the non-informative weights to be nearly (but not exactly) 0.0.
    In general, L1 regularization with a sufficiently large lambda tends to drive the weights of non-informative features to exactly 0.0, not just nearly 0.0. Unlike L2 regularization, L1 regularization "pushes" just as hard toward 0.0 no matter how far the weight is from 0.0.
    L1 regularization will encourage most of the non-informative weights to be exactly 0.0.
    L1 regularization of sufficient lambda tends to encourage non-informative weights to become exactly 0.0. By doing so, these non-informative features leave the model.
    L1 regularization may cause informative features to get a weight of exactly 0.0.
    Be careful: L1 regularization may cause the following kinds of features to be given weights of exactly 0:
  • Weakly informative features.
  • Strongly informative features on different scales.
  • Informative features strongly correlated with other similarly informative features.
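
A minimal scikit-learn sketch (synthetic data) of the L1 claims above: with a sufficiently large penalty, Lasso drives most of the 90 non-informative weights to exactly 0.0.

```python
# Minimal sketch: L1 (lasso) regularization produces exact zeros for non-informative features.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=2_000, n_features=100, n_informative=10,
                       noise=5.0, random_state=0)
coef = Lasso(alpha=1.0).fit(X, y).coef_
print("weights that are exactly 0.0:", int(np.sum(coef == 0.0)))  # most (often all) of the 90 noise features
```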

    Imagine a linear model with 100 input features, all having values between -1 and 1:
  • 10 are highly informative.
  • 90 are non-informative.
    Which type of regularization will produce the smaller model?
    L2 regularization.
    L2 regularization rarely reduces the number of features. In other words, L2 regularization rarely reduces the model size.
    L1 regularization.
    L1 regularization tends to reduce the number of features. In other words, L1 regularization often reduces the model size.

    Which one of the following statements is true of dynamic (online) training?
    The model stays up to date as new data arrives.
    This is the primary benefit of online training—we can avoid many staleness issues by allowing the model to train on new data as it comes in.
    Very little monitoring of training jobs needs to be done.
    Actually, you must continuously monitor training jobs to ensure that they are healthy and working as intended. You'll also need supporting infrastructure like the ability to roll a model back to a previous snapshot in case something goes wrong in training, such as a buggy job or corruption in input data.
    Very little monitoring of input data needs to be done at inference time.
    Just like a static, offline model, it is also important to monitor the inputs to the dynamically updated models. We are likely not at risk for large seasonality effects, but sudden, large changes to inputs (such as an upstream data source going down) can still cause unreliable predictions.

    Which of the following statements are true about static (offline) training?
    The model stays up to date as new data arrives.
    Actually, if we train offline, then the model has no way to incorporate new data as it arrives. This can lead to model staleness, if the distribution we are trying to learn from changes over time.
    You can verify the model before applying it in production.
    Yes, offline training gives ample opportunity to verify model performance before introducing the model in production.
    Offline training requires less monitoring of training jobs than online training.
    In general, monitoring requirements at training time are more modest for offline training, which insulates us from many production considerations. However, the more frequently you train your model, the higher the investment you'll need to make in monitoring. You'll also want to validate regularly to ensure that changes to your code (and its dependencies) don't adversely affect model quality.
    Very little monitoring of input data needs to be done at inference time.
    Counterintuitively, you do need to monitor input data at serving time. If the input distributions change, then our model's predictions may become unreliable. Imagine, for example, a model trained only on summertime clothing data suddenly being used to predict clothing buying behavior in wintertime.

    In offline inference, we make predictions on a big batch of data all at once. We then put those predictions in a look-up table for later use. Which of the following are true of offline inference?
    We must create predictions for all possible inputs.
    Yes, we will have to make predictions for all possible inputs and store them into a cache or lookup table to use offline inference. This is one of the drawbacks of offline inference. We will only be able to serve a prediction for those examples that we already know about. This is fine if the set of things that we're predicting is limited, like all world cities or all items in a database table. But for freeform inputs like user queries that have a long tail of unusual or rare items, we would not be able to provide full coverage with an offline-inference system.
    After generating the predictions, we can verify them before applying them.
    This is indeed one useful thing about offline inference. We can sanity check and verify all of our predictions before they are used.
    For a given input, we can serve a prediction more quickly than with online inference.
    Yes. Because the predictions are precomputed and stored in a look-up table, serving a prediction is just a fast lookup rather than a full model invocation.
    We will need to carefully monitor our input signals over a long period of time.
    This is the one case where we don't actually need to monitor input signals over a long period of time. This is because once the predictions have been written to a look-up table, we're no longer dependent on the input features. Note that any subsequent update of the model will require a new round of input verification.
    We will be able to react quickly to changes in the world.
    No, this is a drawback of offline inference. We'll need to wait until a new set of predictions have been written to the look-up table before we can respond differently based on any changes in the world.

    Dynamic (online) inference means making predictions on demand. That is, in online inference, we put the trained model on a server and issue inference requests as needed. Which of the following are true of dynamic inference?
    You can provide predictions for all possible items.
    Yes, this is a strength of online inference. Any request that comes in will be given a score. Online inference handles long-tail distributions (those with many rare items), like the space of all possible sentences written in movie reviews.
    You can do post-verification of predictions before they are used.
    In general, it's not possible to do a post-verification of all predictions before they get used because predictions are being made on demand. You can, however, potentially monitor aggregate prediction qualities to provide some level of sanity checking, but these will signal fire alarms only after the fire has already spread.
    You must carefully monitor input signals.
    Yes. Signals could change suddenly due to upstream issues, harming our predictions.
    When performing online inference, you do not need to worry about prediction latency (the lag time for returning predictions) as much as when performing offline inference.
    Prediction latency is often a real concern in online inference. Unfortunately, you can't necessarily fix prediction latency issues by adding more inference servers.

    Which of the following models are susceptible to a feedback loop?
    A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
    Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.
    A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).
    Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend these same books in the future.
    A university-ranking model that rates schools in part by their selectivity—the percentage of students who applied that were admitted.
    The model's rankings may drive additional interest to highly rated schools, increasing the number of applications they receive. If those schools keep admitting the same number of students, their selectivity rises, which boosts their rankings further, attracts even more applicants, and the cycle repeats.
    An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
    If the model does not publish its forecast until after the polls have closed, it is not possible for its predictions to affect voter behavior.
    A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
    It is not possible to quickly change a house's location, size, or number of bedrooms in response to price forecasts, making a feedback loop unlikely. However, there is potentially a correlation between size and number of bedrooms (larger homes are likely to have more rooms) that may need to be teased apart.
    A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.
    There is no feedback loop here, as model predictions don't have any impact on our photo database. However, versioning of our input data is a concern here, as these monthly updates could potentially have unforeseen effects on the model.

    Which of the following model's predictions have been affected by selection bias?
    A German handwriting recognition smartphone app uses a model that frequently incorrectly classifies ß (Eszett) characters as B characters, because it was trained on a corpus of American handwriting samples, mostly written in English.
    This model was affected by a type of selection bias called coverage bias: the training data (American English handwriting) was not representative of the type of data provided by the model's target audience (German handwriting).
    Engineers built a model to predict the likelihood of a person developing diabetes based on their daily food intake. The model was trained on 10,000 "food diaries" collected from a randomly chosen group of people worldwide representing a variety of different age groups, ethnic backgrounds, and genders. However, when the model was deployed, it had very poor accuracy. Engineers subsequently discovered that food diary participants were reluctant to admit the true volume of unhealthy foods they ate, and were more likely to document consumption of nutritious food than less healthy snacks.
    There is no selection bias in this model; participants who provided training data were a representative sampling of users and were chosen randomly. Instead, this model was affected by reporting bias. Ingestion of unhealthy foods was reported at a much lower frequency than true real-world occurrence.
    Engineers at a company developed a model to predict staff turnover rates (the percentage of employees quitting their jobs each year) based on data collected from a survey sent to all employees. After several years of use, engineers determined that the model underestimated turnover by more than 20%. When conducting exit interviews with employees leaving the company, they learned that more than 80% of people who were dissatisfied with their jobs chose not to complete the survey, compared to a company-wide opt-out rate of 15%.
    This model was affected by a type of selection bias called non-response bias. People who were dissatisfied with their jobs were underrepresented in the training data set because they opted out of the company-wide survey at much higher rates than the entire employee population.
    Engineers developing a movie-recommendation system hypothesized that people who like horror movies will also like science-fiction movies. When they trained a model on 50,000 users' watchlists, however, it showed no such correlation between preferences for horror and for sci-fi; instead it showed a strong correlation between preferences for horror and for documentaries. This seemed odd to them, so they retrained the model five more times using different hyperparameters. Their final trained model showed a 70% correlation between preferences for horror and for sci-fi, so they confidently released it into production.
    There is no evidence of selection bias, but this model may have instead been affected by experimenter's bias, as the engineers kept iterating on their model until it confirmed their preexisting hypothesis.

    A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):

    Adults

    True Positives (TPs): 512 False Positives (FPs): 51
    False Negatives (FNs): 36 True Negatives (TNs): 9401
    $$\text{Precision} = \frac{TP}{TP+FP} = 0.909$$
    $$\text{Recall} = \frac{TP}{TP+FN} = 0.934$$

    Minors

    True Positives (TPs): 2147 False Positives (FPs): 96
    False Negatives (FNs): 2177 True Negatives (TNs): 5580
    $$\text{Precision} = \frac{TP}{TP+FP} = 0.957$$
    $$\text{Recall} = \frac{TP}{TP+FN} = 0.497$$
    Which of the following statements about the model's test-set performance are true?
    Overall, the model performs better on examples from adults than on examples from minors.
    The model achieves both precision and recall rates over 90% when detecting sarcasm in text messages from adults.

    While the model achieves a slightly higher precision rate for minors than adults, the recall rate is substantially lower for minors, resulting in less reliable predictions for this group.

    The model fails to classify approximately 50% of minors' sarcastic messages as "sarcastic."
    The recall rate of 0.497 for minors indicates that the model predicts "not sarcastic" for approximately 50% of minors' sarcastic texts.
    Approximately 50% of messages sent by minors are classified as "sarcastic" incorrectly.
    The precision rate of 0.957 indicates that over 95% of minors' messages classified as "sarcastic" are actually sarcastic.
    The 10,000 messages sent by adults are a class-imbalanced dataset.
    If we compare the number of messages from adults that are actually sarcastic (TP+FN = 548) with the number of messages that are actually not sarcastic (TN + FP = 9452), we see that "not sarcastic" labels outnumber "sarcastic" labels by a ratio of approximately 17:1.
    The 10,000 messages sent by minors are a class-imbalanced dataset.
    If we compare the number of messages from minors that are actually sarcastic (TP+FN = 4324) with the number of messages that are actually not sarcastic (TN + FP = 5676), we see that there is a 1.3:1 ratio of "not sarcastic" labels to "sarcastic" labels. Given that the distribution of labels between the two classes is quite close to 50/50, this is not a class-imbalanced dataset.
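
A quick check of the quoted figures, computed directly from the confusion-matrix counts above (the helper function is just for illustration):

```python
# Minimal sketch: recompute precision and recall from the confusion-matrix counts.
def precision_recall(tp, fp, fn):
    return tp / (tp + fp), tp / (tp + fn)

print(precision_recall(tp=512, fp=51, fn=36))      # adults:  (~0.909, ~0.934)
print(precision_recall(tp=2147, fp=96, fn=2177))   # minors:  (~0.957, ~0.497)
```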

    A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):

    Adults

    True Positives (TPs): 512 False Positives (FPs): 51
    False Negatives (FNs): 36 True Negatives (TNs): 9401
    $$\text{Precision} = \frac{TP}{TP+FP} = 0.909$$
    $$\text{Recall} = \frac{TP}{TP+FN} = 0.934$$

    Minors

    True Positives (TPs): 2147 False Positives (FPs): 96
    False Negatives (FNs): 2177 True Negatives (TNs): 5580
    $$\text{Precision} = \frac{TP}{TP+FP} = 0.957$$
    $$\text{Recall} = \frac{TP}{TP+FN} = 0.497$$
    Engineers are working on retraining this model to address inconsistencies in sarcasm-detection accuracy across age demographics, but the model has already been released into production. Which of the following stopgap strategies will help mitigate errors in the model's predictions?
    Restrict the model's usage to text messages sent by adults.

    The model performs well on text messages from adults (with precision and recall rates both above 90%), so restricting its use to this group will sidestep the systematic errors in classifying minors' text messages.

    When the model predicts "not sarcastic" for text messages sent by minors, adjust the output so the model returns a value of "unsure" instead.

    The precision rate for text messages sent by minors is high, which means that when the model predicts "sarcastic" for this group, it is nearly always correct.

    The problem is that recall is very low for minors; the model fails to identify sarcasm in approximately 50% of examples. Given that the model's negative predictions for minors are no better than random guesses, we can avoid these errors by not providing a prediction in these cases.

    Restrict the model's usage to text messages sent by minors.

    The systematic errors in this model are specific to text messages sent by minors. Restricting the model's use to the group more susceptible to error would not help.

    Adjust the model output so that it returns "sarcastic" for all text messages sent by minors, regardless of what the model originally predicted.

    Always predicting "sarcastic" for minors' text messages would increase the recall rate from 0.497 to 1.0, as the model would no longer fail to identify any messages as sarcastic. However, this increase in recall would come at the expense of precision. All the true negatives would be changed to false positives:

    True Positives (TPs): 4324 False Positives (FPs): 5676
    False Negatives (FNs): 0 True Negatives (TNs): 0

    which would decrease the precision rate from 0.957 to 0.432. So, adding this calibration would change the type of error but would not mitigate the magnitude of the error.


    An industrial company wants to improve its quality system. It has developed its own deep neural network model with Tensorflow that uses images taken from the production lines in the various production phases to identify the semi-finished products to be discarded. During training, your custom model converges, but the tests give unsatisfactory results.

    What do you think might be the problem, and how could you proceed to fix it?
    You have used too few examples; you need to re-train with a larger set of images
    When you have a different trend between training and validation, you have an overfitting problem. More data may help you, but you have to simplify the model first.
    You have to change the type of algorithm and use XGBoost
    The problem is not with the algorithm but is within feature management.
    You have an overfitting problem
    Decrease your Learning Rate hyperparameter
    Decreasing the Learning Rate hyperparameter is useless. The model converges in training.
    The model is too complex; you have to regularize it and then make it simpler
    Use L2 Ridge Regression
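
A minimal sketch of the suggested fix, assuming a Keras image classifier with made-up input size and layer widths: shrink the network and add L2 (ridge) kernel regularization, optionally with dropout.

```python
# Minimal sketch: a simplified, L2-regularized "discard / keep" image classifier.
import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(128, 128, 3)),    # hypothetical image size
    tf.keras.layers.Conv2D(16, 3, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dropout(0.3),                   # extra regularization
    tf.keras.layers.Dense(1, activation="sigmoid"), # probability of "discard"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```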

    You need to develop and train a model capable of analyzing snapshots taken from a moving vehicle and detecting if obstacles arise. Your work environment is Vertex AI.

    Which technique or algorithm do you think is best to use?
    TabNet algorithm with TensorFlow
    TabNet is used with tabular data, not images. It is a neural network that chooses the most relevant features at each decision step, so that the model is simpler and better optimized.
    A linear learner with Tensorflow Estimator API
    A linear learner is not suitable for images either; it applies to regression and classification predictions on structured data.
    XGBoost with BigQuery ML
    BigQuery ML is designed for structured data, not images.
    TensorFlow Object Detection API
    TensorFlow Object Detection API is designed to identify and localize multiple objects within an image. So it is the best solution.


    Your team works on a smart city project with wireless sensor networks and a set of gateways for transmitting sensor data. You have to cope with many design choices. You want, for each of the problems under study, to find the simplest solution.
    For example, it is necessary to decide on the placement of nodes so that the result is the most economical and inclusive. An algorithm without data tagging must be used.

    Which of the following choices do you think is the most suitable?
    K-means
    K-means is an unsupervised learning algorithm used for clustering problems. It is useful when you have to create similar groups of entities. So, although it does not require labeled data, it is not suitable for our purpose.
    Q-learning
    Q-learning is a Reinforcement Learning (RL) algorithm. RL uses a software agent that evaluates possible solutions through progressive rewards over repeated attempts. It does not need labels, but it requires a lot of data, several trials, and a way to evaluate the validity of each attempt.
    The main RL algorithms are deep Q-network (DQN) and deep deterministic policy gradient (DDPG).

    K-Nearest Neighbors
    K-NN is a supervised classification algorithm and therefore requires labeled data. New classifications are made by finding the closest known examples.
    Support Vector Machine(SVM)
    SVM is a supervised ML algorithm, too. As with K-NN, distances are computed, but not between data points: the distance is to a hyperplane that best separates the different classes.

    The purpose of your current project is the recognition of genuine or forged signatures on checks and documents against regular signatures already stored by the Bank. There is obviously a very low incidence of fake signatures. The system must recognize which customer the signature belongs to and whether the signature is genuine or a skilled forgery.

    What kind of ML model do you think is best to use?
    Binary logistic regression
    Binary logistic regression deals with a classification problem that may result in true or false, like with spam emails. The issue here is far more complex.
    Matrix Factorization
    Matrix Factorization is used in recommender systems, like movies on Netflix. It is based on a user-item (movie) interaction matrix and the problem of reducing dimensionality.
    Convolutional Neural Networks
    A Convolutional Neural Network is a Deep Neural Network in which the layers are made up of processed sections of the source image. This technique allows you to simplify images and highlight shapes and features regardless of the physical position in which they may be found.
    For example, the same signature may appear in the center or at the bottom right of an image: the raw pixels are different, but the signature is the same. A neural network that compares these derived features, and can thereby simplify the model, achieves the best results.

    Multiclass logistic regression
    Multiclass logistic regression deals with a classification problem with multiple solutions, fixed and finite classes. It is an extension of binary logistic regression with basically the same principles with the assumption of several independent variables. But in image recognition problems, the best results are achieved with CNN because they are capable of finding and relating patterns positioned in different ways on the images.

    The purpose of your current project is the recognition of genuine or forged signatures on checks and documents against regular signatures already stored by the Bank. There is obviously a very low incidence of fake signatures. The system must recognize which customer the signature belongs to and whether the signature is genuine or a skilled forgery.

    Which of the following technical specifications can't you use with CNN?
    Kernel Selection
    Filters or kernels are a computation on a sub-matrix of pixels.
    Feature Cross
    A feature cross is a synthetic feature created by multiplying (crossing) two or more features.
    It has proved to be an important technique and is also used to introduce non-linearity to the model. We don't need it in our case.
    Stride
    Stride is the number of pixels by which the kernel slides across the input at each step (for example, 1 pixel).
    Max pooling layer
    A Max pooling layer is created taking the max value of a small region. It is used for simplification.

    Your client has a large e-commerce Website that sells sports goods and especially scuba diving equipment. It has a seasonal business and has collected many sales data from its structured ERP and market trend databases. It wants to predict the demand of its customers both to increase business and improve logistics processes.

    Which of the following types of models and techniques should you focus on to obtain results quickly and with minimum effort?
    Custom Tensorflow model with an autoencoder neural network
    A custom Tensorflow model needs more time and effort. Moreover, an autoencoder is a type of artificial neural network used with unlabeled data (unsupervised learning). It is an excellent tool for generalization and dimensionality reduction, training the network to ignore insignificant data ("noise"), but that is not our goal here.
    BigQuery ML ARIMA
    We need to manage time-series data. BigQuery ML ARIMA_PLUS can manage time-series forecasts. The model automatically handles anomalies, seasonality, and holidays.
    BigQuery Boosted Tree
    Boosted Tree is an ensemble of Decision Trees, so it is not suitable for time series.
    BigQuery Linear regression
    Linear Regression cuts off seasonality. It is not what the customer wants.

    Your team is designing a fraud detection system for a major Bank. The requirements are:
    • Various banking applications will send transactions to the new system in real-time and in standard/normalized format.
    • The data will be stored in real-time with some statistical aggregations.
    • An ML model will be periodically trained for outlier detection.
    • The ML model will issue the probability of fraud for each transaction.
    • It is preferable to have no labeling and as little software development as possible.

    Which products would you choose?
    Dataprep
    Dataproc
    Dataflow Flex
    Pub/Sub
    Composer
    BigQuery
    BigTable

    Your team is designing a fraud detection system for a major Bank. The requirements are:
    • Various banking applications will send transactions to the new system in real-time and in standard/normalized format.
    • The data will be stored in real-time with some statistical aggregations.
    • An ML model will be periodically trained for outlier detection.
    • The ML model will issue the probability of fraud for each transaction.
    • It is preferable to have no labeling and as little software development as possible.

    Which kinds of ML model could be used?
    K-means
    K-means clustering is a mathematical and statistical method on numerical vectors that partitions the observations into k clusters. Each example belongs to the cluster with the closest mean (cluster centroid).
    In ML, it is an unsupervised method that is widely used to detect unusual or outlier movements. For these reasons, it is one of the main methods for fraud detection.
    But it is not the only method, because not all frauds are linked to anomalous movements; there may be other factors.
    Decision Tree
    A single Decision Tree is suboptimal here; ensembles of trees generally perform better.
    Random Forest
    Random Forest is also suboptimal here; gradient-boosted trees such as XGBoost generally give better results for this kind of problem.
    Matrix Factorization
    Matrix Factorization is for recommender systems. So, it predicts the preference of an item based on the experience of other users. Not suitable for us.
    Boosted Tree - XGBoost
    XGBoost, an evolution of decision trees, has recently been widely used in this field and has had many positive results.

    It is an open-source project and this is the description from its Github page:
    XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Kubernetes, Hadoop, SGE, MPI, Dask) and can solve problems beyond billions of examples.
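
As a rough illustration (synthetic features, hypothetical review threshold), k-means can be used for unlabeled outlier scoring by flagging transactions that sit far from every cluster centroid:

```python
# Minimal sketch: k-means distance-to-centroid as an unsupervised anomaly score.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
transactions = rng.normal(size=(10_000, 5))       # hypothetical aggregated transaction features

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(transactions)
dist_to_nearest_centroid = kmeans.transform(transactions).min(axis=1)

# Flag the 1% of transactions farthest from any centroid for review.
threshold = np.quantile(dist_to_nearest_centroid, 0.99)
suspicious = dist_to_nearest_centroid > threshold
print(int(suspicious.sum()), "transactions flagged")
```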

    In your company, you train and deploy several ML models with Tensorflow. You use on-prem servers, but you often find it challenging to manage the most expensive training and control and update the models. You are looking for a system that can handle all these tasks.

    Which solutions can you adopt?
    Kubeflow to run on Google Kubernetes Engine
    Kubeflow Pipelines is an open-source platform designed specifically for creating and deploying ML workflows based on Docker containers.
    Its main features are:
    • Using packaged templates in Docker images in a K8s environment
    • Manage your various tests/experiments
    • Simplifying the orchestration of ML pipelines
    • Reuse components and pipelines
    Vertex AI
    Vertex AI is an integrated suite of ML services that:
    • Train an ML model both without code (AutoML) and with custom code (custom training)
    • Evaluate and tune a model
    • Deploy models
    • Manage prediction: Batch, Online and monitoring
    • Manage model versions: workflows and retraining
    • Manage the complete model maintenance cycle
    Use Scikit-Learn that is simple and powerful
    Scikit-learn is an ML library with many standard algorithms that are easy and immediate to use. TensorFlow (from the official docs) is an end-to-end open-source platform for machine learning with a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and developers easily build and deploy ML-powered applications.
    So, they are two different platforms, even if Scikit Flow integrates the two.
    In any case, Scikit-learn does not manage end-to-end ML workflows (training infrastructure, deployment, and model maintenance).
    Use SageMaker managed services
    SageMaker is an AWS ML product.

    You have an NLP model for your company's Customer Care and Support Office. This model evaluates the general satisfaction of customers on the main categories of services offered and has always provided satisfactory performances.

    You have recently expanded the range of your services and want to refine / update your model. You also want to activate procedures that automate these processes.

    Which choices among the following do you prefer in the Cloud GCP?
    You don't need to change anything. If the model is well made and has no overfitting, it will be able to handle anything.
    Retrain using information from the last week of work only.
    Add examples with new product data and still regularly re-train and evaluate new models.
    Creating and using models is not a one-shot activity. But, like most processes, it is an ongoing one, because the underlying factors can vary over time.
    Therefore, you need to continuously monitor the processes and retrain the model also on newer data, if you find that the frequency distributions of the data vary from the original configuration. It may also be necessary or desirable to create a new model.
    Generally, a periodic schedule is adopted every month or week.
    For this very reason, all the other answers are incorrect.
    Make a separate model with new product data and create the model ensemble.

    Your company is designing a series of models aimed at optimal customer care management.

    For this purpose, all written and voice communications with customers are recorded so that they can be classified and managed.
    The problem is that Clients often provide private information that cannot be distributed and disclosed.

    Which of the following techniques can you use?
    Cloud Data Loss Prevention API (DLP)
    Cloud Data Loss Prevention is a managed service designed to automatically discover sensitive data that must be protected: personal identifiers, credit card numbers, addresses, private contact details, and so on.

    CNN - Convolutional Neural Network
    A Convolutional Neural Network is a Deep Neural Network in which the layers are made up of processed sections of the source image. It is a successful method for image and shape classification, but it is not relevant for discovering or protecting private information in text or audio.
    Cloud Speech API
    Cloud Speech API is useful if you have audio recordings as it is a speech-to-text service.
    Vision API
    Vision API has a built-in text-detection service. So you can get text from images.

    Your team is working for a major apparel company that is developing an online business with significant investments.

    The company adopted Analytics-360. So, it can obtain a lot of data on the activities of its customers and on the interest in the various commercial initiatives of the websites, such as (from Google Analytics-360):
    • Average bounce rate per dimension
    • Average number of product page views by purchaser type
    • Average number of transactions per purchaser
    • Average amount of money spent per session
    • Sequence of hits (pathing analysis)
    • Multiple custom dimensions at hit or session level
    • Average number of user interactions before purchase
    The first thing management wants is to categorize customers to determine which types are more likely to buy.

    Subsequently, further models will be created to incentivize the most interesting customers better and boost sales.

    You have a lot of work to do and you want to start quickly.

    What techniques do you use in this first phase?
    BigQuery and BigQuery ML
    It is necessary to create different groups of customers based on purchases and their characteristics for these requirements.
    We are in the field of unsupervised learning. BigQuery is already set up both for data acquisition and for training, validation and use of this kind of model.
    Cloud Storage with AVRO
    Vertex AI TensorBoard
    Vertex AI TensorBoard is suitable to set up visualizations for ML experiments.
    Binary Classification
    K-means
    The K-means model in BigQuery ML uses a technique called clustering. Clustering is a statistical technique that allows, in our case, to classify customers with similar behaviors for marketing automatically.
    KNN
    Deep Neural Network
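
If you later want to script this from a notebook, here is a rough sketch (the project, dataset, and column names are hypothetical) of training the k-means segmentation model directly in BigQuery ML with the Python client:

```python
# Minimal sketch: train a k-means customer-segmentation model in BigQuery ML.
# `my_project.analytics.*` and the column names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()
sql = """
CREATE OR REPLACE MODEL `my_project.analytics.customer_segments`
OPTIONS (model_type = 'KMEANS', num_clusters = 4) AS
SELECT
  avg_session_spend,
  avg_product_page_views,
  transactions_per_purchaser
FROM `my_project.analytics.customer_features`
"""
client.query(sql).result()  # runs the training job inside BigQuery
```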

    Your team prepared a custom model with Tensorflow that forecasts, based on diagnostic images, which cases need more analysis and medical support.

    The accuracy of the model is very high. But when it is deployed in production, the medical staff is very dissatisfied.

    What is the most likely motivation?
    Logistic regression with a classification threshold too high
    DNN Model with overfitting
    DNN Model with underfitting
    You have to perform feature crosses

    You work in a company that has acquired an advanced consulting services company. Management wants to analyze all past important projects and key customer relationships. The consulting company does not have an application that manages this data in a structured way but is certified for the quality of its services. All its documents follow specific rules.

    It was decided to acquire structured information on projects, areas of expertise and customers through the analysis of these documents.

    You're looking for ML methodologies that make this process quicker and easier.

    What is the better choice in GCP?
    Vision API
    Cloud Natural Language API
    Document AI
    Document AI is the ideal broad-spectrum solution. It is a service that gives a complete solution with computer vision and OCR, NLP and data management. It allows you to extract and structure information automatically from documents. It can also enrich them with the Google Knowledge Graph to verify company names, addresses, and telephone numbers to draw additional or updated information.
    All other answers are incorrect because their functions are already built into Document AI.
    AutoML Natural Language

    Your customer has an online dating platform that, among other things, analyzes the degree of affinity between the various people. Obviously, it already uses ML models and uses, in particular, XGBoost, the gradient boosting decision tree algorithm, and is obtaining excellent results.

    All its development processes follow CI / CD specifications and use Docker containers. The requirement is to classify users in various ways and update models frequently, based on new parameters entered into the platform by the users themselves.

    So, the problem you are called to solve is how to optimize frequently re-trained operations with an optimized workflow system.

    Which solution among these proposals can best solve your needs?
    Deploy the model on BigQuery ML and setup a job
    Use Kubeflow Pipelines to design and execute your workflow
    Kubeflow Pipelines is the ideal solution because it is a platform designed specifically for creating and deploying ML workflows based on Docker containers. So, it is the only answer that meets all requirements.
    The main functions of Kubeflow Pipelines are:
    • Using packaged templates in Docker images in a K8s environment
    • Manage your various tests/experiments
    • Simplifying the orchestration of ML pipelines
    • Reuse components and pipelines

    It is within the Kubeflow ecosystem, which is the machine learning toolkit for Kubernetes.

    Vertex AI Model Monitoring is useful for detecting if the model is no longer suitable for your needs.
    Creating ML workflows is possible with Vertex AI Pipelines.
    The other answers may be partially correct, but they do not cover all the requirements or they require additional coding.
    Use Vertex AI Monitoring
    Orchestrate activities with Google Cloud Workflows
    Develop procedures with Pub/Sub and Cloud Run
    Schedule processes with Cloud Composer

    You have an ML model designed for an industrial company that provides the correct price to buy goods based on a series of elements, such as the quantity requested, the level of quality and other specific variables for different types of products.

    You have built a linear regression model that works well but whose performance you want to optimize.

    Which of these techniques could you use?
    Clipping
    Feature clipping caps values that are too high or too low at a fixed maximum or minimum, limiting the influence of extreme outliers.
    Log scaling
    When you don't have a fairly uniform distribution, you can instead use log scaling, which compresses the data range: x' = log(x)
    Z-score
    Z-Score is similar to scaling, but uses the deviation from the mean divided by the standard deviation, which is the classic index of variability. So, it gives how many standard deviations each value is away from the mean.
    Scaling to a range
    Scaling means transforming feature values into a standard range, from 0 to 1 or sometimes -1 to +1. It's okay when you have an even distribution between minimum and maximum.
    All of them
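
A minimal numpy sketch of the four techniques on a toy feature column (values are made up):

```python
# Minimal sketch: clipping, log scaling, z-score, and min-max scaling on one feature.
import numpy as np

x = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 250.0])  # note the extreme outlier

clipped    = np.clip(x, a_min=None, a_max=10.0)     # clipping: cap extreme values
log_scaled = np.log(x)                              # log scaling: compress the range
z_scored   = (x - x.mean()) / x.std()               # z-score: std deviations from the mean
min_max    = (x - x.min()) / (x.max() - x.min())    # scaling to the [0, 1] range

print(clipped, log_scaled.round(2), z_scored.round(2), min_max.round(3), sep="\n")
```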

    You are starting to operate as a Data Scientist and are working on a deep neural network model with Tensorflow to optimize customer satisfaction for after-sales services to create greater client loyalty.
    You are doing Feature Engineering, and your focus is to minimize bias and increase accuracy. Your coordinator has told you that by doing so you risk having problems. He explained to you that, in addition to the bias, you must consider another factor to be optimized. Which one?
    Blending
    Blending indicates an ensemble of ML models.
    Learning Rate
    Learning Rate is a hyperparameter in neural networks.
    Feature Cross
    Feature Cross is the method for obtaining new features by multiplying other ones.
    Bagging
    Bagging is an ensemble method like Blending.
    Variance
    Variance indicates how much the learned function f(X) can change with a different training dataset. Obviously, different estimates will correspond to different training datasets, but a good model should reduce this gap to a minimum.
    The bias-variance dilemma is an attempt to minimize both bias and variance.
    The bias error comes from overly simplistic assumptions in the learning algorithm. The higher it is, the more underfitting there is.
    Variance is the sensitivity to differences in the training set. The higher it is, the more overfitting there is.
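
For reference, the standard decomposition of the expected squared error at a point x, assuming y = f(x) plus noise of variance sigma squared, separates exactly these two terms plus the irreducible error:

$$\mathbb{E}\big[(y - \hat{f}(x))^2\big] = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}} + \sigma^2$$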

    Your company supplies environmental management services and has a network of sensors that acquire information uploaded to the Cloud to be pre-processed and managed with some ML models with dynamic dashboards used by customers.

    Periodically, the models are retrained and re-deployed, with a rather complex pipeline on VM clusters:
    • New data is streamed from Dataflow
    • Data is transformed through aggregations and normalizations (z-scores)
    • The model is periodically retrained and evaluated
    • New Docker images are created and stored
    You want to simplify the pipeline as much as possible and use fully managed or even serverless services as far as you can.

    Which do you choose from the following services?
    Kubeflow
    Vertex AI custom training
    With Vertex AI you can use AutoML training and custom training in the same environment.
    It's a managed but not a serverless service, especially for custom training.
    It obviously has a rich set of features for managing ML pipelines.
    BigQuery and BigQuery ML
    BigQuery and BigQuery ML are powerful services for data analysis and machine learning.
    They are fully serverless services that can process petabytes of data in public and private datasets and even data stored in files.
    BigQuery works with standard SQL and has a CLI interface: bq.
    You can use BigQuery jobs to automate and schedule tasks and operations.
    With BigQuery ML, you can train models with a rich set of algorithms with data already stored in the Cloud. You may perform feature engineering and hyperparameter tuning and export a BigQuery ML model to a Docker image as required.
    TFX

    Your company runs an e-commerce site. You produced static deep learning models with Tensorflow that process Analytics-360 data. They have been in production for some time. Initially, they gave you excellent results, but gradually, the accuracy has progressively decreased. You retrained the models with the new data and solved the problem.

    At this point, you want to automate the process using the Google Cloud environment.

    Which of these solutions allows you to quickly reach your goal?
    Cluster Compute Engine and KubeFlow
    GKE and TFX
    GKE and KubeFlow
    Vertex AI Pipelines and TensorFlow Extended TFX
    TFX is a platform for building scalable production ML pipelines for TensorFlow projects, and its pipelines can be orchestrated with Kubeflow Pipelines or Vertex AI Pipelines.
    It therefore allows you to manage the entire life cycle seamlessly, from modeling, training, and validation up to production start-up and management of the inference service.
    Vertex AI Pipelines can run pipelines built using TFX:
    • You can configure a Cluster
    • Select basic parameters and click create
    • You get your Kubeflow and Kubernetes launched

    All the other answers are correct, but not optimal for a quick and managed solution.

    You have a Linear Regression model for the optimal management of supplies to a sales network based on a large number of different driving factors. You want to simplify the model to make it more efficient and faster. Your first goal is to synthesize the features without losing the information content that comes from them.

    Which of these is the best technique?
    Feature Crosses
    Feature Crosses also synthesize new features, but they add non-linearity and do not reduce the number of features.
    Principal component analysis (PCA)
    Principal component analysis is a technique to reduce the number of features by creating new variables obtained from linear combinations or mixes of the original variables, which can then replace them but retain most of the information useful for the model. In addition, the new features are all independent of each other.
    The new variables are called principal components.
    PCA assumes linear combinations of the original variables, and the resulting principal components are uncorrelated with each other.
    Embeddings
    Embeddings, which transform large sparse vectors into smaller dense vectors, are used for categorical data.
    Functional Data Analysis
    Functional Data Analysis aims to cope with complexity, but it is used when features can be replaced with functions, which is not our case.
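
A minimal scikit-learn sketch (synthetic "driving factors") of PCA-based feature synthesis, keeping enough components to retain about 95% of the variance:

```python
# Minimal sketch: reduce 40 correlated features to a few principal components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(1_000, 5))                          # a few true underlying drivers
X = latent @ rng.normal(size=(5, 40)) + 0.1 * rng.normal(size=(1_000, 40))

X_scaled = StandardScaler().fit_transform(X)                  # PCA is sensitive to feature scale
pca = PCA(n_components=0.95)                                  # keep 95% of the variance
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape, round(pca.explained_variance_ratio_.sum(), 3))
```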

    TerramEarth is a company that builds heavy equipment for mining and agriculture.

    During maintenance services for vehicles produced by TerramEarth at the service centers, information relating to their use is downloaded. Every evening, this data flows into the data center, is consolidated and sent to the Cloud.

    TerramEarth has an ML model that predicts component failures and optimizes the procurement of spare parts for service centers to offer customers the highest level of service. TerramEarth wants to automate the retraining and redeployment process every time it receives a new file.

    What is the best service to start the process?
    Cloud Storage trigger with Cloud Functions
    Files are uploaded to Cloud Storage, which has native triggers for all the events related to its file management.
    So, we can start a Cloud Function that activates any Cloud service as soon as the file is received.
    Cloud Storage triggers can also send a Pub/Sub notification, which is just a little more complex.
    It is the simplest and most direct solution of all the answers.

    Cloud Scheduler every night
    Pub/Sub
    Cloud Run and Cloud Build
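    As a hedged sketch of the Cloud Storage trigger answer, the function below reacts to a new file and starts a (hypothetical) retraining pipeline; the project, region, bucket, and pipeline template are all assumptions.
```python
# main.py for a Cloud Function (2nd gen) triggered by a Cloud Storage "finalized" event.
import functions_framework
from google.cloud import aiplatform

@functions_framework.cloud_event
def on_new_file(cloud_event):
    data = cloud_event.data
    bucket, name = data["bucket"], data["name"]     # the file that was just uploaded

    # Kick off the (hypothetical) retraining pipeline with the new file as input.
    aiplatform.init(project="my-project", location="us-central1")
    job = aiplatform.PipelineJob(
        display_name="retrain-on-new-data",
        template_path="gs://my-bucket/pipelines/retrain.json",
        parameter_values={"input_file": f"gs://{bucket}/{name}"},
    )
    job.submit()
```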

    You work in a major banking institution. The Management has decided to rapidly launch a bank loan service, as the Government has created a series of “first home” facilities for the younger population.

    The goal is to carry out the automatic management of the required documents (certificates, origin documents, legal information) so that the loan application file can be built and verified automatically using the data and documents provided by customers, and can be managed in a short time and with the minimum contribution of the scarce specialized personnel.

    Which of these GCP services can you use?
    Dialogflow
    Dialogflow is for conversational interfaces (voice and chat), not for written documents.
    Document AI
    Document AI is the perfect solution because it is a complete service for the automatic understanding of documents and their management.
    It integrates natural language processing, OCR, and computer vision, and offers pre-trained models aimed at intelligent document administration.

    Cloud Natural Language API
    NLP is integrated into Document AI.
    AutoML
    functions like AutoML are integrated into Document AI, too.

    Your company does not have an excellent ML experience. They want to start with a service that is as smooth, simple and managed as possible. The idea is to use BigQuery ML. Therefore, you are considering whether it can cover all the functionality you need.

    Which of the following features are not present in BigQuery ML natively?
    Exploratory data analysis
    Feature selection
    Model building
    Training
    Hyperparameter tuning
    Automatic deployment and serving
    BigQuery is perfect for Analytics. So, exploratory data analysis and feature selection are simple and very easy to perform with the power of SQL and the ability to query petabytes of data.
    BigQuery ML offers all other features except automatic deployment and serving.
    BigQuery ML can simply export a model (packaged in a container image) to Cloud Storage.
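    As a hedged illustration of the training features BigQuery ML does cover natively, here is a minimal sketch driven from Python; the dataset, table, and label column names are hypothetical.
```python
# Train and score a BigQuery ML model entirely with SQL, submitted from Python.
from google.cloud import bigquery

client = bigquery.Client()

client.query("""
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT * FROM `my_dataset.training_data`
""").result()                                   # model building and training

predictions = client.query("""
SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                         (SELECT * FROM `my_dataset.new_customers`))
""").to_dataframe()                             # batch prediction stays inside BigQuery
print(predictions.head())
```
    Note that serving these predictions online would still require exporting the model and deploying it elsewhere, which is exactly the missing feature in the question.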

    Your client has an e-commerce site for commercial spare parts for cars with competitive prices. It started with the small car sector but is continually adding products. Since 80% of the customers operate in a B2B market, the client wants to ensure that customers quickly and profitably adopt the new products that are gradually offered on the site.

    Which GCP service can be valuable in this regard and in what way?
    Create a Tensorflow model using Matrix factorization
    Create a Tensorflow model using Matrix factorization could be OK, but it needs a lot of work.
    Use Recommendations AI
    Recommendations AI is a ready-to-use service for all the requirements in the question. You don’t need to create, tune, or train models; all of that is done by the service with your data. Delivery is also handled automatically, with high-quality recommendations served via web, mobile, and email, so it can be used directly on websites during user sessions.
    Import the Product Catalog
    Importing the Product Catalog deals only with data management; it does not create recommendations.
    Record / Import User events
    Recording / importing user events deals only with data management; it does not create recommendations.

    Your client has an e-commerce site for commercial spare parts for cars with competitive prices. It started with the small car sector but is continually adding products. Since 80% of the customers operate in a B2B market, the client wants to ensure that customers quickly and profitably adopt the new products that are gradually offered on the site.

    You decided on Recommendations AI.

    What specific recommendation model type is not useful for new products?
    Others You May Like
    Frequently Bought Together
    Recommended for You
    Recently Viewed
    The "Recently Viewed" recommendation is not for new products, and it is not a recommendation either.
    It provides the list of products the user has recently viewed, starting with the last.

    Your business makes excellent use of ML models. Many of these were developed with Tensorflow. But lately, you've been making good use of AutoML to make your design work leaner, faster, and more efficient.

    You are looking for an environment that organizes and manages training, validation and tuning, and updating models with new data, distribution and monitoring in production.

    Which of these do you think is the best solution?
    Deploy Tensorflow on Kubernetes
    Leverage Kubeflow Pipelines
    Adopt Vertex AI: custom tooling and pipelines
    Vertex AI combines AutoML, custom models and ML pipeline management through to production.
    Vertex AI integrates many GCP ML services, especially AutoML and custom training, and includes many different tools to help you in every step of the ML workflow.
    So, Vertex AI offers two strategies for model training: AutoML and custom training.

    Machine learning operations (MLOps) is the practice of using DevOps for machine learning (ML).
    DevOps strategies automate the release of code changes and control of systems, resulting in greater security and less time to get systems up and running.
    All the other solutions are suitable for production. But, given these requirements, Vertex AI, with the AutoML solution's strong inclusion, is the best and the most productive one.
    Migrate all models to BigQuery ML with AutoML
    Migrate all models to AutoML

    You are a junior Data Scientist. You need to create a multi-class classification Machine Learning model with Keras Sequential model API. In particular, your model must indicate the main categories of a text.

    Which of the following techniques should not be used?
    Feedforward Neural Network
    Feedforward Neural Network is a kind of DNN, widely used for many applications.
    N-grams to tokenize text
    An n-gram is a contiguous sequence of items (usually words) used in NLP tokenization.
    K-means
    The only unsuitable element is K-means clustering, one of the most popular unsupervised machine learning algorithms; it is therefore out of scope for this supervised multi-class classification task.
    Softmax function
    Softmax is an activation function for multi-class classification.
    Pre-trained embeddings
    Embeddings are used for reducing high-dimensional tensors, so categories, too.
    Dropout layer
    The Dropout layer is used for regularization: it randomly drops a fraction of the layer’s inputs during training.
    Categorical cross-entropy
    Categorical cross-entropy is a loss function for multi-class classification.
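    The hedged sketch below puts the suitable techniques from this question together in a Keras Sequential text classifier; the toy texts, one-hot labels, and sizes are made up.
```python
import tensorflow as tf

VOCAB_SIZE, NUM_CLASSES = 20_000, 3

texts = tf.constant(["great article about cloud ml",
                     "markets fell sharply today",
                     "the team won the final"])
labels = tf.constant([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])   # one-hot category labels

vectorizer = tf.keras.layers.TextVectorization(max_tokens=VOCAB_SIZE,
                                               output_sequence_length=50)  # n-gram options exist here
vectorizer.adapt(texts)

model = tf.keras.Sequential([
    vectorizer,
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),                 # embeddings (pre-trained weights could be loaded)
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation="relu"),              # feedforward layer
    tf.keras.layers.Dropout(0.3),                              # dropout regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # softmax for multi-class output
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=1)
```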

    You work for a digital publishing website with an excellent technical and cultural level, where you have both famous authors and unknown experts who express ideas and insights.

    You, therefore, have an extremely demanding audience with strong interests that can be of various types.

    Users have a small set of articles that they can read for free every month. Then they need to sign up for a paid subscription.

    You have been asked to prepare an ML training model that processes user readings and article preferences. You need to predict trends and topics that users will prefer.

    But when you train your DNN with Tensorflow, your input data does not fit into RAM memory.

    What can you do in the simplest way?
    Use tf.data.Dataset
    The tf.data.Dataset allows you to manage a set of complex elements made up of several inner components.
    It is designed to create efficient input pipelines and to iterate over the data for their processing.
    These iterations happen in streaming. So, they work even if the input matrix is very large and doesn’t fit in memory.

    Use a queue with tf.train.shuffle_batch
    A queue with tf.train.shuffle_batch is far more complex, even if it is feasible.
    Use pandas.DataFrame
    A pandas.DataFrame works entirely in memory, so it doesn’t solve the problem at all.
    Use a NumPy array
    A NumPy array works entirely in memory, so it doesn’t solve the problem at all.
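    To make the tf.data answer concrete, here is a minimal, hedged sketch of streaming a large CSV dataset in batches; the file pattern and label column are hypothetical.
```python
import tensorflow as tf

# Data is read in batches as iteration proceeds, so the full matrix never has to fit in RAM.
dataset = tf.data.experimental.make_csv_dataset(
    "gs://my-bucket/readings-*.csv",      # hypothetical file pattern
    batch_size=256,
    label_name="preferred_topic",          # hypothetical label column
    num_epochs=1,
    shuffle_buffer_size=10_000,
)

for features, labels in dataset.take(1):
    print({name: tensor.shape for name, tensor in features.items()}, labels.shape)
```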

    TerramEarth is a company that builds heavy equipment for mining and agriculture.

    It is developing a series of ML models for different activities: manufacturing, procurement, logistics, marketing, customer service and vehicle tracking.

    TerramEarth uses Google Vertex AI and wants to scale training and inference processes in a managed way.

    It is necessary to forecast whether a vehicle, based on the data collected during the maintenance service, has risks of failures in the next six months in order to recommend an extraordinary service operation.

    Which kind of technology/model should you advise using?
    Feedforward Neural Network
    Feedforward neural networks are the classic example of neural networks. In fact, they were the first and most elementary type of artificial neural network. Feedforward neural networks are mainly used for supervised learning when the data, mainly numerical, to be learned is neither time-series nor sequential (such as NLP).
    These networks do not have any cycles or loops. Information moves in one direction only: forward, from the input nodes, through the hidden nodes (if any), to the output nodes.

    All the other techniques are more complex and suitable for different applications (images, NLP, recommendations).
    Convolutional Neural Network
    The convolutional neural network (CNN) is a type of artificial neural network extensively used for image recognition and classification. It uses the convolutional layers, that is, the reworking of sets of pixels by running filters on the input pixels.
    Recurrent Neural Network
    A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
    Transformers
    A transformer is a deep learning model that can give different importance to each part of the input data.
    Reinforcement Learning
    Reinforcement Learning provides a software agent that evaluates possible solutions through a progressive reward in repeated attempts. It does not need to provide labels. But it requires a lot of data and several trials, and the possibility to evaluate the validity of each attempt.
    GAN Generative Adversarial Network
    GAN is a special class of machine learning frameworks used for the automatic generation of facial images.
    Autoencoder and self-encoder
    Autoencoder is a neural network aimed to transform and learn with a compressed representation of raw data.

    You work for a large retail company. You are preparing a marketing model. The model will have to make predictions based on the historical and analytical data of the e-commerce site (analytics-360). In particular, customer loyalty and remarketing possibilities should be studied. You work on historical tabular data. You want to quickly create an optimal model, both from the point of view of the algorithm used and the tuning and life cycle of the model.

    What are the two best services you can use?
    AutoML
    AutoML can select the best model for your needs without having to experiment.
    The architectures currently used (they are added at the same time) are:
    • Linear
    • Feedforward deep neural network
    • Gradient Boosted Decision Tree
    • AdaNet
    • Ensembles of various model architectures
    In addition, AutoML automatically performs feature engineering tasks, too, such as:
    • Normalization
    • Encoding and embeddings for categorical features.
    • Timestamp columns management (important in our case)
    So, it has special features for time columns: for example, it can correctly split the input data into training, validation and testing.
    BigQuery ML
    The other option (AutoML) additionally performs automated feature engineering.
    Vertex AI custom training
    With Vertex AI you can use both AutoML training and custom training in the same environment.
    GKE
    GKE doesn’t supply all the ML features of Vertex AI. It is an advanced managed Kubernetes environment.

    TerramEarth is a company that builds heavy equipment for mining and agriculture.

    It is developing a series of ML models for different activities: manufacturing, procurement, logistics, marketing, customer service and vehicle tracking. TerramEarth uses Google Cloud Vertex AI and wants to scale training and inference processes in a managed way.

    During the maintenance service, snapshots of the various components of the vehicle will be taken. Your new model should be able to determine both the degree of deterioration and any breakages or possible failures.

    Which kind of technology/model should you advise using?
    Feedforward Neural Network
    Feedforward neural networks are the classic example of neural networks. In fact, they were the first and most elementary type of artificial neural network. Feedforward neural networks are mainly used for supervised learning when the data, mainly numerical, to be learned is neither time-series nor sequential (such as NLP).
    Convolutional Neural Network
    The convolutional neural network (CNN) is a type of artificial neural network extensively used for image recognition and classification. It uses the convolutional layers, that is, the reworking of sets of pixels by running filters on the input pixels.

    Recurrent Neural Network
    A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
    Transformers
    A transformer is a deep learning model that can give different importance to each part of the input data.
    Reinforcement Learning
    Reinforcement Learning provides a software agent that evaluates possible solutions through a progressive reward in repeated attempts. It does not need to provide labels. But it requires a lot of data and several trials, and the possibility to evaluate the validity of each attempt.
    GAN Generative Adversarial Network
    GAN is a special class of machine learning frameworks used for the automatic generation of facial images.
    Autoencoder and self-encoder
    Autoencoder is a neural network aimed to transform and learn with a compressed representation of raw data.

    You work for a video game company. Your management came up with the idea of creating a game in which the characteristics of the characters were taken from those of the human players. You have been asked to generate not only the avatars but also various visual expressions during the game actions.

    Which kind of technology/model should you advise using?
    Feedforward Neural Network
    Feedforward neural networks are mainly used for supervised learning when the data, mainly numerical, to be learned is neither time-series nor sequential (such as NLP).
    Convolutional Neural Network
    The convolutional neural network (CNN) is a type of artificial neural network extensively used for image recognition and classification. It uses the convolutional layers, that is, the reworking of sets of pixels by running filters on the input pixels.
    Recurrent Neural Network
    A recurrent neural network (RNN) is a class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence.
    Transformers
    A transformer is a deep learning model that can give different importance to each part of the input data.
    Reinforcement Learning
    Reinforcement Learning provides a software agent that evaluates possible solutions through a progressive reward in repeated attempts. It does not need to provide labels. But it requires a lot of data and several trials, and the possibility to evaluate the validity of each attempt.
    GAN Generative Adversarial Network
    GAN is a special class of machine learning frameworks used for the automatic generation of facial images.
    GAN can create new characters from the provided images.
    It is also used with photographs and can generate new photos that look authentic.

    It is a kind of model highly specialized for this task. So, it is the best solution.
    Autoencoder and self-encoder
    Autoencoder is a neural network aimed to transform and learn with a compressed representation of raw data.

    You work for a digital publishing website with an excellent technical and cultural level, where you have both famous authors and unknown experts who express ideas and insights. You, therefore, have an extremely demanding audience with strong interests of various types. Users have a small set of articles that they can read for free every month; they need to sign up for a paid subscription.

    You aim to provide your audience with pointers to articles that they will indeed find of interest to themselves.

    Which of these models can be useful to you?
    Hierarchical Clustering
    Hierarchical Clustering creates clusters using a hierarchical tree. It may be effective, but it is heavy with lots of data, like in our example.
    Autoencoder and self-encoder
    Autoencoder and self-encoder are useful when you need to reduce the number of variables under consideration for the model, therefore for dimensionality reduction.
    Convolutional Neural Network
    Convolutional Neural Network is used for image classification.
    Collaborative filtering using Matrix Factorization
    Collaborative filtering works on the idea that a user may like the same things as people with similar profiles and preferences.
    So, by exploiting the choices of other users, the recommendation system makes a guess and can suggest items that a user has not yet rated.
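    As a hedged sketch of collaborative filtering with matrix factorization, the Keras model below learns user and article embeddings and scores a pair with their dot product; the sizes and the random interactions are toy assumptions.
```python
import numpy as np
import tensorflow as tf

n_users, n_articles, dim = 1000, 500, 32

user_in = tf.keras.Input(shape=(1,))
item_in = tf.keras.Input(shape=(1,))
user_vec = tf.keras.layers.Embedding(n_users, dim)(user_in)       # latent user factors
item_vec = tf.keras.layers.Embedding(n_articles, dim)(item_in)    # latent article factors
score = tf.keras.layers.Dot(axes=2)([user_vec, item_vec])         # predicted affinity
model = tf.keras.Model([user_in, item_in], tf.keras.layers.Flatten()(score))
model.compile(optimizer="adam", loss="mse")

# Toy interaction data standing in for real reading history.
users = np.random.randint(0, n_users, 10_000)
items = np.random.randint(0, n_articles, 10_000)
ratings = np.random.rand(10_000)
model.fit([users, items], ratings, epochs=1, batch_size=256)
```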


    You work for a video game company. Your management came up with the idea of creating a game in which the characteristics of the characters were taken from those of the human players. You have been asked to generate not only the avatars but also the various visual expressions during the game actions. You are working with GAN - Generative Adversarial Network models, but the training is intensive and time-consuming.
    You want to increase the power of your training quickly, but your management wants to keep costs down.

    What solutions could you adopt?
    Use preemptible Cloud TPU
    You may use preemptible Cloud TPU (70% cheaper) for your fault-tolerant machine learning workloads.
    Use Vertex AI with TPUs
    You may use TPUs in the Vertex AI Platform using machine types.
    Use the Cloud TPU Profiler TensorBoard plugin
    You may optimize your workload using the Profiler with TensorBoard.
    TensorBoard is a visual tool for ML experimentation for Tensorflow
    Use one Compute Engine Cloud TPU VM and install TensorFlow
    Deep Learning VM images are already available, so you don’t have to install your own ML tools and libraries; moreover, managed services give you more productivity and savings.

    TerramEarth is a company that builds heavy equipment for mining and agriculture. During maintenance services for vehicles produced by TerramEarth at the service centers, information relating to their use is collected together with administrative and billing data. All this information goes through a data pipeline process that you are asked to automate in the fastest and most managed way, possibly without code.

    Which service do you advise?
    Cloud Dataproc
    Cloud Dataproc is the managed Hadoop service. So, it could manage data pipelines but in a non-serverless and complex way.
    Cloud Dataflow
    Dataflow is more complex, too, even though it has more functionality, such as batch and stream data processing with the same code.
    Cloud Data Fusion
    Cloud Data Fusion is a managed service for quickly building data pipelines and ETL processes. It is based on the open-source CDAP project and therefore is portable to any environment.
    It has a visual interface that allows you to create codeless data pipelines as required.

    Cloud Dataprep
    Cloud Dataprep is for cleaning, exploration and preparation, and is used primarily for ML processes.

    You are starting to operate as a Data Scientist and are working on a price optimization model for products with a lot of categorical features. You don’t know how to deal with them. Your manager told you that you have to encode them in a limited set of numbers.

    Which of the following methods will not help you with this task?
    Ordinal Encoding
    Ordinal encoding simply creates a correspondence between each unique category with an integer.
    One-Hot Encoding
    One-hot encoding creates a sparse matrix with values (0 and 1) that indicate the presence (or absence) of each possible value.

    Sigmoids
    Sigmoids are the most common activation functions (logistic function) for binary classification. They have nothing to do with encoding categorical variables.
    Embeddings
    Embeddings are often used with texts and in Natural Language Processing (NLP) and address the problem of complex categories linked together.
    Feature Crosses
    A feature cross is a new feature created by joining or multiplying multiple variables to add further predictive capabilities, such as transforming the geographic location of properties into a region of interest.
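    The hedged sketch below shows ordinal and one-hot encoding on a toy categorical column (the product categories are made up).
```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({"category": ["tools", "paint", "tools", "garden"]})

ordinal = OrdinalEncoder().fit_transform(df[["category"]])
print(ordinal.ravel())                 # one integer per distinct category

onehot = pd.get_dummies(df["category"])
print(onehot)                          # one 0/1 column per distinct category
```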

    Your company operates an innovative auction site for furniture from all times. You have to create a series of ML models that allow you, starting from the photos, to establish the period, style and type of the piece of furniture depicted.

    Furthermore, the model must be able to determine whether the furniture is interesting and require it to be subject to a more detailed estimate. You want Google Cloud to help you reach this ambitious goal faster.

    Which of the following services do you think is the most suitable?
    AutoML Edge
    AutoML Edge is for local devices
    Vision API
    Vision API uses pre-trained models trained by Google. This is powerful, but not enough.
    Video Intelligence API
    Video Intelligence API manages videos, not pictures. It can extract metadata from any streaming video, get insights in a far shorter time, and let trigger events.
    AutoML
    AutoML lets you train models to classify your images with your own characteristics and labels. So, you can tailor your work as you want.

    You are a junior Data Scientist, and you need to create a new classification Machine Learning model with Tensorflow. You have a limited set of data on which you build your model. You know the rule to create training, test and validation datasets, but you're afraid you won't have enough data to achieve satisfying results.

    Which solution is the best one?
    Use Cross-Validation
    Cross-validation involves running our modeling process on various subsets of data, called "folds".
    Obviously, this creates a computational load. Therefore, it can be prohibitive in very large datasets, but it is great when you have small datasets.

    All data for learning
    Using all data for training is the best way to obtain overfitting, and it leaves no data for evaluation.
    Split data between Training and Test
    Splitting data between Training and Test is wrong because, with small datasets, cross-validation achieves far better results.
    Split data between Training and Test and Validation
    Splitting data between Training, Test and Validation is wrong because, with small datasets, cross-validation achieves far better results.
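    A minimal cross-validation sketch with scikit-learn (the bundled breast cancer dataset stands in for the small dataset of the question):
```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
# 5-fold cross-validation: every example is used for both training and validation.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(scores.mean(), scores.std())
```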

    You are a Data Scientist, and you work in a large organization. A fellow programmer, who is working on a project with Dataflow, asked you which GCP services support a computer's ability to hold almost human dialogues, and whether these services can be used individually.

    Which of the following choices do you think is wrong?
    Speech to Text
    Speech to Text can convert voice to written text.
    Polly
    Amazon Polly is a text-to-speech service from AWS, not GCP.
    Cloud NLP
    Cloud Natural Language API can understand text meanings such as syntax, feelings, content, entities and can create classifications.
    Text to Speech
    Text to Speech can convert written text to voice.
    Speech Synthesis Markup Language (SSML)
    Speech Synthesis Markup Language (SSML) is not a service but a language used in Text-to-Speech requests. It gives you the ability to indicate how you want to format the audio, pauses, how to read acronyms, dates, times, abbreviations and so on. Really, it is useful for getting closer to human dialogue.

    You are working on a deep neural network model with Tensorflow on a cluster of VMs for a Bank. Your model is complex, and you work with huge datasets with complex matrix computations.
    You have a big problem: your training jobs last for weeks. You are not going to deliver your project in time.

    Which is the best solution that you can adopt?
    Cloud TPU
    Given these requirements, it is the best solution.
    GCP documentation states that the use of TPUs is advisable with models that:
    • use TensorFlow
    • need training for weeks or months
    • have huge matrix computations
    • work with big datasets and large effective batch sizes
    Nvidia GPU
    Given these requirements, Cloud TPUs perform better than Nvidia GPUs.
    Intel CPU
    CPUs turned out to be inadequate for our purpose
    AMD CPU
    CPUs turned out to be inadequate for our purpose

    You are working with a Linear Regression model for an important Financial Institution. Your model has many independent variables. You discovered that you could not achieve good results because many variables are correlated. You asked for advice from an experienced Data scientist that explains what you can do.

    Which techniques or algorithms did he advise to use?
    Multiple linear regression with MLE
    Multiple linear regression is an Ordinary Least Squares (OLS) method; it does not solve the problem of correlated variables.
    Partial Least Squares
    Partial Least Squares creates new, uncorrelated variables by projecting the original ones onto new components.
    Principal components
    The principal components reduce the number of variables while maintaining most of their variance, hence most of the variability contained in the original features.
    Maximum Likelihood Estimation
    Maximum Likelihood Estimation requires independence between variables. It finds model parameter values by maximizing the likelihood of observing the examples given the model.
    Multivariate Multiple Regression
    Multivariate multiple regression explains how several dependent variables react together to changes; it does not address correlation among the independent variables.
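    As a hedged sketch of the Partial Least Squares answer, the example below fits PLS regression on synthetic data whose predictors are deliberately correlated (the sizes and noise level are arbitrary).
```python
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression

# effective_rank < n_features makes the predictors strongly correlated.
X, y = make_regression(n_samples=500, n_features=30, effective_rank=5,
                       noise=0.5, random_state=0)

pls = PLSRegression(n_components=5)     # project onto a few uncorrelated components
pls.fit(X, y)
print(pls.score(X, y))                  # R^2 on the training data
```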

    You are using Vertex AI, with a series of demanding training jobs. So, you want to use TPUs instead of CPUs.

    What is the simplest configuration to indicate it?
    Set machineType to cloud-tpu
    machineType: cloud-tpu
    Set replica as master
    Set replica as worker
    Set replica as parameter server (ps)

    You are training a set of models that should be simple, using regression techniques. During training, your model seems to work. But the tests are giving unsatisfactory results. You discover that you have several wrong and missing data values. You need a tool that helps you cope with them.

    Which of the following problems is not related to Data Validation?
    Omitted values.
    Omitted values are a problem because they may change fundamental statistics like average, for example.
    Categories
    Categories are not related to Data Validation. They are usually categorical, string variables that in ML are mapped to a numerical set before training.
    Duplicate examples.
    Duplicate examples may change fundamental statistics, too.
    For example, we may have duplicates when a program loops and creates the same data several times.
    Bad labels.
    Having bad labels (with supervised learning) means obtaining a bad model.
    Bad feature values
    Having bad features means obtaining a bad model.

    You are a junior Data Scientist, and you are being interviewed for a new job.
    A senior Data Scientist asked you:
    Which metric for classification models evaluation gives you the percentage of real spam email that was recognized correctly?

    What was the exact answer to this question?
    Precision
    Precision is the metric that shows the percentage of true positives related to all your positive class predictions.
    Recall
    Recall is the percentage of actual positives that were correctly found (true positives over all actual positives), which is exactly what the question asks.
    Accuracy
    Accuracy is the percentage of correct predictions on all outcomes.
    F-Score
    F1 score is the harmonic mean between precision and recall.
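    A tiny sketch of the difference, on made-up spam labels:
```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 1]   # 1 = real spam
y_pred = [1, 0, 1, 0, 1, 1]   # model predictions

print(recall_score(y_true, y_pred))     # 0.75: share of real spam that was caught
print(precision_score(y_true, y_pred))  # 0.75: share of flagged emails that really are spam
```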

    You are working on an NLP model. So, you are dealing with words and sentences, not numbers. Your problem is to categorize these words and make sense of them. Your manager told you that you have to use embeddings.

    Which of the following techniques is not related to embeddings?
    Count Vector
    A Count Vector gives a matrix with the count of every single word in every example. 0 if no occurrence. It is okay for small vocabularies.
    TF-IDF Vector
    TF-IDF vectorization weights word counts by how rare a word is across the entire corpus, not just within a single example or sentence.
    Co-Occurrence Matrix
    Co-Occurrence Matrix puts together words that occur together. So, it is more useful for text understanding.
    CoVariance Matrix
    Covariance matrices are square matrices containing the covariance between each pair of variables.
    Covariance measures how much two variables change together; it is a statistical concept, not an embedding technique.
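    A minimal sketch contrasting count and TF-IDF vectors on two toy sentences:
```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog chased the cat"]

counts = CountVectorizer().fit_transform(docs)   # raw word counts per document
tfidf = TfidfVectorizer().fit_transform(docs)    # counts re-weighted by rarity across the corpus

print(counts.toarray())
print(tfidf.toarray().round(2))
```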


    Your company operates an innovative auction site for furniture from all times. You have to create a series of ML models that allow you to establish the period, style and type of the piece of furniture depicted starting from the photos. Furthermore, the model must be able to determine whether the furniture is interesting and require it to be subject to a more detailed estimate. You created the model, but your manager said that he wants to supply this service to mobile users when they go to the fly markets.

    Which of the following services do you think is the most suitable?
    AutoML Edge
    AutoML Edge lets your model be deployed on edge devices and, therefore, mobile phones, too.
    All the other answers refer to Cloud solutions; so, they are wrong.
    Vision API
    Vision API uses pre-trained models trained by Google.
    Video Intelligence API
    Video Intelligence API manages videos, not pictures. It can extract metadata from any streaming video, get insights in a far shorter time, and let trigger events.
    AutoML
    AutoML lets you train models to classify your images with your own characteristics and labels; so, you can tailor your work as you want.

    You are training a set of models that should be simple, using regression techniques. During training, your model seems to work. But the tests are giving unsatisfactory results. You discover that you have several missing data values. You need a tool that helps you cope with them.

    Which GCP product would you choose?
    Dataproc
    Dataproc is a managed Spark and Hadoop service. Therefore, it is for BigData processing.
    Dataprep
    Dataprep is a serverless service that lets you examine, clean, and correct structured and unstructured data.
    So, it is fully compliant with our requirements.

    Dataflow
    Cloud Dataflow is a managed service to run Apache Beam-based data pipeline, both batch and streaming.
    Data Fusion
    Data Fusion is for data pipelines too. But it is visual and simpler, and it integrates multiple data sources to produce new data.

    In your company you use Tensorflow and Keras as main libraries for Machine Learning and your data is stored in disk files, so block storage.

    Recently there has been the migration of all the management computing systems to Google Cloud and management has requested that the files should be stored in Cloud Storage and that the tabular data should be stored in BigQuery and pre-processed with Dataflow.

    Which of the following techniques is NOT suitable for accessing tabular data as required?
    BigQuery Python client library
    The Python BigQuery client library allows you to get BigQuery data into a pandas DataFrame, so it's definitely useful in this environment.
    BigQuery I/O Connector
    BigQuery I/O Connector is used by Dataflow (Apache Beam) for reading Data for transformation and processing, as required.
    tf.data.Iterator
    tf.data.Iterator is used for enumerating elements in a Dataset, using Tensorflow API, so it is not suitable for accessing tabular data.
    tf.data.dataset reader
    The tf.data.dataset reader is a wrong answer here (that is, it IS suitable) because there is a tf.data reader for BigQuery, so you can read the tabular data directly into a Dataset.
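    A minimal, hedged sketch of the BigQuery client library route (project, dataset, and table names are hypothetical):
```python
from google.cloud import bigquery

client = bigquery.Client()
df = client.query(
    "SELECT * FROM `my_project.my_dataset.transactions` LIMIT 1000"
).to_dataframe()                 # tabular data lands in a pandas DataFrame
print(df.head())
```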

    You are a junior Data Scientist. You are working with a linear regression model with sklearn.

    Your outcome model presented a good R-square - coefficient of determination, but the final results were poor.

    When you asked for advice, your mentor laughed and said that you failed because of the Anscombe Quartet problem.

    What are the other possible problems described by the famous Anscombe Quartet?
    Not linear relation between independent and dependent variables
    Outliers that change the result
    Correlation among variables
    Correlation among variables may prevent the model from working well, but it is not one of the situations illustrated by the Anscombe Quartet, which is about misleadingly good summary statistics.
    Incorrect Data
    Incorrect data may prevent the model from working, but it is not what the Anscombe Quartet describes either.

    You are working on a deep neural network model with Tensorflow. Your model is complex, and you work with very large datasets full of numbers.
    You want to increase performances. But you cannot use further resources.
    You are afraid that you are not going to deliver your project in time.
    Your mentor said to you that normalization could be a solution.

    Which of the following choices do you think is not for data normalization?
    Scaling to a range
    Scaling to a range converts numbers into a standard range ( 0 to 1 or -1 to 1).
    Feature Clipping
    Feature Clipping caps all numbers outside a certain range.
    Z-test
    z-test is not correct because it is a statistic that is used to prove if a sample mean belongs to a specific population. For example, it is used in medical trials to prove whether a new drug is effective or not.
    Log scaling
    Log scaling uses the logarithm of your values to change the shape of the distribution. This is possible because the log function preserves monotonicity.
    Z-score
    Z-score is a variation of scaling: the mean is subtracted from each value and the result is divided by the standard deviation. It is aimed at obtaining distributions with mean = 0 and std = 1.
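    The hedged sketch below applies the four normalization techniques to a toy array with one large outlier:
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 10.0, 200.0])

scaled  = (x - x.min()) / (x.max() - x.min())   # scaling to a range [0, 1]
clipped = np.clip(x, 0.0, 50.0)                 # feature clipping caps extreme values
logged  = np.log(x)                             # log scaling compresses long tails
zscored = (x - x.mean()) / x.std()              # z-score: mean 0, std 1

print(scaled, clipped, logged, zscored, sep="\n")
```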

    Your team is designing a financial analysis model for a major Bank.

    The requirements are:
    • Various banking applications will send transactions to the new system both in real-time and in batch in standard/normalized format
    • The data will be stored in a repository
    • Structured Data will be trained and retrained
    • Labels are drawn from the data.
    You need to prepare the model quickly and decide to use AutoML for structured Data.

    Which GCP Services could you use?
    AutoML
    Tensorflow Extended
    Tensorflow Extended is for deploying production ML pipelines, and it doesn't have any AutoML Services
    BigQuery ML
    Vertex AI

    You are starting to operate as a Data Scientist and are working on a deep neural network model with Tensorflow to optimize the level of customer satisfaction for after-sales services with the goal of creating greater client loyalty.
    You have to follow the entire lifecycle: model development, design, and training, testing, deployment, and retraining.
    You are looking for UI tools that can help you work and solve all issues faster.

    Which solutions can you adopt?
    Tensorboard
    Tensorboard is aimed at model creation and experimentation:
    • Profiling
    • Monitoring metrics, weights, biases
    • Examine model graph
    • Working with embeddings
    Jupyter notebooks
    Jupyter notebooks are a wonderful tool to develop, experiment, and deploy. You may have the latest data science and machine learning frameworks with them.
    KFServing
    KFServing is an open-source library for Kubernetes that enables serverless inferencing. It works with TensorFlow, XGBoost, scikit-learn, PyTorch, and ONNX to solve issues linked to production model serving. So, no UI.
    Kubeflow UI
    The Kubeflow UI is for ML pipelines and includes visual tools for:
    • Pipelines dashboards
    • Hyperparameter tuning
    • Artifact Store
    • Jupyter notebooks
    Vertex AI
    With Vertex AI you can use AutoML training and custom training in the same environment.

    You work for an industrial company that wants to improve its quality system. It has developed its own deep neural network model with Tensorflow to identify the semi-finished products to be discarded with images taken from the production lines in the various production phases.
    You work on this project. You need to deal with input data that is binary (images) together with CSV files.
    You are looking for the most convenient way to import and manage this type of data.

    Which is the best solution that you can adopt?
    tf.RaggedTensor
    RaggedTensor is a tensor with ragged dimensions, that is with different lengths like this: [[6, 4, 7, 4], [], [8, 12, 5], [9], []]
    Tf.quantization
    Quantization is aimed at reducing latency, processing, and power consumption on CPUs and TPUs.
    tf.train.Feature
    tf.train.Feature is a protocol buffer message used inside tf.train.Example records; by itself it is not a complete way to import and manage this data.
    tf.TFRecordReader
    The TFRecord format is efficient for storing a sequence of binary and non-binary records, using Protocol Buffers for the serialization of structured data.
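    The hedged sketch below writes one toy image and label into a TFRecord file and reads it back (the image bytes and label are made up).
```python
import tensorflow as tf

# Encode a tiny black image as JPEG bytes to stand in for a real production-line snapshot.
image_bytes = tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8)).numpy()

example = tf.train.Example(features=tf.train.Features(feature={
    "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
    "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
}))
with tf.io.TFRecordWriter("sample.tfrecord") as writer:
    writer.write(example.SerializeToString())

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}
dataset = tf.data.TFRecordDataset("sample.tfrecord").map(
    lambda record: tf.io.parse_single_example(record, feature_spec))
for parsed in dataset:
    print(parsed["label"].numpy())
```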

    You work for an industrial company that wants to improve its quality system. It has developed its own deep neural network model with Tensorflow to identify the semi-finished products to be discarded with images taken from the production lines in the various production phases.

    You need to monitor the performance of your models and let them go faster.

    Which is the best solution that you can adopt?
    TF Profiler
    TensorFlow Profiler is a tool for checking the performance of your TensorFlow models and helping you to obtain an optimized version.
    In TensorFlow 2, the default is eager execution. So, one-off operations are faster, but recurring ones may be slower. So, you need to optimize the model.

    TF function
    TF function is a transformation tool used to make graphs out of your programs. It helps to create performant and portable models but is not a tool for optimization.
    TF Trace
    TF tracing lets you record TensorFlow Python operations in a graph.
    TF Debugger
    TF debugging is for Debugger V2 and creates a log of debug information.
    TF Checkpoint
    Checkpoints catch the value of all parameters in a serialized SavedModel format.

    You work for an important Banking group.

    The purpose of your current project is the automatic and smart acquisition of data from documents and forms of different types.

    You work on big datasets with a lot of private information that cannot be distributed and disclosed.

    You are asked to replace sensitive data with specific surrogate characters.

    Which of the following techniques do you think is best to use?
    Format-preserving encryption
    Format-preserving encryption (FPE) encrypts in the same format as the plaintext data.
    For example, a 16-digit credit card number becomes another 16-digit number.
    K-anonymity
    k-anonymity is a way to anonymize data in such a way that it is impossible to identify person-specific information. Still, you maintain all the information contained in the record.
    Replacement
    Replacement just substitutes a sensitive element with a specified value.
    Masking
    Masking replaces sensitive values with a given surrogate character, like hash (#) or asterisk (*).
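    As a hedged sketch of masking with Cloud Data Loss Prevention, the example below replaces a credit card number with '#' characters; the project id and the sample text are hypothetical.
```python
from google.cloud import dlp_v2

client = dlp_v2.DlpServiceClient()
response = client.deidentify_content(
    request={
        "parent": "projects/my-project/locations/global",
        "item": {"value": "Customer card: 4111 1111 1111 1111"},
        "inspect_config": {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]},
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [{
                    "primitive_transformation": {
                        "character_mask_config": {"masking_character": "#"}
                    }
                }]
            }
        },
    }
)
print(response.item.value)   # the card number is replaced with '#' characters
```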

    You have a customer ranking ML model in production for an e-commerce site; the model used to work very well.

    You use GCP managed services, specifically Vertex AI.

    Suddenly, there is a noticeable degradation in the quality of the inferences. You perform various checks, but the model seems to be perfectly fine.

    Finally, you control the input data and notice that the frequency distributions have changed for a specific feature.

    Which GCP service can be helpful for you to manage features in a more organized way?
    Regularization against overfitting
    Regularization against overfitting is wrong because the model is OK
    Feature Store
    Feature engineering means transforming input data, often strings, into a feature vector.
    Lots of effort is spent in mapping categorical values in the best way: we have to convert strings to numeric values. We have to define a vocabulary of possible values, usually mapped to integer values.
    We remember that in an ML model everything must be translated into numbers. Therefore it is easy to run into problems of this type.
    Vertex Feature Store is a service to organize and store ML features through a central store.
    This allows you to share and optimize ML features important for the specific environment and to reuse them at any time.
    All of this translates into faster creation of ML services. It also helps minimize problems such as training-serving skew, which occurs when the distribution of data in production is different from that of training, often due to errors in the organization of the features.
    Training-serving skew happens when you generate your training data differently than you generate the data you use to request predictions.
    For example, your training data may use a different unit of measure than prediction requests, or you may average a value over 10 days for training but over the last month when you request a prediction.
    Hyperparameters tuning
    Hyperparameter tuning is wrong because the model is OK; like regularization, it concerns how the model is built, which is not the problem here.
    Model Monitoring
    Model Monitoring is suitable for detecting training-serving skew, not for organizing features.

    You have a customer ranking ML model in production for an e-commerce site; the model used to work very well. You use GCP managed services, specifically Vertex AI. Suddenly there is a noticeable degradation in the quality of the inferences. You perform various checks, but the model seems to be perfectly fine.

    Which of the following methods could you use to avoid such problems?
    Regularization against overfitting
    Regularization against overfitting is wrong because the model is OK.
    Feature Store
    Feature Store is suitable for feature organization, not for data skew prevention.
    Hyperparameters tuning
    Hyperparameters tuning is wrong because the model is OK.
    Model Monitoring
    Input data to ML models may change over time. This can be a serious problem, as performance will obviously degrade.
    To avoid this, it is necessary to monitor the quality of the forecasts continuously.
    Vertex Model Monitoring has been designed just for this.
    The main goal is to cope with feature skew and drift detection.
    It uses two main methods:
    • For skew detection, it compares the feature value distribution in production with the feature value distribution seen in the training data.
    • For drift detection, it compares the feature value distribution in production over time.

    Your company runs an e-commerce site. You produced static deep learning models with Tensorflow that process Analytics-360 data. They have been in production for some time. Initially, they gave you excellent results, but then gradually, the accuracy has decreased.

    You are using Compute Engine and GKE. You decided to use a library that lets you have more control over all processes, from development up to production.

    Which tool is the best one for your needs?
    TFX
    TensorFlow Extended (TFX) is a set of open-source libraries to build and execute ML pipelines in production. Its main functions are:
    • Metadata management
    • Model validation
    • Deployment
    • Production execution
    The libraries can also be used individually.
    Vertex AI
    Vertex AI is an integrated suite of ML managed products, and you are looking for a library.
    Vertex AI's main functions are:
    • Train custom and AutoML models
    • Evaluate and tune models
    • Deploy models
    • Manage prediction: Batch, Online and monitoring
    • Manage model versions: workflows and retraining
    SageMaker
    Sagemaker is a managed product in AWS, not GCP.
    Kubeflow
    Kubeflow Pipelines don’t deal with production control.
    Kubeflow Pipelines is an open-source platform designed specifically for creating and deploying ML workflows based on Docker containers.
    Their main features:
    • Using packaged templates in Docker images in a K8s environment
    • Manage your various tests / experiments
    • Simplifying the orchestration of ML pipelines
    • Reuse components and pipelines

    Your company runs a big retail website. You develop many ML models for all the business activities. You migrated to Google Cloud and you are now using Vertex AI. Your models are developed with PyTorch, TensorFlow and BigQuery ML.

    You also use BigTable and CloudSQL, and of course Cloud Storage. In many cases, the same data is used for multiple models and projects. And your data is continuously updated, sometimes in streaming mode.

    Which is the best way to organize the input data?
    Dataflow for data transformation, both streaming and batch
    Dataflow deals with Data Pipelines and is not a way to access and organize data.
    CSV
    CSV is just a data format, and an ML Dataset is made with data and metadata dealing with many different formats.
    BigQuery
    BigQuery and BigTable are just two of the ways in which you can store data. Moreover, BigTable is not currently supported as a data store for Vertex datasets.
    Datasets
    Vertex AI integrates the following elements:
    • Datasets: data, metadata and annotations, structured or unstructured. For all kinds of libraries.
    • Training pipelines to build an ML model
    • ML models, imported or created in the environment
    • Endpoints for inference
    Because Datasets are suitable for all kinds of libraries, it is a useful abstraction for this requirement.

    BigTable

    You are a Data Scientist and working on a project with PyTorch. You need to save the model you are working on because you have to cope with an urgency. You, therefore, need to resume your work later.

    What command will you use for this operation?
    callbacks.ModelCheckpoint (keras)
    ModelCheckpoint is used with keras.
    save
    PyTorch is a popular library for deep learning that you can leverage using GPUs and CPUs.
    When you have to save a model for resuming training, you have to record both models and updated buffers and parameters in a checkpoint.
    A checkpoint is an intermediate dump of a model’s entire internal state (its weights, current learning rate, etc.) so that the framework can resume the training from that very point.
    In other words, you train for a few iterations, then evaluate the model, checkpoint it, then fit some more. When you are done, save the model and deploy it as normal.
    To save checkpoints, you must use torch.save() to serialize the dictionary of all your state data.
    To reload it, the command is torch.load().

    model.fit
    model.fit is used to train a model in libraries such as Keras and scikit-learn; it does not save anything.
    train.Checkpoint TF
    train.Checkpoint is used with Tensorflow.
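    A minimal, hedged sketch of saving and resuming a PyTorch checkpoint (the tiny model, optimizer, and file name are arbitrary placeholders):
```python
import torch
from torch import nn, optim

model = nn.Linear(4, 1)
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Serialize everything needed to resume training later.
torch.save({
    "epoch": 3,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
}, "checkpoint.pt")

# Later, restore the state and continue from where you left off.
checkpoint = torch.load("checkpoint.pt")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
start_epoch = checkpoint["epoch"] + 1
model.train()
```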

    You are a Data Scientist. You are going to develop an ML model with Python. Your company adopted GCP and Vertex AI, but you need to work with your developing tools.

    What are you going to do?
    Use an Emulator
    Use an Emulator is wrong because there isn’t a specific Emulator for using the SDK
    Work with the Console
    Work with the Console is wrong because it was asked to create a local work environment.
    Create a service account key
    Set the environment variable named GOOGLE_APPLICATION_CREDENTIALS
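    A hedged sketch of using these two steps from a local Python environment (the key path and project id are hypothetical):
```python
import os

# Point the client libraries at the downloaded service account key.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/home/me/keys/vertex-sa.json"

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
print(aiplatform.Model.list())   # the SDK now authenticates with the service account key
```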

    You are working with Vertex AI, the managed ML Platform in GCP. You are dealing with custom training, and you are studying how the job progresses during the training service lifecycle.

    Which of the following states is not correct?
    JOB_STATE_ACTIVE
    JOB_STATE_ACTIVE is not among the states of the Vertex AI training job lifecycle, which includes states such as JOB_STATE_QUEUED, JOB_STATE_PENDING, JOB_STATE_RUNNING, JOB_STATE_SUCCEEDED, JOB_STATE_FAILED and JOB_STATE_CANCELLED.
    JOB_STATE_RUNNING
    JOB_STATE_QUEUED
    JOB_STATE_SUCCEEDED

    You work as a Data Scientist for a major banking institution that recently completed the first phase of migration in GCP.

    You now have to work in the GCP Managed Platform for ML. You need to deploy a custom model with Vertex AI so that it will be available for online predictions.

    Which is the correct procedure?
    Save the model in a Docker container
    Vertex AI Prediction can serve predictions by deploying custom or pre-built containers on N1 Compute Engine instances.
    You create an "endpoint object" for your model and then you can deploy the various versions of your model.
    Its main elements are given below:
    • Custom or pre-built containers
    • Model: Vertex AI Prediction uses an architectural paradigm that is based on immutable instances of models and model versions
    • Regional endpoint
    Set a VM with a GPU processor
    You don’t need to set any specific VM. You will point out the configuration and Vertex will manage everything.
    Use TensorFlow Serving
    TensorFlow Serving is used under the hood, but you don’t need to call their functions explicitly.
    Create an endpoint and deploy to that endpoint
    The endpoint is the object that will be equipped with all the resources needed for online predictions and it is the target for your model deployments.
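    A hedged sketch of the upload-then-deploy flow with the Vertex AI SDK; the project, bucket, and pre-built serving container are assumptions.
```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

model = aiplatform.Model.upload(
    display_name="custom-model",
    artifact_uri="gs://my-bucket/model/",    # saved model artifacts
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest",
)
endpoint = aiplatform.Endpoint.create(display_name="custom-endpoint")
model.deploy(endpoint=endpoint, machine_type="n1-standard-4")

print(endpoint.predict(instances=[[1.0, 2.0, 3.0]]))   # online prediction
```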

    You work as a Data Scientist in a Startup. You want to create an optimized input pipeline to increase the performance of training sessions, avoiding GPUs and TPUs as much as possible because they are expensive.

    Which technique or algorithm do you think is best to use?
    Caching
    Prefetching
    Parallelizing data
    All of the above
    GPUs and TPUs can greatly increase the performance of training sessions, but an optimized input pipeline is likewise important.
    The tf.data API provides these functions:
    Prefetching
    tf.data.Dataset.prefetch: while the execution of a training pass, the data for the next pass is read.
    Parallelizing data transformation
    The tf.data API offers the map function for the tf.data.Dataset.map transformation.
    This transformation can be parallelized across multiple cores with the num_parallel_calls option.
    Sequential and parallel interleave
    tf.data.Dataset.interleave offers the possibility of interleaving and allowing multiple datasets to execute in parallel (num_parallel_calls).
    Caching
    tf.data.Dataset.cache allows you to cache a dataset increasing performance.
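    The hedged sketch below combines these optimizations on an in-memory toy dataset; with file-based data, interleave would be added at the reading stage.
```python
import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

def expensive_preprocess(x, y):
    return tf.math.log1p(x), y                 # stand-in for a costly transformation

xs = tf.random.uniform([10_000, 8])
ys = tf.random.uniform([10_000], maxval=2, dtype=tf.int32)

dataset = (tf.data.Dataset.from_tensor_slices((xs, ys))
           .map(expensive_preprocess, num_parallel_calls=AUTOTUNE)  # parallel transformation
           .cache()                                                 # cache the processed examples
           .shuffle(1_000)
           .batch(128)
           .prefetch(AUTOTUNE))                                     # overlap input and training

for batch_x, batch_y in dataset.take(1):
    print(batch_x.shape, batch_y.shape)
```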

    You are working on a new model together with your client, a large financial institution. The data you are dealing with contains PII (Personally Identifiable Information) contents.

    You face 2 different sets of problems:
    • Transform data to hide personal information you don't need
    • Protect your work environment because certain combinations of personal data are useful for your model and you need to keep them
    What are the solutions offered by GCP that it is advisable to use?
    Cloud Armor security policies
    Cloud Armor is a security service at the edge against attacks like DDoS.
    Cloud HSM
    Cloud HSM is a service for cryptography based on special and certified hardware and software
    Cloud Data Loss Prevention
    Cloud Data Loss Prevention is a service that can discover, conceal and mask personal information in data.
    Network firewall rules
    Network firewall rules are just that: rules that allow or deny network traffic in a VPC. VPC Service Controls, by contrast, lets you define control at a more granular level, with context-aware access, which is suitable for a multi-tenant environment like this one.
    VPC service-controls
    VPC service-controls is a service that lets you build a security perimeter that is not accessible from outside; in this way data exfiltration dangers are greatly mitigated. It is a network security service that helps protect data in a Virtual Private Cloud (VPC) in a multi-tenant environment.

    You are a junior Data Scientist and working on a deep neural network model with Tensorflow to optimize the level of customer satisfaction for after-sales services to create greater client loyalty.

    You are struggling with your model (learning rates, hidden layers and nodes selection) for optimizing processing and letting it converge in the fastest way.

    What is your problem in ML language?
    Cross Validation
    Cross Validation is related to the input data organization for training, test and validation.
    Regularization
    Regularization is related to feature management and overfitting.
    Hyperparameter tuning
    ML training manages three main categories of data:
    • Training data is also called examples or records. It is the main input for model configuration and, in supervised learning, it presents labels, which are the correct answers based on past experience. Input data is used to build the model but will not be part of the model.
    • Parameters are instead the variables to be found to solve the riddle. They are part of the final model and they make the difference among similar models of the same type.
    • Hyperparameters are configuration variables that influence the training process itself: learning rate, number of hidden layers, number of epochs, regularization, and batch size are all examples of hyperparameters.
    Hyperparameter tuning happens during the training job; it used to be a manual and tedious process, carried out by running multiple trials with different values.
    The time required to train and test a model can depend upon the choice of its hyperparameters.
    With Vertex AI you just need to prepare a simple YAML configuration without coding.
    drift detection management
    Drift management is when data distribution changes and you have to adjust the model.

    You work for an important organization. Your manager tasked you with a new classification model with lots of data drawn from the company Data Lake.

    The big problem is that you don’t have the labels for all the data, and you have very little time to complete the task, even for just a subset of it.

    Which of the following services could help you?
    Vertex Data Labeling
    In supervised learning, the correctness of label data, together with the quality of all your training data, is utterly important for the resulting model and the quality of the future predictions.
    If you cannot have your data correctly labeled, you may request professional people to complete your training data.
    GCP has a service for this: Vertex AI data labeling. Human labelers will prepare correct labels following your directions.
    You have to set up a data labeling job with:
    • The dataset
    • A list, vocabulary of the possible labels
    • An instructions document for the professional people
    Mechanical Turk
    Mechanical Turk is an Amazon service.
    GitLab ML
    GitLab is a DevOps lifecycle tool.
    Tag Manager
    Tag Manager is in the Google Analytics ecosystem.

    Your company runs an e-commerce site. You manage several deep learning models with Tensorflow that process Analytics-360 data, and they have been in production for some time. The modeling is made essentially with customers and orders data. You need to classify many business outcomes.

    Your Manager realized that different teams in different projects used to handle the same features, based on the same data, in different ways. The problem arose when models drifted unexpectedly over time.

    You have to advise your Manager on the best strategy. Which of the following do you choose?
    Each group classifies their features and sends them to the other teams
    It creates confusion and doesn't solve the problem.
    Store the different features of each model in Cloud Storage
    It will not avoid feature definition overlapping. Cloud Storage is not enough for identifying different features.
    Search for features in Cloud Storage and reuse them
    It will not avoid feature definition overlapping. Cloud Storage is not enough for identifying different features.
    Search the Vertex Feature Store for features that are the same
    Insert or update the features in Vertex Feature Store accordingly
    Vertex AI Feature Store is a centralized repository for organizing, storing, and serving ML features. Sharing feature definitions there prevents teams from redefining the same features differently and keeps them consistent over time.

    You are starting to operate as a Data Scientist. You speak with your mentor who asked you to prepare a simple model with a nonparametric Machine Learning algorithm of your choice. The problem is that you don’t know the difference between parametric and nonparametric algorithms. So you looked for it.

    Which of the following methods are nonparametric?
    Simple Neural Networks
    With Neural Networks you have to figure out the parameters of a specific function that best fit the data
    K-Nearest Neighbors
    K-nearest neighbor is a simple supervised algorithm for both classification and regression problems.
    You begin with data that is already classified. A new example will be set by looking at the k nearest classified points. Number k is the most important hyperparameter.
    Decision Trees
    A decision tree has a series of tests inside a flowchart-like structure. So, no mathematical functions to solve.
    Logistic Regression
    With Logistic Regression you have to figure out the parameters of a specific function that best fit the data.
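    A minimal scikit-learn sketch of the nonparametric K-Nearest Neighbors approach described above (the dataset is just a convenient built-in example):
        from sklearn.datasets import load_iris
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsClassifier

        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        # k (n_neighbors) is the most important hyperparameter of this nonparametric method.
        knn = KNeighborsClassifier(n_neighbors=5)
        knn.fit(X_train, y_train)          # "fit" only stores the training data
        print(knn.score(X_test, y_test))   # accuracy on unseen examples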

    As a Data Scientist, you are involved in various projects in an important retail company. You prefer to use, whenever possible, simple and easily explained algorithms. Where you can't get satisfactory results, you adopt more complex and sophisticated methods. Your manager told you that you should try ensemble methods. Intrigued, you do some research.

    Which of the following are ensemble-type algorithms?
    Random Forests
    Random forests are made of multiple decision trees built with random sampling and random subsets of variables, combined (for example by voting) to keep the best models.
    AdaBoost is also built with multiple decision trees, with the following differences:
    • It creates stumps, that is, trees with only one node and two leaves.
    • Stumps with less error win (they carry more weight).
    • Stumps are built in an order designed to reduce the remaining errors.
    DCN
    Deep and Cross Networks (DCN) are a kind of neural network, not an ensemble method.
    Decision Tree
    Decision Trees are flowchart-like structures with a series of tests at the nodes; a single tree is not an ensemble.
    XGBoost
    XGBoost is currently very popular. It is similar to Gradient Boost, with the following differences:
    • Leaf node pruning, that is, regularization in order to keep the best nodes for generalization
    • Newton boosting instead of gradient descent, so it is math-based and faster
    • Reduced correlation between trees thanks to an additional randomization parameter
    • An optimized algorithm for tree penalization
    Gradient Boost
    Gradient Boost is also built with multiple decision trees, with the following differences from AdaBoost:
    • Trees instead of stumps
    • It uses a loss function to minimize errors.
    • Each tree is built to predict the residuals (the difference from the actual values).
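    A small scikit-learn sketch comparing the ensemble methods discussed above on a synthetic dataset (XGBoost would be used analogously through its own library):
        from sklearn.datasets import make_classification
        from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                                      RandomForestClassifier)
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=1000, random_state=0)

        models = {
            "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
            "adaboost": AdaBoostClassifier(random_state=0),          # stumps by default
            "gradient_boost": GradientBoostingClassifier(random_state=0),
        }
        for name, model in models.items():
            print(name, cross_val_score(model, X, y, cv=3).mean())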

    Your team works for an international company with Google Cloud, and you develop, train and deploy several ML models with Tensorflow. You use many tools and techniques and you want to make your work leaner, faster, and more efficient.

    You would like engineer-to-engineer assistance from both Google Cloud and Google’s TensorFlow teams.
    Which of the following services can be used to achieve the above requirement?
    Vertex AI
    Vertex AI is a managed service
    Kubeflow
    Kubeflow is an open source library with standard support from the community
    Tensorflow Enterprise
    TensorFlow Enterprise is a distribution of the open-source platform for ML, linked to specific versions of TensorFlow, tailored for enterprise customers.
    It is free but only for big enterprises with a lot of services in GCP. It is prepackaged and optimized for usage with containers and VMs.
    It works in Google Cloud, from VM images to managed services like GKE and Vertex AI.
    The TensorFlow Enterprise library is integrated in the following products:
    • Deep Learning VM Images
    • Deep Learning Containers
    • Notebooks
    • Vertex AI Training
    It is ready for automatic provisioning and scaling with any kind of processor.
    It has a premium level of support from Google.
    TFX
    TFX is an open source library with standard support from the community

    Your team works for a startup company with Google Cloud. You develop, train and deploy several ML models with Tensorflow. You use data in Parquet format and need to manage it both in input and output. You want the smoothest solution without adding infrastructure and keeping costs down.

    Which one of the following options do you follow?
    Cloud Dataproc
    Cloud Dataproc is the managed Hadoop service in GCP. It uses Parquet but not Tensorflow out of the box. Furthermore, it’d be an additional cost.
    TensorFlow I/O
    TensorFlow I/O is a library that extends TensorFlow with support for file formats, Datasets, streaming, and file systems that are not covered by TensorFlow's built-in support, such as Parquet.
    So the integration is immediate, without any further costs or data transformations; see the sketch at the end of this question.
    Apache Parquet is an open-source, column-oriented data storage format born in the Apache Hadoop environment but supported in many tools and used for data analysis.
    Dataflow Flex Template
    There will be an additional cost and additional data transformations.
    BigQuery to TFRecords
    There will be an additional cost and additional data transformations.
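    A minimal sketch of reading Parquet directly into the tf.data pipeline, assuming the tensorflow-io package and its IODataset.from_parquet reader (the exact reader name should be checked against the tensorflow-io documentation); the file path is a hypothetical placeholder:
        import tensorflow as tf
        import tensorflow_io as tfio

        # Read a Parquet file straight into a tf.data-compatible dataset,
        # with no extra infrastructure or format conversion.
        dataset = tfio.IODataset.from_parquet("train.parquet")
        dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)

        for batch in dataset.take(1):
            print(batch)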

    You are starting to operate as a Data Scientist and speaking with your mentor who asked you to prepare a simple model with a lazy learning algorithm.

    The problem is that you don’t know the meaning of lazy learning; so you looked for it.

    Which of the following methods uses lazy learning?
    Naive Bayes
    Naive Bayes is a classification algorithm. The features have to be independent. It requires a small amount of training data.
    K-Nearest Neighbors
    K-nearest neighbor is a simple supervised algorithm for both classification and regression problems.
    You begin with data that is already classified. A new example is classified by looking at the k nearest classified points; the number k is the most important hyperparameter. It is a lazy learning method because there is no real training phase: the stored examples are only used at inference time, when a new point has to be classified.
    Logistic Regression
    With Logistic Regression you have to train the model and figure out the parameters of a specific function that best fit the data before the inference.
    Simple Neural Networks
    With Neural Networks you have to train the model and figure out the parameters of a specific function that best fit the data before the inference.
    Semi-supervised learning
    Semi-supervised learning is a family of methods that use both labeled and unlabeled data, organizing examples based on similarities and clustering. They still have to set up a model and find its parameters with training jobs, so they are not lazy learning.

    Your company traditionally deals with the statistical analysis of data. The services have been integrated with ML models for forecasting for some years, but analyses and simulations of all kinds are also carried out.

    So you are using two types of tools. But you have been told that it is possible to have more levels of integration between traditional statistical methodologies and those more related to AI / ML processes.

    Which tool is the best one for your needs?
    TensorFlow Hub
    It doesn’t deal with traditional statistical methodologies.
    TensorFlow Probability
    TensorFlow Probability is a Python library for statistical analysis and probability, which can be processed on TPU and GPU.
    TensorFlow Probability main features are:
    • Probability distributions and differentiable and injective (one to one) functions.
    • Tools for deep probabilistic models building.
    • Inference and Simulation methods support: Markov chain, Monte Carlo.
    • Optimizers such as Nelder-Mead, BFGS, and SGLD.
    TensorFlow Enterprise
    It doesn’t deal with traditional statistical methodologies.
    TensorFlow Statistics
    It doesn’t deal with traditional statistical methodologies.
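    A tiny TensorFlow Probability sketch of the kind of classical statistics it brings alongside ML (a standard Normal distribution, sampling, and percentiles):
        import tensorflow_probability as tfp

        tfd = tfp.distributions

        # Classic statistics and probabilistic modeling on the same TF/GPU/TPU stack.
        prior = tfd.Normal(loc=0.0, scale=1.0)
        samples = prior.sample(1000)                  # Monte Carlo style sampling
        print(prior.log_prob(0.5))                    # log density at a point
        print(tfp.stats.percentile(samples, q=[5, 50, 95]))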

    Your team works for an international company with Google Cloud. You develop, train and deploy different ML models. You use a lot of tools and techniques and you want to make your work leaner, faster and more efficient.

    Now you have the problem that you have to create a model for recognizing photographic images related to collaborators and consultants. You have to do it quickly, and it has to be an R-CNN model. You don't want to start from scratch. So you are looking for something that can help you and that can be optimal for the GCP platform.

    Which of these tools do you think can help you?
    TensorFlow-hub
    TensorFlow Hub is a ready-to-use repository of trained machine learning models.
    It lets you reuse advanced pre-trained models with minimal code.
    The ML models are optimized for GCP.

    GitHub
    GitHub is public and for any kind of code.
    GCP Marketplace Solutions
    GCP Marketplace Solutions is a solution that lets you select and deploy software packages from vendors.
    BigQuery ML Open
    BigQuery ML Open is related to Open Data.

    You work in a large company that produces luxury cars. The upcoming models will have a control unit capable of collecting data on mileage and technical status to allow intelligent management of maintenance by both the customer and the service centers.

    Every day a small batch of data will be sent, collected, and processed in order to provide customers with insight into their vehicle's health and to push notifications in case of important messages.

    Which GCP products are the most suitable for this project?
    Pub/Sub
    Pub/Sub for technical data messages
    DataFlow
    DataFlow for data management both in streaming and in batch mode
    Dataflow manages data pipelines as directed acyclic graphs (DAGs) of transformations (PTransforms) on data (PCollections); a minimal Apache Beam sketch follows at the end of this question.
    The same pipeline can activate multiple PTransforms.
    All the processing can be performed both in batch and in streaming mode.
    So, in our case of streaming data, Dataflow can:
    • Serialize input data
    • Preprocess and transform data
    • Call the inference function
    • Get the results and postprocess them
    Dataproc
    Dataproc is the managed Apache Hadoop environment for big data analysis usually for batch processing.
    Firebase Messaging
    Firebase Messaging for push notifications
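    A minimal Apache Beam (Dataflow) sketch of the streaming steps listed above; the Pub/Sub topics and the parse/predict functions are hypothetical placeholders for your own logic:
        import json

        import apache_beam as beam
        from apache_beam.options.pipeline_options import PipelineOptions

        def parse_metrics(message: bytes) -> dict:
            # Deserialize the incoming technical-data message.
            return json.loads(message.decode("utf-8"))

        def predict_health(record: dict) -> dict:
            record["health_score"] = 1.0   # placeholder for a real model call
            return record

        options = PipelineOptions(streaming=True)
        with beam.Pipeline(options=options) as pipeline:
            (pipeline
             | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/vehicle-metrics")
             | "Parse" >> beam.Map(parse_metrics)
             | "Predict" >> beam.Map(predict_health)
             | "Encode" >> beam.Map(lambda r: json.dumps(r).encode("utf-8"))
             | "Publish" >> beam.io.WriteToPubSub(topic="projects/my-project/topics/vehicle-alerts"))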

    Your company does not have great ML experience. Therefore they want to start with a service that is as smooth, simple and managed as possible.

    The idea is to use BigQuery ML. Therefore you are considering whether it can cover all the functionality you need. Various projects in your company are starting to design and set up models using various techniques and algorithms.

    Which of these techniques/algorithms is not supported by BigQuery ML?
    Wide-and-Deep DNN models
    ARIMA
    Ensemble Boosted Model
    CNN
    The convolutional neural network (CNN) is a type of artificial neural network extensively used especially for image recognition and classification. It uses the convolutional layers, that is, the reworking of sets of pixels by running filters on the input pixels.
    It is not supported because it is specialized for images.

    Your team is working on a great number of ML projects.

    You need to appropriately collect and transform data and then create and tune your ML models.

    Later on, these procedures will be inserted into an MLOps flow and therefore will have to be automated and as simple as possible.

    What are the methodologies / services recommended by Google?
    Dataflow
    Dataflow is an optimal solution for compute-intensive preprocessing operations because it is a fully managed autoscaling service for batch and streaming data processing.
    BigQuery
    BigQuery is a strategic tool for GCP. BigData at scale, machine learning, preprocessing with plain SQL are all important factors.
    Tensorflow
    TensorFlow has many tools for data preprocessing and transformation operations.
    The main techniques are aimed at feature engineering (crossed_column, embedding_column, bucketized_column) and data transformation (the tf.Transform library); a short feature-column sketch follows at the end of this question.
    Cloud Fusion
    Cloud Fusion is for ETL with a GUI, so with limited programming.
    Dataprep
    Dataprep is a tool for visual data cleaning and preparation.
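    A short sketch of the feature-engineering columns mentioned above, using the classic tf.feature_column API (column names and vocabularies are hypothetical):
        import tensorflow as tf

        age = tf.feature_column.numeric_column("age")
        age_buckets = tf.feature_column.bucketized_column(age, boundaries=[25, 40, 60])

        country = tf.feature_column.categorical_column_with_vocabulary_list(
            "country", ["IT", "FR", "DE", "US"])
        language = tf.feature_column.categorical_column_with_vocabulary_list(
            "language", ["it", "fr", "de", "en"])

        # Cross two categorical columns and learn a dense representation of the cross.
        country_x_language = tf.feature_column.crossed_column(
            [country, language], hash_bucket_size=100)
        embedded_cross = tf.feature_column.embedding_column(country_x_language, dimension=8)

        feature_layer = tf.keras.layers.DenseFeatures([age_buckets, embedded_cross])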

    Your team is preparing a Deep Neural Network custom model with TensorFlow in Vertex AI that predicts medical diagnoses based on diagnostic images. It is a complex and demanding job. You want to get help from GCP for hyperparameter tuning.

    What are the parameters that you must indicate?
    learning_rate
    parameterServerType
    parameterServerType is a parameter for infrastructure set up for a training job.
    machineType
    machineType is a parameter for infrastructure set up for a training job.
    num_hidden_layers
    learning_rate and num_hidden_layers are model hyperparameters, so they are the values you declare in the hyperparameter tuning configuration; machineType and parameterServerType only describe the training infrastructure.

    Your team needs to create a model for managing security in restricted areas of campus.

    Everything that happens in these areas is filmed. Instead of having a physical surveillance service, the videos must be managed by a model capable of intercepting unauthorized people and vehicles, especially at particular times.

    What are the GCP services that allow you to achieve all this with minimal effort?
    AI Infrastructure
    AI Infrastructure allows you to manage hardware configurations for ML systems and, in particular, the processors used to accelerate machine learning workloads.
    Video Intelligence API
    Video Intelligence API is a pre-configured and ready-to-use service, therefore not configurable for specific needs.
    AutoML
    AutoML allows you to customize the pre-trained Video GCP system according to your specific needs.
    In particular, AutoML object tracking allows you to identify and locate particular entities of interest to you with your specific tags.

    Vision API
    Vision API is for images and not video.

    Your client has a large e-commerce Website that sells sports goods and especially scuba diving equipment.

    It has a seasonal business and has collected a lot of sales data from its structured ERP and market trend databases.

    It wants to predict the demand of its customers both to increase business and improve logistics processes.

    What managed and fast-to-use GCP products can be used for these types of models?
    AutoML
    BigQuery ML
    KubeFlow
    Kubeflow is an open-source library that works with TensorFlow, so it is neither managed nor as simple.
    Moreover, it can work in environments outside GCP, which is a big advantage, but that is not one of our requirements.
    Kubeflow is a system for deploying, scaling and managing complex TensorFlow systems on Kubernetes.
    TFX
    TFX is an open-source library that works with TensorFlow, so it is neither managed nor as simple.
    Moreover, it can work in environments outside GCP, which is a big advantage, but that is not one of our requirements.
    TFX is a platform that allows you to create scalable production ML pipelines for TensorFlow projects.

    You are consulting a CIO of a big firm regarding organization and cost optimization for his company's ML projects in GCP.

    He asked: “How can I get the most from ML services and the least costs?”

    What are the best practices recommended by Google in this regard?
    Use Notebooks as ephemeral instances
    It's incomplete
    Set up an automatic shutdown routine
    It's incomplete
    Use Preemptible VMs for long-running interruptible tasks
    It's incomplete
    Get monitoring alerts about GPU usage
    It's incomplete
    All of the above
    Notebooks are used for a limited time, but they reserve VM and other resources. So you have to treat them as ephemeral instances, not as long-living ones.
    You can configure an automatic shutdown routine when your instance is idle, saving money.
    Preemptible VMs are far cheaper than normal instances and are OK for long-running (batch) large experiments.
    You can set up the GPU metrics reporting script; it is important because GPU is expensive.

    Your team is working with a great number of ML projects, especially with Tensorflow.

    You have to prepare a demo for the Manager and Stakeholders. You are certain that they will ask you about the understanding of the classification and regression mechanism. You'd like to show them an interactive demo with some cool inference.

    Which of these tools is best for all of this?
    Tensorboard
    Tensorboard provides visualization and tooling needed for experiments, not for explaining inference. You can access the What-If Tool from Tensorboard.
    Tableau
    Tableau is a graphical tool for data reporting.
    What-If Tool
    The What-If Tool (WIT) is an open-source tool that lets you visually understand classification and regression ML models.
    It lets you see data points distributions with different shapes and colors and interactively try new inferences.
    Moreover, it shows which features affect your model the most, together with many other characteristics.
    All without code.

    Looker
    Looker is a graphical tool for data reporting.
    LIT
    LIT is for NLP models.

    Your team is working with a great number of ML projects, especially with Tensorflow.

    You recently prepared an NLP model that works well and is about to be rolled out in production.

    You have to prepare a demo for the Manager and Stakeholders for your new system of text and sentiment interpretation. You are certain that they will ask you for explanations and understanding about how software may capture human feelings. You'd like to show them an interactive demo with some cool inference.

    Which of these tools is best for all of this?
    Tensorboard
    Tensorboard provides visualization and tooling needed for experiments, not for explaining inference. You can access the What-If Tool from Tensorboard.
    Tableau
    Tableau is a graphical tool for data reporting.
    What-If Tool
    What-If Tool is for classification and regression models with structured data.
    Looker
    Looker is a graphical tool for data reporting.
    LIT
    The Language Interpretability Tool (LIT) is an open-source tool developed specifically to explain and visualize natural language processing (NLP) models.
    It is similar to the What-If tool, which instead targets classification and regression models with structured data.
    It offers visual explanations of the model's predictions and analysis with metrics, tests and validations.


    Your team is working with a great number of ML projects, especially with Tensorflow.

    You recently prepared a DNN model for image recognition that works well and is about to be rolled out in production.

    Your manager asked you to demonstrate the inner workings of the model.

    It is a big problem for you because you know that it is working well but you don’t have the explainability of the model.

    Which of these techniques could help you?
    Integrated Gradient
    Integrated Gradients is an explainability technique for deep neural networks that gives information about which features contribute to the model's prediction.
    It highlights feature importance by accumulating the gradients of the model's prediction output with respect to its input features along a path from a baseline input to the actual input, without any modification to the original model.
    LIT
    LIT is only for NLP models
    WIT
    What-If Tool is only for classification and regression models with structured data.
    PCA
    Principal component analysis (PCA) transforms and reduces the number of features by creating new variables, from linear combinations of the original variables.
    The new features will be all independent of each other.

    You are working on a linear regression model with data stored in BigQuery. You have a view with many columns. You want to make some simplifications for your work and avoid overfitting. You are planning to use regularization. You are working with Bigquery ML and preparing the query for model training. You need an SQL statement that allows you to have all fields in the view apart from the label.

    Which one do you choose?
    ROLLUP
    ROLLUP is a group function for subtotals.
    UNNEST
    UNNEST gives the elements of a structured file.
    EXCEPT
    SQL and BigQuery are powerful tools for querying and manipulating structured data.
    SELECT * EXCEPT(...) returns all the columns of the table or view except the ones listed, which is exactly what you need in order to use every field apart from the label. (EXCEPT can also act as a set operator on rows, but here it is used to exclude columns.)
    Example:
    SELECT * EXCEPT(mylabel), myvalue AS label
    LAG
    LAG returns the field value on a preceding row.

    Your team is preparing a multiclass logistic regression model with tabular data.

    The environment is Vertex AI with AutoML, and your data is stored in a CSV file in Cloud Storage.

    AutoML can perform transformations on the data to make the most of it.

    Which of the following types of transformations are not allowed, based on your requirements?
    Categorical
    Text
    Timestamp
    Array
    With complex data like Arrays and Structs, transformations are available only by using BigQuery, which supports them natively.
    All the other kinds of data are also supported for CSV files, as stated in the referred documentation.
    Number

    You are a junior Data Scientist, and you work in a Governmental Institution.

    You are preparing data for a linear regression model for Demographic research. You need to choose and manage the correct feature.

    Your input data is in BigQuery.

    You know very well that you have to avoid multicollinearity and optimize categories. So you need to group some features together and create macro categories.

    In particular, you have to join country and language into one variable and divide the data into 5 income classes.

    Which ones of the following options can you use?
    FEATURE_CROSS
    A feature cross is a new feature that joins two or more input features together. (The term cross comes from cross product.) Usually, numeric new features are created by multiplying two or more other features.
    ARRAY_CONCAT
    ARRAY_CONCAT joins one or more arrays (number or strings) into a single array.
    QUANTILE_BUCKETIZE
    QUANTILE_BUCKETIZE groups a continuous numerical feature into categories, using the bucket name as the value, based on quantiles.
    Example: ML.FEATURE_CROSS(STRUCT(country, language)) AS origin
    and ML.QUANTILE_BUCKETIZE(income, 5) OVER() AS income_class
    ST_AREA
    ST_AREA returns the number of square meters covered by a GEOGRAPHY area.

    You are a junior Data Scientist and you need to create a multi-class classification Machine Learning model with Keras Sequential model API.

    You have been asked which activation function to use.

    Which of the following do you choose?
    ReLU
    ReLU (Rectified Linear Unit): half rectified. f(z) is zero when z is less than zero, and f(z) equals z when z is zero or greater. It returns a single value per node, so it is used in hidden layers rather than as the output of a multi-class classifier.
    Softmax
    Softmax is for multi-class classification what Sigmoid is for logistic regression. Softmax assigns decimal probabilities to each class so that their sum is 1.

    SIGMOID
    Sigmoid is for logistic regression and therefore returns one value from 0 to 1.
    TANH
    Tanh or hyperbolic tangent is like sigmoid but returns one value from -1 to 1.
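    A minimal Keras sketch of the point above: ReLU in the hidden layers and Softmax on the output layer for multi-class classification (layer sizes are arbitrary):
        import tensorflow as tf

        n_features, n_classes = 20, 4

        # Hidden layers use ReLU; the output layer uses Softmax so the predicted
        # class probabilities sum to 1.
        model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(n_features,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(32, activation="relu"),
            tf.keras.layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()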

    Your team is working on a great number of ML projects for an international consulting firm.

    The management has decided to store most of the data to be used for ML models in BigQuery.

    The motivation is that BigQuery allows for preprocessing and transformations easily and with standard SQL. It is highly structured; so it offers efficiency, integration and security.

    Your team must create and modify code to directly access BigQuery data for building models in different environments.

    What are the tools you can use?
    Tf.data.dataset
    tf.data.dataset reader for BigQuery is the way to connect directly to BigQuery from TensorFlow or Keras.
    BigQuery Omni
    BigQuery Omni is a multi-cloud analytics solution. You can access from BigQuery data across Google Cloud, Amazon Web Services (AWS), and Azure.
    BigQuery Python client library
    For any other framework, you can use the BigQuery Python client library; a short sketch follows at the end of this question.
    BigQuery I/O Connector
    BigQuery I/O Connector is the way to connect directly to BigQuery from Dataflow.
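    A minimal sketch of the BigQuery Python client library approach; the table name is a hypothetical placeholder:
        from google.cloud import bigquery

        client = bigquery.Client()  # uses the project and credentials from your environment

        # The query result comes back as a pandas DataFrame that can be fed
        # to any framework (scikit-learn, PyTorch, XGBoost, ...).
        query = """
            SELECT * EXCEPT(customer_id)
            FROM `my-project.sales.training_view`
        """
        df = client.query(query).to_dataframe()
        print(df.head())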

    Your team has prepared a Multiclass logistic regression model with tabular data in the Vertex AI with AutoML environment. Everything went very well. You appreciated the convenience of the platform and AutoML.

    What other types of models can you implement with AutoML?
    Image Data
    Text Data
    Cluster Data
    Cluster Data may be related to unsupervised learning; that is not supported by AutoML.
    Video Data

    With your team, you have to decide the strategy for implementing an online forecasting model in production. This system needs to work with a web interface as well as Dialogflow and Google Assistant. A lot of requests are expected.

    You are concerned that the final system is not efficient and scalable enough. You are looking for the simplest and most managed GCP solution.

    Which of these can be the solution?
    Vertex AI online prediction
    The Vertex AI prediction service is fully managed and automatically scales machine learning models in the cloud.
    The service supports both online prediction and batch prediction.

    GKE and TensorFlow
    GKE and TensorFlow are not managed services.
    VMs and Autoscaling Groups with Application LB
    VMs and autoscaling groups with an Application Load Balancer are not managed services.
    Kubeflow
    Kubeflow is not a managed service. It is used in Vertex AI and lets you deploy ML systems in various environments.

    You work in a medium-sized company as a developer and data scientist and use the managed ML platform, Vertex AI.

    You have updated an AutoML model and want to deploy it to production. But you want to maintain both the old and the new version at the same time. The new version should only serve a small portion of the traffic.

    What can you do?
    Save the model in a Docker container image
    You don’t have to create a Docker container image with AutoML.
    Deploy on the same endpoint
    Update the Traffic split percentage
    In Vertex AI you can deploy several model versions to the same endpoint and assign each one a traffic split percentage, so the new version serves only a small portion of the requests; a minimal SDK sketch follows at the end of this question.
    Create a Canary Deployment with Cloud Build
    Canary Deployment with Cloud Build is a procedure used in CI/CD pipelines. There is no need in such a managed environment.
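    A minimal sketch of deploying a new version to the same endpoint with a small traffic split, using the Vertex AI Python SDK; the project, endpoint and model IDs are hypothetical placeholders:
        from google.cloud import aiplatform

        aiplatform.init(project="my-project", location="us-central1")

        endpoint = aiplatform.Endpoint(
            "projects/my-project/locations/us-central1/endpoints/1234")
        new_model = aiplatform.Model(
            "projects/my-project/locations/us-central1/models/5678")

        # Deploy the new version to the SAME endpoint and send it only 10% of traffic;
        # the previously deployed version keeps the remaining 90%.
        endpoint.deploy(
            model=new_model,
            traffic_percentage=10,
        )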

    You and your team are working for a large consulting firm. You are preparing an NLP ML model to classify customer support needs and to assess the degree of satisfaction. The texts of the various communications are stored in different storage systems.

    What types of storage should you avoid in the managed environment of GCP ML, such as Vertex AI?
    Cloud Storage
    BigQuery
    Filestore
    Block Storage
    Filestore (NFS file shares) and block storage (persistent disks) are file- and block-level services designed to be mounted on VMs; the managed Vertex AI environment works best with Cloud Storage and BigQuery, so these are the storage types to avoid.

    You are working with Vertex AI, the managed ML Platform in GCP.

    You want to leverage Explainable AI to understand which are the most essential features and how they influence the model.

    For what kind of model may you use Vertex Explainable AI?
    AutoML
    Image Classification
    DNN
    Decision Tree
    Decision Tree Models are explainable without any sophisticated tool for enlightenment.

    You work as a Data Scientist in a startup and you work on several projects with Python and TensorFlow.

    You need to increase the performance of the training sessions and you already use caching and prefetching.

    So now you want to use GPUs, but on a single machine, for cost reduction and experimentation.

    Which of the following is the correct strategy?
    tf.distribute.MirroredStrategy
    tf.distribute.Strategy is an API explicitly designed for distributing training across different processors and machines.
    tf.distribute.MirroredStrategy lets you use multiple GPUs in a single VM, with one replica of the model for each GPU; a short sketch follows at the end of this question.

    tf.distribute.TPUStrategy
    tf.distribute.TPUStrategy lets you use TPUs, not GPUs.
    tf.distribute.MultiWorkerMirroredStrategy
    tf.distribute.MultiWorkerMirroredStrategy is for multiple machines
    tf.distribute.OneDeviceStrategy
    tf.distribute.OneDeviceStrategy, like the default strategy, places all variables and computation on a single specified device, so it does not use multiple GPUs.
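    A minimal sketch of MirroredStrategy on a single multi-GPU machine:
        import tensorflow as tf

        # One model replica per GPU on this machine; gradients are aggregated
        # across the replicas automatically.
        strategy = tf.distribute.MirroredStrategy()
        print("Replicas in sync:", strategy.num_replicas_in_sync)

        with strategy.scope():                      # variables are created mirrored
            model = tf.keras.Sequential([
                tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
                tf.keras.layers.Dense(1),
            ])
            model.compile(optimizer="adam", loss="mse")

        # model.fit(train_dataset, epochs=10)  # training now uses all local GPUs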

    You work as a junior Data Scientist in a startup and work on several projects with Python and TensorFlow in Vertex AI. You deployed a new model in the test environment and detected some problems that are puzzling you.

    An experienced colleague of yours asked for the logs. You found out that there is no logging information available. What kind of logs do you need and how do you get them?
    You need to Use Container logging
    You need to Use Access logging
    You can enable logs dynamically
    You have to undeploy and redeploy

    You are a junior Data Scientist working on a logistic regression model to break down customer text messages into two categories: important / urgent and unimportant / non-urgent.

    You want to find a metric that allows you to evaluate your model for how well it separates the two classes. You are interested in finding a method that is scale invariant and classification threshold invariant.

    Which of the following is the optimal methodology?
    Log Loss
    Log Loss is the loss function typically used for logistic regression; it measures how far the predicted probabilities are from the actual labels, so it depends on the absolute probability values (it is not scale invariant) and does not directly measure how well the model separates the two classes.
    One-hot encoding
    One-hot encoding is a method used in feature engineering for obtaining better regularization and independence.
    ROC-AUC
    The ROC curve (receiver operating characteristic curve) is a graph showing the behavior of the model's positive predictions at different classification thresholds.
    It plots and relates two different values:
    • TPR: true positives / all actual positives
    • FPR: false positives / all actual negatives
    The AUC (Area Under the Curve) index is the area under the ROC curve and indicates the capability of a binary classifier to discriminate between two categories. Because it depends only on the ranking of the predictions, and its value is always between 0 and 1, it is scale invariant.
    It measures the separability of the classes across all thresholds, so it is independent of the chosen threshold value; in other words, it is threshold invariant.
    A value of 0.5 indicates that the model separates the two classes no better than random guessing, similar to heads and tails when tossing a coin.

    Mean Square Error
    Mean Square Error is the loss function most frequently used for linear regression. It takes the square of the difference between predictions and real values.
    Mean Absolute Error
    Mean Absolute Error is a loss function, too. It takes the absolute value of the difference between predictions and actual outcomes.
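    A small scikit-learn sketch showing that ROC AUC is computed from the predicted probabilities across all thresholds, with no single threshold chosen:
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        probs = clf.predict_proba(X_test)[:, 1]

        # AUC is computed from the predicted probabilities at every possible threshold.
        print("ROC AUC:", roc_auc_score(y_test, probs))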

    You work as a junior Data Scientist in a consulting company and work with several projects with Tensorflow. You prepared and tested a new model, and you are optimizing it before deploying it in production. You asked for advice from an experienced colleague of yours. He said that it is not advisable to deploy the model in eager mode.

    What can you do?
    Configure eager_execution=no
    There is no such parameter as eager_execution = no. Using graphs instead of eager execution is more complex than that.
    Use graphs
    Use the tf.function decorator
    Create a new tf.Graph
    In TensorFlow 2.x you get graph execution by decorating your Python functions with @tf.function, which traces them into a tf.Graph; graph execution is usually faster and more portable than eager mode for deployment. A minimal sketch follows.
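    A minimal sketch of moving a training step from eager mode to graph execution with the tf.function decorator:
        import tensorflow as tf

        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
        optimizer = tf.keras.optimizers.SGD()
        loss_fn = tf.keras.losses.MeanSquaredError()

        @tf.function                      # traces the Python function into a tf.Graph
        def train_step(x, y):
            with tf.GradientTape() as tape:
                loss = loss_fn(y, model(x, training=True))
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            return loss

        x = tf.random.normal((32, 4))
        y = tf.random.normal((32, 1))
        print(train_step(x, y))           # first call traces the graph, later calls reuse it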

    In your company, you train and deploy several ML models with Tensorflow. You use on-prem servers, but you often find it challenging to manage the most expensive training.

    Checking and updating models create additional difficulties. You are undecided whether to use Vertex Pipelines and Kubeflow Pipelines. You wonder if starting from Kubeflow, you can later switch to a more automated and managed system like Vertex AI.

    Which of these answers are correct?
    Kubeflow pipelines and Vertex Pipelines are incompatible
    You may use Kubeflow Pipelines written with DSL in Vertex AI
    Kubeflow pipelines work only in GCP
    Kubeflow pipelines may work in any environment
    Kubeflow pipelines may use Kubernetes persistent volume claims (PVC)
    Vertex Pipelines can use Cloud Storage FUSE

    Your company runs a big retail website. You develop many ML models for all the business activities.

    You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow, and BigQuery ML. You also use BigTable and CloudSQL, and Cloud Storage, of course. You need to use input tabular data in CSV format. You are working with Vertex AI.

    How do you manage them in the best way?
    Vertex AI manages any CSV automatically; no operations needed
    You have to set up a header, and column names may contain only alphanumeric characters and underscores
    Vertex AI cannot handle CSV files
    The delimiter must be a comma
    You can import only one file of max 10 GB
    You can import multiple files, each one of max 10 GB.

    Your company is a Financial Institution. You develop many ML models for all the business activities. You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow, and BigQuery ML.

    You are now working on an international project with other partners. You need to use Vertex AI. You are asking experts what the capabilities of this managed suite of services are.

    Which elements are integrated into Vertex AI?
    Training environments and MLOps
    Training Pipelines, Datasets, Custom tooling, AutoML, Models Management and inference environments (endpoints)
    Vertex AI covers all the activities and functions listed: from Training Pipelines (so MLOps), to Data Management (Datasets), custom models and AutoML models management, custom tooling and libraries deployment and monitoring.
    So, all the other answers are wrong because they cover only a subset of Vertex functionalities.

    Deployment environments
    Training Pipelines and Datasets for data sources

    You are a junior data scientist working on a logistic regression model to break down customer text messages into important/urgent and unimportant/non-urgent. You want to find the best loss function to evaluate your model's performance.

    Which of the following is the optimal methodology?
    Log Loss
    With a logistic regression model, the optimal loss function is the log loss.
    The intuitive explanation is that you want to emphasize bigger mistakes, so you need a way to penalize such differences.
    Square loss is often used for this, but when the values are probabilities (between 0 and 1), squaring makes the differences smaller, not bigger.
    With a logarithmic transformation the effect is reversed: small probabilities assigned to the true class produce large penalties.
    In addition, logarithmic transformations are monotonic, so they do not move the location of the minimum and maximum.
    These are some of the reasons why they are widely used in ML.
    Pay attention to the difference between loss function and ROC/AUC, which is useful as a measure of how well the model can discriminate between two categories.
    You may have two models with the same AUC but different losses.

    Mean Square Error
    Mean Square Error, as explained above, does not emphasize larger errors when the predictions are probabilities between 0 and 1.
    Mean Absolute Error
    Mean Absolute Error takes the absolute value of the difference between predictions and actual outcomes. So, it would not emphasize higher errors.
    Mean Bias Error
    Mean Bias Error takes just the value of the difference between predictions and actual outcomes. So, it compensates positive and negative differences between predicted/actual values. It is used to calculate the average bias in the model.
    Softmax
    Softmax is an activation function used in multi-class classification models, so it is clearly not suitable as the loss for a binary classification problem, where logarithmic loss (binary cross-entropy) is used.
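    A small NumPy sketch of binary log loss, showing how confidently wrong probabilities are penalized much more heavily than mildly wrong ones (the example values are arbitrary):
        import numpy as np

        def log_loss(y_true, p):
            # Binary cross-entropy: -[y*log(p) + (1-y)*log(1-p)], averaged over examples.
            p = np.clip(p, 1e-15, 1 - 1e-15)
            return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

        y = np.array([1, 1, 0, 0])
        confident_wrong = np.array([0.05, 0.9, 0.1, 0.95])
        mildly_wrong    = np.array([0.40, 0.9, 0.1, 0.60])

        # The confidently wrong predictions are penalized much more heavily.
        print(log_loss(y, confident_wrong))   # large loss
        print(log_loss(y, mildly_wrong))      # smaller loss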

    You have just started working as a junior Data Scientist in a Startup. You are involved in several projects with Python and Tensorflow in Vertex AI.
    You are starting to get interested in MLOps and are trying to understand the different processes involved.
    You have prepared a checklist, but inside there is a service that has nothing to do with MLOps.

    Which one?
    CI/CD
    Source Control Tools
    Data Pipelines
    CDN
    Cloud CDN is the service that caches and delivers static content from the closest locations (edge locations) to customers to accelerate web and mobile applications. This is a very important service for the Cloud but out of scope for MLOps.
    MLOps covers all processes related to ML models; experimentation, preparation, testing, deployment and above all continuous integration and delivery.
    The MLOps environment is designed to provide (some of) the following:
    • Environment for testing and experimentation
    • Source control, like Github
    • CI/CD Continuous integration/continuous delivery
    • Container registry: custom Docker images management
    • Feature Stores
    • Training services
    • Metadata repository
    • Artifacts repository
    • ML pipelines orchestrators
    • Data warehouse/ storage and scalable data processing for batch and streaming data.
    • Prediction service both batch and online.

    So, all the other answers describe MLOps functionalities.
    Artifact Registry, Container Registry

    You are working with Vertex AI, the managed ML Platform in GCP.

    You want to leverage Vertex Explainable AI to understand the most important features and how they influence the model.

    Which three methods does Vertex AI leverage for feature attributions?
    sampled Shapley
    integrated gradients
    Maximum Likelihood
    Maximum Likelihood is a probabilistic method for determining the parameters of a statistical distribution.
    XRAI

    Your company produces and sells a lot of different products.

    You work as a Data Scientist. You train and deploy several ML models.

    Your manager just asked you to find a simple method to determine affinities between different products and categories to give sellers and applications a wider range of suitable offerings for customers.

    The method should give good results even without a great amount of data.

    Which of the following different techniques may help you better?
    One-hot encoding
    One-hot encoding is a method used in feature engineering for obtaining better regularization and independence.
    Cosine Similarity
    In a recommendation system (like the Netflix movie one) it is important to discover similarities between products, so that you may recommend an item to a user because other users with similar tastes liked it.
    So, the first step is to find similar products.
    You take two products and their characteristics (all transformed into numbers), so you have two vectors.
    Cosine similarity measures the cosine of the angle between the two vectors: it depends only on their orientation, not on their length, which makes it a simple measure of affinity that works even without a great amount of data; a small sketch follows at the end of this question.

    Matrix Factorization
    Matrix Factorization is correctly used in recommender systems. Still, it is used with a significant amount of data, and there is the problem of reducing dimensionality.
    PCA
    Principal component analysis is a technique to reduce the number of features by creating new variables.
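    A small NumPy sketch of cosine similarity, cos(theta) = (a . b) / (||a|| ||b||), applied to hypothetical product vectors:
        import numpy as np

        def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
            # Depends only on the angle between the vectors, not on their lengths.
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        # Hypothetical numeric characteristics of three products.
        scuba_mask   = np.array([1.0, 0.2, 0.9, 0.0])
        scuba_fins   = np.array([0.9, 0.1, 0.8, 0.1])
        running_shoe = np.array([0.0, 0.9, 0.1, 1.0])

        print(cosine_similarity(scuba_mask, scuba_fins))    # close to 1: similar products
        print(cosine_similarity(scuba_mask, running_shoe))  # close to 0: unrelated products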

    Your company runs a big retail website. You develop many ML models for all the business activities.

    You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow and BigQuery ML.

    You are now working on an international project with other partners.

    You need to let a different organization use your Vertex AI dataset stored in Cloud Storage.

    What can you do?
    Let them use your GCP Account
    It is wrong mainly for security reasons.
    Exporting metadata and annotations in a JSONL file
    Exporting metadata and annotations in a CSV file
    Annotations are written in JSON files.
    Give access (Service account or signed URL) to the Cloud Storage file
    Copy the data in a removable storage
    It is wrong mainly for security reasons.

    You work as a junior Data Scientist in a consulting company, and you work with several ML projects.

    You need to properly collect and transform data and then work on your ML models. You want to identify the services for data transformation that are most suitable for your needs. You need automatic procedures triggered before training.

    What are the methodologies / services recommended by Google?
    Dataflow
    BigQuery
    Tensorflow
    Cloud Composer
    Cloud Composer is often used in ML processes, but as a workflow tool, not for data transformation.

    You just started working as a junior Data Scientist in a consulting Company. You are in a project team that is building a new model and you are experimenting. But the results are absolutely unsatisfactory because your data is dirty and needs to be modified.
    In particular, you have various fields that have no value or report NaN. Your expert colleague told you that you need to carry out a procedure that modifies them at the time of acquisition. What kind of functionalities do you need to provide?
    Delete all records that have a null/NaN value in any field
    The common practice is to delete only records/examples that are completely wrong or completely lacking information (all null values); deleting every record with any null value would throw away too much useful data. For the remaining records you impute the missing values, as shown in the sketch after this question.
    Compute Mean / Median for numeric measures
    Replace Categories with the most frequent one
    Use another ML model for missing values guess
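    A small pandas sketch of these imputation steps on a hypothetical table (drop only fully empty rows, median for numeric fields, most frequent category for categorical fields):
        import numpy as np
        import pandas as pd

        df = pd.DataFrame({
            "amount":   [10.0, np.nan, 30.0, np.nan, 50.0],
            "category": ["A", None, "A", "B", None],
        })

        # Drop only the rows that carry no information at all.
        df = df.dropna(how="all")

        # Impute numeric fields with the median and categorical fields with the mode.
        df["amount"] = df["amount"].fillna(df["amount"].median())
        df["category"] = df["category"].fillna(df["category"].mode()[0])
        print(df)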

    You just started working as a junior Data Scientist in a consulting Company.

    The job they gave you is to perform data cleaning and correction so that the data can later be used in the best possible way for creating and updating ML models.

    Data is stored in files of different formats.

    Which GCP service is best to help you with this business?
    BigQuery
    BigQuery could obviously query and update data. But you need to preprocess data and prepare queries and procedures.
    Dataprep
    Dataprep is an end-user service that allows you to explore, clean and prepare structured and unstructured data for many purposes, especially for machine learning.
    It is completely serverless. You don’t need to write code or procedures.

    Cloud Composer
    Cloud Composer is for workflow management, not for data preparation.
    Dataproc
    Dataproc is a fully managed service for the Apache Hadoop environment.

    You are supporting a group of data analysts who want to build ML models using a managed service. They also want the ability to customize their models and tune hyperparameters. What managed service in Google Cloud would you recommend?
    Vertex AI custom training
    Vertex AI custom training allows for tuning hyperparameters.
    Vertex AI AutoML
    Vertex AI AutoML training tunes hyperparameters for you.
    Cloud TPUs
    Cloud TPUs are accelerators you can use to train large deep learning models.
    Cloud GPUs
    Cloud GPUs are accelerators you can use to train large deep learning models.

    You have created a Compute Engine instance with an attached GPU but the GPU is not used when you train a Tensorflow model. What might you do to ensure the GPU can be used for training your models?
    Install GPU drivers
    GPU drivers need to be installed if they are not installed already when using GPUs.
    Deep Learning VM images have GPU drivers installed but if you don't use an image with GPU drivers installed, you will need to install them.
    Use Pytorch instead of Tensorflow
    Using Pytorch instead of Tensorflow will require work to recode and Pytorch would not be able to use GPUs either if the drivers are not installed.
    Grant the Editor basic role to the VM service account
    Granting a new role to the service account of the VM will not address the need to install GPU drivers.
    Update Python 2.7 on the VM
    Updating Python will not address the problem of missing drivers.

    A financial services company wants to implement a chatbot service to help direct customers to the best customer support team for their questions. What GCP service would you recommend?
    Text-to-Speech API
    Text-to-Speech converts text words to human voice-like sound.
    Speech-to-Text API
    Speech-to-Text converts spoken words to written words.
    AutoML
    AutoML is a machine learning service.
    Dialogflow
    Dialogflow is a service for creating conversational user interfaces.

    You lead a team of machine learning engineers working for an IoT startup. You need to create a machine learning model to predict the likelihood of a device failure in manufacturing environments. The device generates a stream of metrics every 60 seconds. The metrics include 2 categorical values, 7 integer values, and 1 floating point value. The floating point value ranges from 0 to 100. For the purposes of the model, the floating point value is more precise than needed. Mapping that value to a feature with possible values "high", "medium", and "low" is sufficient. What feature engineering technique would you use to transform the floating point value to high, medium, or low?
    L1 Regularization
    Regularization is the limiting of information captured by a model to prevent overfitting;
    L1 and L2 are two examples of regularization techniques.
    Normalization
    Normalization is a transformation that scales numeric values to the range 0 to 1.
    Bucketing
    In this case, values from 0 to 33 could be low, 34 to 66 could be medium, and values greater than 66 could be high.
    L2 Regularization
    Regularization is the limiting of information captured by a model to prevent overfitting;
    L1 and L2 are two examples of regularization techniques.
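    A small pandas sketch of this bucketing, mapping the 0-100 metric to low / medium / high (the readings are arbitrary examples):
        import pandas as pd

        readings = pd.Series([3.2, 27.9, 45.0, 66.1, 87.5, 99.9], name="metric")

        # Map the 0-100 floating point value to three coarse buckets.
        buckets = pd.cut(readings,
                         bins=[0, 33, 66, 100],
                         labels=["low", "medium", "high"],
                         include_lowest=True)
        print(pd.concat([readings, buckets.rename("level")], axis=1))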

    You have trained a machine learning model. After training is complete, the model scores high on accuracy and F1 score when measured using training data; however, when validation data is used, the accuracy and F1 score are much lower. What is the likely cause of this problem?
    Overfitting
    This is an example of overfitting because the model has not generalized from the training data.
    Underfitting
    Underfitting would have resulted in poor performance with training data.
    Insufficiently complex model
    Insufficiently complex models can lead to underfitting but not overfitting.
    The learning rate is too small
    A small learning rate will lead to longer training times but would not cause the described problem.

    You are building a machine learning model using random forests. You haven't achieved the precision and recall you would like. What hyperparameter or hyperparameters would you try adjusting to improve accuracy?
    Number of trees only
    It's incomplete
    Number of trees and depth of trees
    Both are hyperparameters that could be adjusted to improve accuracy.
    Number of clusters
    Random forests do not use the concept of clusters.
    Learning rate
    The random forest algorithm does not use a learning rate hyperparameter.

    A logistics analyst wants to build a machine learning model to predict the number of units of a product that will need to be shipped to stores over the next 30 days. The features they will use are all stored in a relational database. The business analyst is familiar with reporting tools but not programming in general. What service would you recommend the analyst use to build a model?
    Spark ML
    Spark ML is suitable for modelers with programming skills.
    AutoML
    It uses structured data to build models with little input from users.
    Bigtable ML
    There is no Bigtable ML but BigQuery ML is a managed service for building machine learning models in BigQuery using SQL.
    TensorFlow
    Tensorflow is suitable for modelers with programming skills.

    You are testing a regression model to predict the selling price of houses. After several iterations of model building, you note that small changes in a few features can lead to large differences in the output. This is an example of what kind of problem?
    Low variance
    Low variance is desired in ML models and is not a problem.
    High variance
    High variance means the model's output is overly sensitive to small changes in the input features; it is a typical sign of overfitting.
    Low bias
    Low bias is desired in ML models and is not a problem.
    High bias
    High bias occurs when relationships are missed.

    You are an ML engineer with a startup building machine learning models for the pharmaceutical industry. You are currently developing a deep learning machine learning model to predict the toxicity of drug candidates. The training data set consists of a large number of chemical and physical attributes and there is a large number of instances. Training takes several days on an n2 type Compute Engine virtual machine. What would you recommend to reduce the training time without compromising the quality of the model?
    Use TPUs
    TPUs are designed to accelerate the dominant computation in deep learning model training.
    Randomly sample 20% of the training set and train on that smaller data set
    Using a smaller data set by sampling would reduce training time but would likely compromise the quality of the model.
    Increase the machine size to make more memory available
    Increasing memory may reduce training time if memory is constrained, but it will not decrease training time as much as the other options.
    Increase the machine size to make more CPUs available
    Increasing CPUs would improve performance, but not as much as TPUs would.

    You want to evaluate a classification model using the True Positive Rate and the False Positive Rate. You want to view a graph showing the performance of the model at all classification thresholds. What evaluation metric would you use?
    Area under the ROC curve (AUC)
    The ROC curve plots the True Positive Rate against the False Positive Rate at all classification thresholds; AUC summarizes it as the area under that curve.
    Precision
    Precision is a measure of the quality of positive predictions.
    F1 Score
    F1 Score is a harmonic mean of precision and recall.
    L2 Regularization
    L2 Regularization is a technique to prevent overfitting.

    You are building a machine learning model and, during the data preparation stage, you perform normalization and standardization using the full data set. You then split the full data set into training, validation, and testing data sets. What problem could be introduced by performing the steps in the order described?
    Regularization
    Regularization is a technique to prevent overfitting.
    Data leakage
    This is an example of data leakage: by computing the normalization and standardization statistics on the full data set, information from the validation and test data becomes available during training, even though it would not be available when running predictions. The sketch after this question shows the order that avoids it.
    Introduction of bias
    No bias is introduced
    Imbalanced classes
    There is no indication that classes are imbalanced
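    A small scikit-learn sketch of the leak-free order: split first, then fit the scaler on the training data only:
        from sklearn.datasets import make_regression
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler

        X, y = make_regression(n_samples=500, n_features=5, random_state=0)

        # Split FIRST, then fit the scaler only on the training data.
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

        scaler = StandardScaler().fit(X_train)    # statistics come from training data only
        X_train_scaled = scaler.transform(X_train)
        X_test_scaled = scaler.transform(X_test)  # test data never influences the statistics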

    A simple model based on hand-coded heuristics or a simple algorithm such as a linear model is often built early in the model training process. What is the purpose of such a model?
    It provides a baseline for the minimum performance to expect in an ML model
    It provides the maximum expected performance in an ML model
    Simple models do not provide indication of maximum performance.
    It provides a measure of the likelihood of underfitting
    A simple model could underfit, and that would be expected; measuring the likelihood of underfitting is not its purpose.
    It provides a measure of the likelihood of overfitting
    Simple models are not likely to overfit.

    What characteristics of feature values do we try to find when using descriptive statistics for data exploration?
    Central tendency only
    It's incomplete
    Spread of values only
    It's incomplete
    Central tendency and spread of values
    Descriptive statistics are used to measure both central tendency and the spread of values.
    Likelihood to contribute to a prediction
    The likelihood of contributing to a prediction is not measured until after a model is created.

    You are building a classification model to detect fraud in credit card transactions. When exploring the training data set you notice that 2% of instances are examples of fraudulent transactions and 98% are legitimate transactions. This is an example of what kind of data set?
    An imbalanced data set
    This is an imbalanced data set because one class has significantly more instances than the others.
    A standardized data set
    Standardization is a technique for preparing the data set.
    A normalized data set
    Normalization is a technique for preparing the data set.
    A marginalized data set
    There is no such thing as a marginalized data set in machine learning.

    Which of the following techniques can be used when working with imbalanced data sets?
    Collecting more data
    It's incomplete
    Resampling
    It's incomplete
    Generating synthetic data using an algorithm such as SMOTE
    It's incomplete
    All of the above

    A team of machine learning engineers is training an image recognition model to detect defects in manufactured parts. The team has a data set of 10,000 images but wants to train with at least 30,000 images. They do not have time to wait for an additional set of 20,000 images to be collected on the factory floor. What type of technique could they use to produce a data set with 30,000 images?
    Normalization
    Normalization is a data preparation technique.
    Data augmentation
    Data augmentation is a set of techniques for artificially increasing the number of instances in a data set by manipulating other instances.
    Data leakage
    Data leakage is the use of data in training that is not available during prediction and is unwanted.
    Imbalanced classes
    Imbalanced classes is not a technique for expanding the size of a dataset.
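    A minimal Keras sketch of image data augmentation; each epoch sees randomly flipped, rotated and zoomed variants of the original images (the random tensor stands in for real part images):
        import tensorflow as tf

        # Each pass over the data produces randomly transformed variants of the
        # original images, effectively enlarging the training set.
        augmentation = tf.keras.Sequential([
            tf.keras.layers.RandomFlip("horizontal"),
            tf.keras.layers.RandomRotation(0.1),
            tf.keras.layers.RandomZoom(0.1),
            tf.keras.layers.RandomContrast(0.2),
        ])

        images = tf.random.uniform((8, 224, 224, 3))   # stand-in for real part images
        augmented = augmentation(images, training=True)
        print(augmented.shape)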

    You are using distributed training with TensorFlow. What type of server stores parameters and coordinates shared model state across workers?
    Parameter servers
    Parameter servers store model parameters and share state.
    State servers
    There are no state servers in distributed TensorFlow.
    Evaluators
    Evaluators evaluate models.
    Primary replica
    Primary replicas manage other nodes.

    A dataset includes multiple categorical values. You want to train a deep learning neural network using the data set. Which of the following would be an appropriate data encoding scheme?
    One-hot encoding
    One-hot encoding is an appropriate encoding technique to map categorical values to a bit vector.
    Categorical encoding
    Categorical values themselves are not suitable input to a deep learning network.
    Regularization
    Regularization is used to prevent overfitting.
    Normalization
    Normalization is a data preparation operation.

    A dataset you are using has categorical values mapped to integer values, such as red to 1, blue to 2, and green to 3. What kind of encoding scheme is this?
    One-hot encoding
    One-hot encoding maps to a bit vector with only one bit set to one.
    Feature hashing
    Feature hashing applies a hash function to compute a representation.
    Ordinal encoding
    Ordinal encoding maps each categorical value to an integer, exactly as in this example; see the sketch after this question.
    Data augmentation
    Data augmentation is not an encoding scheme, it is a set of techniques for increasing the size of a data set.
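    A small pandas sketch contrasting ordinal encoding (the scheme in this question) with one-hot encoding:
        import pandas as pd

        colors = pd.DataFrame({"color": ["red", "blue", "green", "red"]})

        # Ordinal encoding: each category becomes an integer (implies an order).
        ordinal = colors["color"].map({"red": 1, "blue": 2, "green": 3})

        # One-hot encoding: each category becomes its own 0/1 column (no implied order),
        # which is usually the better input for a deep learning network.
        one_hot = pd.get_dummies(colors["color"], prefix="color")

        print(ordinal)
        print(one_hot)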

    Which of the following are ways bias can be introduced in a machine learning model?
    Biased data distribution
    Biased data can introduce bias in a machine learning model.
    Proxy variables
    Proxy variables can introduce bias in a machine learning model.
    Data leakage
    Data leakage can cause problems but is not likely to introduce bias that isn't already in the data set.
    Data augmentation
    Data augmentation can continue to represent bias in a data set but does not introduce new bias.
    Normalization
    Normalization is a data preparation operation.

    A machine learning engineer detects non-linear relationships between two variables in a dataset. The dataset is relatively small and it is expensive to acquire new examples. What can the machine learning engineer do to increase the performance of the model with respect to the non-linear relationship detected?
    Use a deep learning network
    A deep learning network can also learn non-linear relationships, but it requires large volumes of data.
    Use regularization
    Regularization is a set of techniques for preventing overfitting.
    Create a feature cross
    A feature cross could capture the non-linear relationship (see the sketch after this question).
    Use data leakage
    Data leakage is unwanted in a machine learning model.
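
    As a sketch of the feature cross answer, the snippet below builds a synthetic interaction feature from two variables with an assumed multiplicative relationship; the data is invented for illustration.

    import numpy as np
    import pandas as pd

    # Invented data: the target depends on the product of x1 and x2,
    # a non-linear relationship a plain linear model cannot capture.
    rng = np.random.default_rng(0)
    df = pd.DataFrame({"x1": rng.uniform(0, 1, 1000), "x2": rng.uniform(0, 1, 1000)})
    df["target"] = df["x1"] * df["x2"] + rng.normal(0, 0.01, 1000)

    # The feature cross: a new synthetic feature that lets a simple model
    # learn the interaction directly, without acquiring more examples.
    df["x1_x_x2"] = df["x1"] * df["x2"]
    print(df[["x1", "x2", "x1_x_x2", "target"]].head())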

    You have a dataset with more features than you believe you need to train a model. You would like to measure how well two numerical values linearly correlate so you can eliminate one of them if they highly correlate. What statistical test would you use?
    Pearson's Correlation
    Pearson's correlation coefficient measures the linear correlation between two numeric variables (see the sketch after this question).
    ANOVA
    ANOVA is used to measure the difference among means.
    Kendall's Rank Coefficient
    Kendall's Rank Coefficient measures ordinal (rank-based) association, not the linear correlation between numeric values.
    Chi-Squared Test
    The Chi-Squared test is used for measuring the correlation between categorical values.
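
    A short sketch of Pearson's correlation with SciPy on invented data; a coefficient near +1 or -1 would justify dropping one of the two features.

    import numpy as np
    from scipy import stats

    # Invented numeric features; y is constructed to track x closely.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    y = 0.9 * x + rng.normal(scale=0.2, size=500)

    r, p_value = stats.pearsonr(x, y)
    print(f"Pearson r = {r:.3f}, p-value = {p_value:.3g}")
    # |r| close to 1 indicates strong linear correlation, so one of the
    # two features could be dropped.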

    You have a dataset with more features than you believe you need to train a model. You would like to measure how well two categorical values linearly correlate so you can eliminate one of them if they highly correlate. What statistical test would you use?
    Pearson's Correlation
    Pearson's Correlation is used for measuring the linear correlation between two variables.
    ANOVA
    ANOVA is used to measure the difference among means.
    Chi-Squared Test
    The Chi-Squared test measures the association between categorical variables (see the sketch after this question).
    Kendall's Rank Coefficient
    Kendall's Rank Coefficient measures ordinal (rank-based) association between ranked variables.
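
    A short sketch of the chi-squared test with SciPy on an invented contingency table of two categorical features.

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Invented categorical features; the test checks whether they are independent.
    df = pd.DataFrame({
        "color": ["red", "red", "blue", "blue", "green", "green", "red", "blue"],
        "size":  ["S",   "S",   "L",    "L",    "S",     "L",     "S",   "L"],
    })
    contingency = pd.crosstab(df["color"], df["size"])

    chi2, p_value, dof, expected = chi2_contingency(contingency)
    print(f"chi2 = {chi2:.3f}, p-value = {p_value:.3f}")
    # A small p-value suggests the two categorical features are associated,
    # so one of them may be redundant.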

    Which of the following types of pre-built containers are available in Vertex AI?
    TensorFlow Optimized Runtime
    TensorFlow Optimized Runtime is available in Vertex AI pre-built containers.
    Theano
    Theano is a machine learning platform but is not available as a Vertex AI pre-built container.
    Hadoop Mahout
    Hadoop Mahout is a machine learning platform but is not available as a Vertex AI pre-built container.
    XGBoost
    XGBoost is available in Vertex AI pre-built containers.
    Scikit-Learn
    Scikit-Learn is available in Vertex AI pre-built containers.

    Which of the following are required of a custom container used with Vertex AI?
    Support for health checks and liveliness checks
    Custom container images running in Vertex AI must support health checks and liveness checks.
    Request and response message size may be no more than 10 MB
    Request and response message sizes must be 1.5 MB or less, not 10 MB.
    Running an HTTP server
    Custom container images running in Vertex AI must run an HTTP server.
    Include GPU drivers
    Include support for TPUs or GPUs
    Support for GPUs or TPUs is not required.

    You are training large deep learning networks in Kubernetes Engine and want to use a cost-effective accelerator. You do not need high precision floating point operations. What would you choose?
    GPUs
    GPUs provide high-precision floating point operations and are generally more expensive than TPUs for this kind of workload.
    TPUs
    Tensor processing units (TPUs) are lower precision accelerators designed for TensorFlow operations and cost less than GPUs.
    ASICs
    ASIC is a general class of application-specific integrated circuits (TPUs are one example), not a specific accelerator you can select.
    CPUs
    CPUs are central processing units and are not considered accelerators.

    Several datasets you use for training ML models have missing data. You consider deleting rows with missing data. In which case would you not want to delete instances with missing data?
    When a significant portion of the instances are missing data
    You would not want to delete instances when a significant portion of them are missing data, because you would lose too much of the dataset.
    When a small number of instances are missing data
    When a small number of instances are missing data, removing those instances would not adversely affect results.
    When instances are missing data for more than one feature
    Since the entire row is removed regardless of how many features are missing, the number of features with missing data does not change the decision.
    When instances are missing data for more than three features
    Since the entire row is removed regardless of how many features are missing, the number of features with missing data does not change the decision.

    When is it appropriate to use the Last Observed Value Carried Forward strategy for missing data?
    When working with time series data
    The Last Observed Value Carried Forward strategy works well with time series data (see the sketch after this question).
    When working with categorical data and a small number of values
    Categorical data with a small number of possible values is not a good candidate, since the previous value may have no relation to the next instance in the dataset.
    When overfitting is a high risk
    The technique is irrelevant to overfitting.
    When underfitting is a high risk
    The technique is irrelevant to underfitting.
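
    A minimal pandas sketch of Last Observed Value Carried Forward on an invented, time-ordered series of sensor readings.

    import pandas as pd

    # Invented time-ordered sensor readings with gaps.
    readings = pd.Series([10.0, None, None, 12.5, None, 13.0])

    # Last Observed Value Carried Forward: every missing reading is replaced
    # by the most recent observed value, which suits slowly changing series.
    filled = readings.ffill()
    print(filled.tolist())  # [10.0, 10.0, 10.0, 12.5, 12.5, 13.0]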

    Which of the following are examples of hyperparameters?
    Maximum depth of a decision tree only
    The maximum depth of a decision tree is a hyperparameter, but this answer is incomplete.
    Number of layers in a deep learning network only
    The number of layers is a hyperparameter, but this answer is incomplete.
    Learning rate of gradient descent
    The learning rate is a hyperparameter, but this answer is incomplete.
    All of the above

    You are validating a machine learning model and have decided you need to further tune hyperparameters. You would like to analyze multiple hyperparameter combinations in parallel. Which of the following techniques could you use?
    Grid search and Bayesian search
    Bayesian search is a sequential method for searching hyperparameter combinations.
    Random search and Grid search
    Random search and grid search can both be applied in parallel (see the sketch after this question).
    Bayesian search only
    Bayesian search is a sequential method for searching hyperparameter combinations.
    Random search only
    Random search can run in parallel, but so can grid search, so this option is incomplete.
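
    To illustrate why grid search and random search parallelize, here is a scikit-learn sketch (scikit-learn and the random forest parameters are illustrative choices, not part of the question); every trial is independent, so n_jobs=-1 runs them in parallel.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

    X, y = make_classification(n_samples=500, random_state=0)

    # Grid search: every combination is independent, so trials run in parallel.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"max_depth": [3, 5, 10], "n_estimators": [50, 100]},
        n_jobs=-1,
    )
    grid.fit(X, y)

    # Random search: combinations are sampled independently, so it parallelizes too.
    rand = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={"max_depth": [3, 5, 10, None], "n_estimators": [50, 100, 200]},
        n_iter=5,
        random_state=0,
        n_jobs=-1,
    )
    rand.fit(X, y)

    print(grid.best_params_, rand.best_params_)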

    You spend a lot of time tuning hyperparameters by manually testing combinations of hyperparameters. You want to automate the process and use a technique that can learn from previous evaluations of other hyperparameter combinations. What algorithm would you use?
    Grid search
    Grid search is used for hyperparameter tuning but does not use prior knowledge.
    Data augmentation
    Data augmentation is not used for searching hyperparameters.
    Bayesian search
    Bayesian search uses knowledge from previous evaluations when selecting new hyperparameter values.
    Random search
    Random search is used for hyperparameter tuning but does not use prior knowledge.

    A dataset has been labeled by a crowd-sourced group of labelers. You want to evaluate the quality of the labeling process. You randomly select a group of labeled instances and find several are mislabeled. You want to find other instances that are similar to the mislabeled instances. What kind of algorithm would you use to find similar instances?
    Approximate Nearest Neighbor
    Approximate Nearest Neighbor algorithms use clustering to group similar instances and would be the correct choice.
    XGBoost
    XGBoost is not a clustering algorithm and would not be as good a choice as a clustering algorithm.
    Random Forest
    Random Forest is not a clustering algorithm and would not be as good a choice as a clustering algorithm.
    Gradient descent
    Gradient descent is a technique used to optimize weights in deep learning.

    A company is migrating a machine learning model that is currently being served on premises to Google Cloud. The model runs in Spark ML. You have been asked to recommend a way to migrate the service with the least disruption in service and minimal effort. The company does not want to manage infrastructure if possible and prefers to use managed services. What would you recommend?
    BigQuery ML
    BigQuery supports BigQuery ML, but that would require re-implementing the model.
    Cloud Dataproc
    Cloud Dataproc is a managed Spark/Hadoop service and would be a good choice.
    Cloud Dataflow
    Cloud Dataflow is a managed service for batch and stream processing.
    Cloud Data Studio
    Cloud Data Studio is a visualization tool.

    A group of data analysts know SQL and want to build machine learning models using data stored on premises in relational databases. They want to load the data into the cloud and use a cloud-based service for machine learning. They want to build models as quickly as possible and use them for problems in classification, forecasting, and recommendations. They do not want to program in Python or Java. What Google Cloud service would you recommend?
    Cloud Dataproc
    Cloud Dataproc could be used for machine learning but requires programming in Java, Python or other programming languages.
    Cloud Dataflow
    Cloud Dataflow is for data processing, not machine learning.
    BigQuery ML
    BigQuery ML uses SQL to create and serve machine learning models and does not require programming in a language such as Python or Java.
    Bigtable
    Bigtable does not support machine learning directly in the service.

    What feature representation is used when training machine learning models using text or image data?
    Feature vectors
    Feature vectors are the standard way of inputting data to a machine learning algorithm.
    Lists of categorical values
    Lists of categorical values are not accessible to many machine learning algorithms.
    2-dimensional arrays
    2-dimensional arrays are mapped to 1-dimensional feature vectors before submitting data to the machine learning training algorithm.
    3-dimensional arrays
    3-dimensional arrays are mapped to 1-dimensional feature vectors before submitting data to the machine learning training algorithm.

    An IoT company has developed a TensorFlow deep learning model to detect anomalies in machine sensor readings. The model will be deployed to edge devices. Machine learning engineers want to reduce the model size without significantly reducing the quality of the model. What technique could they use?
    ANOVA
    ANOVA is a statistical test for comparing the means of two or more populations.
    Quantization
    Quantization is a technique for reducing model size without significantly reducing quality (see the sketch after this question).
    Data augmentation
    Data augmentation is used to create new training instances based on existing instances.
    Bucketing
    Bucketing is a technique of mapping feature values into a smaller set of values.
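
    A minimal sketch of post-training quantization with the TensorFlow Lite converter; the tiny stand-in model and its shapes are assumptions, not the IoT team's actual anomaly detector.

    import tensorflow as tf

    # A small stand-in model (the real model would be the anomaly detector).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

    # Post-training quantization with TensorFlow Lite: weights are stored at
    # reduced precision, shrinking the model for edge deployment with little
    # loss in quality.
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()

    print(f"Quantized model size: {len(tflite_model)} bytes")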

    You have created a machine learning model to identify defective parts in an image. Users will send images to an endpoint used to serve the model. You want to follow Google Cloud recommendations. How would you encode the image when making a request of the prediction service?
    CSV
    CSV is a file format for structured data.
    Avro
    Avro is a file format for structured data.
    base64
    Base64 is the recommended encoding for binary image data in JSON prediction requests (see the sketch after this question).
    Capacitor format
    Capacitor format is used by BigQuery to store data in compressed, columnar format.
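
    A minimal sketch of base64-encoding binary image data inside a JSON prediction request; the byte string and the instance field names are made up for illustration.

    import base64
    import json

    # Stand-in for raw image bytes (in practice: open(image_path, "rb").read()).
    image_bytes = b"\xff\xd8\xff\xe0" + b"\x00" * 16

    # base64-encode the binary payload so it can travel inside a JSON body.
    encoded = base64.b64encode(image_bytes).decode("utf-8")

    # Illustrative request payload; the exact instance schema depends on the
    # serving container, but a nested "b64" key is the common convention for
    # binary data in JSON prediction requests.
    payload = json.dumps({"instances": [{"image_bytes": {"b64": encoded}}]})
    print(payload)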

    You are making a large number of predictions using an API endpoint. Several of the services making requests could send batches of requests instead of individual requests to the endpoint. How could you improve the efficiency of serving predictions?
    Use batches with a large batch size to take advantage of vectorization
    Using batches with large batch size will take advantage of vectorization and improve efficiency.
    Vertically scale the API server
    Vertically scaling will increase throughput but using the API and single requests will still use more compute resources than using batch processing.
    Train with additional data to improve accuracy
    Training with additional data will not change serving efficiency.
    Release re-trained models more frequently
    Re-training more frequently will not change serving efficiency.

    Which component of the Vertex AI platform provides for the orchestration of machine learning operations in Vertex AI?
    Vertex AI Prediction
    Vertex AI Prediction is for serving models.
    Vertex AI Pipelines
    Vertex AI Pipelines orchestrates machine learning workflows and operations in Vertex AI.
    Vertex AI Experiments
    Vertex AI Experiments is for tracking training experiments
    Vertex AI Workbench
    Vertex AI Workbench provides managed and user-managed notebooks for development.

    A team of researchers have built a TensorFlow model for predicting near-term weather changes. They are using TPUs but are not achieving the throughput they would like. Which of the following might improve the efficiency of processing?
    Using the tf.data API to maximize the efficiency of data pipelines using GPUs and TPUs
    The tf.data API builds efficient input pipelines that keep GPUs and TPUs supplied with data.
    Use distributed XGBoost
    XGBoost is a machine learning platform and will not improve the efficiency of a TensorFlow model.
    Use early stopping
    Early stopping is an optimization for training, not serving.
    Scale up CPUs before scaling out the number of CPUs
    Scaling up CPUs or adding more CPUs will not significantly change the efficiency of using GPUs or TPUs.

    Managed data sets in Vertex AI provide which of the following benefits?
    Manage data sets in a central location only
    This option is incomplete; managed datasets also support creating labels and annotations.
    Managed data sets in a central location and create labels and annotations only
    Managed data sets in a central location, create labels and annotations, and apply enhanced predefined IAM roles only
    There are no enhanced predefined roles for Vertex AI datasets.
    Managed data sets in a central location, create labels and annotations, apply enhanced predefined IAM roles, and track the lineage of models
    There are no enhanced predefined roles for Vertex AI datasets.

    Which of the following are options for tabular datasets in Vertex AI Datasets?
    CSV files only
    It's incomplete
    CSV files and BigQuery tables and views
    Vertex AI Datasets support CSV files and BigQuery tables and views for tabular data.
    CSV files, BigQuery tables and views, and Bigtable tables
    Bigtable tables are not supported.
    CSV files, BigQuery tables and views, and Avro files
    Avro files are not supported.

    A team of reviewers is analyzing a training data set for sensitive information that should not be used when training models. Which of the following are types of sensitive information that should be removed from the training set?
    Credit card numbers
    Government ID numbers
    Purchase history
    Purchase history is not sensitive information.
    Faces in images
    Customer segment identifier
    Customer segment identifiers are not sensitive information.

    Which of the following techniques can be used to mask sensitive data?
    Substitution cipher
    Tokenization
    Data augmentation
    Data augmentation is used to increase the size of training sets.
    Regularization
    Regularization is used to prevent overfitting.
    Principal component analysis

    Which of the following is a type of security risk to machine learning models?
    Data poisoning
    Data poisoning is a security risk associated with an attacker compromising the training process in order to train the model to behave in ways the attacker wants.
    Missing data
    Missing data and inconsistent labeling are data quality risks that can compromise a model, but they are not security risks.
    Inconsistent labeling
    Insufficiently agreed upon objectives
    Insufficiently agreed upon objectives is a process risk but not a security risk.

    You are training a classifier using XGBoost in Vertex AI. Training is proceeding slower than expected so you add GPUs to your training server. There is no noticeable difference in the training time. Why is this?
    GPUs are only useful for improving serving efficiency
    GPUs are useful for improving training performance.
    TPUs should have been used instead
    Using TPUs would not improve performance.
    GPUs are not used with XGBoost in Vertex AI
    You did not install GPU drivers on the server
    Vertex AI manages images used for training and serving so there is no need to manually install GPU drivers.

    Aerospace engineers are building a model to predict turbulence and impact on a new airplane wing design. They have large, multi-dimensional data sets. What file format would you recommend they use for training data?
    Parquet
    Parquet is a columnar format and could be used but there is a better option.
    Petastorm
    Petastorm is designed for multi-dimensional data.
    ORC
    ORC is a columnar format and could be used but there is a better option.
    CSV
    CSV is inefficient for large data sets.

    You would like to use a nested file format for training data that will be used with TensorFlow. You would like to use the most efficient format. Which of the following would you choose?
    JSON
    JSON is a plain text format and not as efficient as other options.
    XML
    XML is a plain text format and not as efficient as other options.
    CSV
    CSV is not a nested file format.
    TFRecords
    TFRecord files are based on protocol buffers, a binary nested format, and are optimized for TensorFlow (see the sketch after this question).
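
    A short sketch of writing and reading a TFRecord file; the feature names and values are invented.

    import tensorflow as tf

    # Write one illustrative record: a tf.train.Example is a protobuf message,
    # so nested, typed features are stored in a compact binary form.
    example = tf.train.Example(features=tf.train.Features(feature={
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
        "values": tf.train.Feature(float_list=tf.train.FloatList(value=[0.1, 0.2, 0.3])),
    }))
    with tf.io.TFRecordWriter("sample.tfrecord") as writer:
        writer.write(example.SerializeToString())

    # Read it back with tf.data, the usual way TFRecords feed TensorFlow training.
    feature_spec = {
        "label": tf.io.FixedLenFeature([], tf.int64),
        "values": tf.io.FixedLenFeature([3], tf.float32),
    }
    dataset = tf.data.TFRecordDataset("sample.tfrecord").map(
        lambda record: tf.io.parse_single_example(record, feature_spec)
    )
    for parsed in dataset:
        print(parsed["label"].numpy(), parsed["values"].numpy())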

    A robotics developer has created a machine learning model to detect unripe apples in images. Robots use this information to remove unripe apples from a conveyor belt. The engineers who developed this model are using it as a starting model for training a model to detect unripe pears. This is an example of what kind of learning?
    Unsupervised learning
    Unsupervised learning uses data sets without labels.
    Regression
    Regression models predict a continuous value.
    Reinforcement learning
    Reinforcement learning uses feedback from the environment to learn.
    Transfer learning
    Transfer learning reuses a model trained on one task as the starting point for a related task, which is what the engineers are doing here.

    A retailer has deployed a machine learning model to predict when a customer is likely to abandon a shopping cart. A MLOps engineer notices that the feature data distribution in production deviates from feature data distribution in the latest training data set. This is an example of what kind of problem?
    Skew
    Skew is the problem of feature data distribution in production deviating from feature data distribution in training data.
    Drift
    Drift occurs when feature data distribution in production changes significantly over time.
    Data leakage
    Data leakage is a problem in training when data not available when making predictions is used in training.
    Underfitting
    Underfitting occurs when a model does not perform well even on training data set because the model is unable to learn.

    Space Y is launching its hundredth satellite to build its StarSphere network. They have designed an accurate orbit (launching speed/time/and so on) for it based on the existing 99 satellite orbits to cover the Earth’s scope. What’s the best solution to forecast the position of the 100 satellites after the hundredth launch?
    Use ML algorithms and train ML models to forecast
    To decide whether ML is the best method for a problem, we need to consider whether traditional scientific modeling would be very difficult or impossible and whether plenty of data exists.
    Use neural networks to train the model to forecast
    Use physical laws and actual environmental data to model and forecast
    When we start, science modeling will be our first choice since it builds the most accurate model based on science and natural laws.
    For example, given the initial position and speed of an object, as well as its mass and the forces acting on it, we can precisely predict its position at any time. For this case, the mathematical model works much better than any ML model!
    Use a linear regression model to forecast

    A financial company is building an ML model to detect credit card fraud based on their historical dataset, which contains 20 positives and 4,990 negatives.
    Due to the imbalanced classes, the model training is not working as desired. What’s the best way to resolve this issue?
    Data augmentation
    Early stopping
    Downsampling and upweighting
    Downsampling the majority class and upweighting the downsampled examples is the recommended way to handle heavily imbalanced classes (see the sketch after this question).
    Regularization
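
    A minimal sketch of downsampling and upweighting on the 20-positive / 4,990-negative class mix from the question; the downsampling factor of 10 is an arbitrary illustrative choice.

    import numpy as np
    import pandas as pd

    # Illustrative imbalanced dataset: 20 positives (fraud) and 4,990 negatives.
    df = pd.DataFrame({"label": [1] * 20 + [0] * 4990})

    # Downsample the negative (majority) class by a factor of 10 ...
    downsample_factor = 10
    negatives = df[df["label"] == 0].sample(frac=1 / downsample_factor, random_state=0)
    positives = df[df["label"] == 1]
    balanced = pd.concat([positives, negatives])

    # ... and upweight the downsampled examples by the same factor so the
    # model still sees the true base rate through the example weights.
    balanced["weight"] = np.where(balanced["label"] == 0, downsample_factor, 1)
    print(balanced["label"].value_counts().to_dict())  # {0: 499, 1: 20}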

    A chemical manufacturer is using a GCP ML pipeline to detect real-time sensor anomalies by queuing the inputs and analyzing and visualizing the data. Which one will you choose for the pipeline?
    Dataproc | Vertex AI | BQ
    Dataflow | AutoML | Cloud SQL
    Dataflow | Vertex AI | BQ
    Dataproc | AutoML | Bigtable

    A real estate company, Zeellow, does great business buying and selling properties in the United States. Over the past few years, they have accumulated a big amount of historical data for US houses.
    Zeellow is using ML training to predict housing prices, and they retrain the models every month by integrating new data. The company does not want to write any code in the ML process. What method best suits their needs?
    AutoML
    AutoML requires no coding during the ML process, which fits the requirement.
    BigQuery ML
    Vertex AI Custom training

    The data scientist team is building a deep learning model for a customer support center of a big Enterprise Resource Planning (ERP) company, which has many ERP products and modules. The DL model will input customers’ chat texts and categorize them into products before routing them to the corresponding team. The company wants to minimize the model development time and data preprocessing time. What strategy/platform should they choose?
    Vertex AI custom training
    AutoML
    NLP API
    Vertex AI Custom notebooks

    A real estate company, Zeellow, does great business buying and selling properties in the United States. Over the past few years, they have accumulated a big amount of historical data for US houses.
    Zeellow wants to use ML to forecast future sales by leveraging their historical sales data. The historical data is stored in cloud storage. You want to rapidly experiment with all the available data. How should you build and train your model?
    Load data into BigQuery and use BigQuery ML
    BigQuery and BigQuery ML are the best options here, since the other approaches would take longer to prepare the data and train the model.
    Convert the data into CSV and use AutoML
    Convert the data into TFRecords and use TensorFlow
    Convert and refactor the data into CSV format and use the built-in XGBoost library

    A real estate company, Zeellow, uses ML to forecast future sales by leveraging their historical data. New data is coming in every week, and Zeellow needs to make sure the model is continually retrained to reflect the marketing trend. What should they do with the historical data and new data?
    Only use the new data for retraining
    Update the datasets weekly with new data
    Update the datasets with new data when model evaluation metrics do not meet the required criteria
    We need to retrain the model, using the integrated dataset of existing and new data, when the performance metrics no longer meet the requirements.
    Update the datasets monthly with new data

    A real estate company, Zeellow, uses ML to forecast future sales by leveraging their historical data. Their data science team trained and deployed a DL model in production half a year ago. Recently, the model is suffering from performance issues due to data distribution changes.
    The team is working on a strategy for model retraining. What is your suggestion?
    Monitor data skew and retrain the model
    Model retraining is based on data value skews, which are significant changes in the statistical properties of data.
    When data skew is detected, this means that data patterns are changing, and we need to retrain the model to capture these changes.
    Retrain the model with fewer model features
    Retrain the model to fix overfitting
    Retrain the model with new data coming in every month

    Recent research has indicated that when a certain kind of cancer, X, is developed in a human liver, there are usually other symptoms that can be identified as objects Y and Z from CT scan images. A hospital is using this research to train ML models with a label map of (X, Y, Z) on CT images. What cost functions should be used in this case?
    Binary cross-entropy
    Binary cross-entropy is used for binary classification problems.
    Categorical cross-entropy
    Categorical cross-entropy is better to use when you want to prevent the model from giving more importance to a certain class – the same idea as one-hot encoding.
    Sparse categorical cross-entropy
    Sparse categorical cross-entropy is more optimal when your classes are mutually exclusive (for example, when each sample belongs to exactly one class).
    Dense categorical cross-entropy

    The data science team in your company has built a DNN model to forecast the sales value for an automobile company, based on historical data. As a Google ML Engineer, you need to verify that the features selected are good enough for the ML model. How would you do this?
    Train the model with L1 regularization and verify that the loss is constant
    Train the model with no regularization and verify that the loss is constant
    Train the model with L2 regularization and verify that the loss is decreasing
    Train the model with no regularization and verify that the loss is close to zero
    The loss function measures model prediction error and guides the training process; training without regularization and confirming that the loss is close to zero verifies that the selected features carry enough signal.

    The data science team in your company has built a DNN model to forecast the sales value for a real estate company, based on historical data. As a Google ML Engineer, you find that the model has over 300 features and that you wish to remove some features that are not contributing to the target. What will you do?
    Use Explainable AI to understand the feature contributions and reduce the non-contributing ones.
    Explainable AI is one of the ways to understand which features are contributing and which ones are not
    Use L1 regularization to reduce features.
    L1 is a method for resolving model overfitting issues and not feature selection in data engineering.
    Use L2 regularization to reduce features.
    L2 is a method for resolving model overfitting issues and not feature selection in data engineering.
    Drop a feature at a time, train the model, and verify that it does not degrade the model. Remove these features.

    The data science team in your company has built a DNN model to forecast the sales value for a real estate company, based on historical data. They found that the model fits the training dataset well, but not the validation dataset. What would you do to improve the model?
    Apply a dropout parameter of 0.3 and decrease the learning rate by a factor of 10
    Apply an L2 regularization parameter of 0.3 and decrease the learning rate by a factor of 10
    Apply an L1 regularization parameter of 0.3 and increase the learning rate by a factor of 10
    Tune the hyperparameters to optimize the L2 regularization and dropout parameters
    The model is overfitting (it fits the training data but not the validation data), which is addressed by tuning regularization and dropout (see the sketch after this question).
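
    As a sketch of the regularization answer, the Keras model below combines L2 weight penalties and dropout; the layer sizes, L2 factor, and dropout rate are illustrative values that would normally be tuned.

    import tensorflow as tf

    # Illustrative values; in practice the L2 factor and dropout rate would be
    # tuned as hyperparameters rather than fixed up front.
    l2 = tf.keras.regularizers.l2(1e-3)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(50,)),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.3),   # randomly drops units during training
        tf.keras.layers.Dense(64, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.summary()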

    You are building a DL model for a customer service center. The model will input customers’ chat text and analyze their sentiments. What algorithm should be used for the model?
    MLP
    Regression
    CNN
    RNN
    Text for sentiment analysis is sequential data, which RNNs are designed to process.

    A health insurance company scans customers' hand-filled claim forms and stores them in Google Cloud Storage buckets in real time. They use ML models to recognize the handwritten texts. Since the claims may contain Personally Identifiable Information (PII), company policies require only authorized persons to access the information. What’s the best way to store and process this streaming data?
    Create two buckets and label them as sensitive and non-sensitive. Store data in the non-sensitive bucket first. Periodically scan it using the DLP API and move the sensitive data to the sensitive bucket.
    Create one bucket to store the data. Only allow the ML service account access to it.
    Create three buckets – quarantine, sensitive, and non-sensitive. Store all the data in the quarantine bucket first. Then, periodically scan it using the DLP API and move the data to either the sensitive or non-sensitive bucket.
    Create three buckets – quarantine, sensitive, and non-sensitive. Store all the data in the quarantine bucket first. Then, once the file has been uploaded, trigger the DLP API to scan it, and move the data to either the sensitive or non-sensitive bucket.

    A real estate company, Zeellow, uses ML to forecast future sales by leveraging their historical data. The recent model training was able to achieve the desired forecast accuracy objective, but it took the data science team a long time. They want to decrease the training time without affecting the achieved model accuracy. What hyperparameter should the team adjust?
    Learning rate
    Epochs
    Machine type
    Changing the machine type only affects training speed; changing any of the other three parameters would change the model's prediction accuracy.
    Batch size

    The data science team has built a DNN model to monitor and detect defective products using the images from the assembly line of an automobile manufacturing company. As a Google ML Engineer, you need to measure the performance of the ML model for the test dataset/images. Which of the following would you choose?
    The AUC value
    It measures how well the predictions are ranked rather than their absolute values. It is a classification threshold invariant and thus is the best way to measure the model’s performance.
    The recall value
    The precision value
    The TP value

    The data science team has built a DL model to monitor and detect defective products using the images from the assembly line of an automobile manufacturing company. Over time, the team has built multiple model versions in Vertex AI. As a Google ML Engineer, how will you compare the model versions?
    Compare the mean average precision for the model versions
    It measures how well the different model versions perform over time: deploy your model as a model version and then create an evaluation job for that version. By comparing the mean average precision across the model versions, you can find the best performer.
    Compare the model loss functions on the training dataset
    Compare the model loss functions on the validation dataset
    Compare the model loss functions on the testing dataset

    The data science team is building a recommendation engine for an e-commerce website using ML models to increase its business revenue, based on users’ similarities. What model would you choose?
    Collaborative filtering
    Collaborative filtering uses similarities between users to provide recommendations.
    Regression
    Classification
    Content-based filtering
    Content-based filtering uses the similarity between items to recommend items that are similar to what the user likes.

    The data science team is building a fraud-detection model for a credit card company, whose objective is to detect as much fraud as possible and avoid as many false alarms as possible. What confusion matrix index would you maximize for this model performance evaluation?
    Precision
    Recall
    The area under the PR curve
    In this fraud-detection problem, it asks you to focus on detecting fraudulent transactions - maximize True Positive rate and minimize False Negative - maximize recall (Recall = TruePositives / (TruePositives + FalseNegatives))
    It also asks you to minimize false alarms (false positives) - maximize precision (Precision = TruePositives / (TruePositives + FalsePositives)).
    So, you want to maximize both precision and recall, which the area under the precision-recall curve summarizes across thresholds (see the sketch after this question).
    The area under the ROC curve
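
    A short scikit-learn sketch of the precision-recall curve and its area (average precision) on invented labels and scores.

    import numpy as np
    from sklearn.metrics import average_precision_score, precision_recall_curve

    # Invented labels (1 = fraud) and model scores for a handful of transactions.
    y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
    y_score = np.array([0.1, 0.3, 0.8, 0.2, 0.65, 0.9, 0.4, 0.05, 0.7, 0.35])

    precision, recall, thresholds = precision_recall_curve(y_true, y_score)
    ap = average_precision_score(y_true, y_score)

    # The area under the precision-recall curve summarizes precision and recall
    # over all thresholds, which is what this objective asks for.
    print(f"Average precision (area under the PR curve): {ap:.3f}")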

    The data science team is building a data pipeline for an auto manufacturing company, whose objective is to integrate all the data sources that exist in their on-premise facilities, via a codeless data ETL interface. What GCP service will you use?
    Dataproc
    Dataflow
    Dataprep
    Data Fusion
    Data Fusion is the best choice for data integration with a codeless interface

    The data science team has built a TensorFlow model in BigQuery for a real estate company, whose objective is to integrate all their data models into the new Google Vertex AI platform. What’s the best strategy?
    Export the model from BigQuery ML
    Register the BQML model to Vertex AI
    Vertex AI allows you to register a BQML model in it.
    Import the model into Vertex AI
    Use Vertex AI as the middle stage

    A real estate company, Zeellow, uses ML to forecast future house sale prices by leveraging their historical data. The data science team needs to build a model to predict US house sale prices based on the house location (US city-specific) and house type. What strategy is the best for feature engineering in this case?
    One feature cross: [latitude X longitude X housetype]
    Two feature crosses: [binned latitude X binned housetype] and [binned longitude X binned housetype]
    Three separate binned features: [binned latitude], [binned longitude], [binned housetype]
    One feature cross: [binned latitude X binned longitude X binned housetype]
    Crossing binned latitude with binned longitude enables the model to learn city-specific effects on house types. It prevents a change in latitude from producing the same result as a change in longitude.
    Depending on the granularity of the bins, this feature cross could learn city-specific housing effects.

    A health insurance company scans customer’s hand-filled claim forms and stores them in Google Cloud Storage buckets in real time. The data scientist team has developed an AI documentation model to digitize the images. By the end of each day, the submitted forms need to be processed automatically. The model is ready for deployment. What strategy should the team use to process the forms?
    Vertex AI batch prediction
    We need to run the process at the end of each day, which implies batch processing
    Vertex AI online prediction
    Vertex AI ML pipeline prediction
    Cloud Run to trigger prediction

    A real estate company, Zeellow, uses GCP ML to forecast future house sale prices by leveraging their historical data. Their data science team has about 30 members and each member has developed multiple versions of models using Vertex AI custom notebooks. What’s the best strategy to manage these different models and different versions developed by the team members?
    Set up IAM permissions to allow each member access to their notebooks, models, and versions
    Create a GCP project for each member for clean management
    Create a map from each member to their GCP resources using BQ
    Apply label/tags to the resources when they’re created for scalable inventory/cost/access management
    Resource tagging/labeling is the best way to manage ML resources for medium/big data science teams

    Starbucks is an international coffee shop selling multiple products A, B, C… at different stores (1, 2, 3… using one-hot encoding and location binning). They are building stores and want to leverage ML models to predict product sales based on historical data (A1 is the data for product A sales at store 1). Following the best practices of splitting data into a training subset, validation subset, and testing subset, how should the data be distributed into these subsets?
    Distribute data randomly across the subsets:
    • Training set: [A1, B2, F1, E2, ...]
    • Testing set: [A2, C3, D2, F4, ...]
    • Validation set: [B1, C1, D9, C2...]
    If we distribute the data randomly into the training, validation, and test sets, the model will be able to learn specific qualities about the products.
    Distribute products randomly across the subsets:
    • Training set: [A1, A2, A3, E1, E2, ...]
    • Testing set: [B1, B2, C1, C2, C3, ...]
    • Validation set: [D1, D2, F1, F2, F3, ...]
    If we divided things up at the product level so that the given products were only in the training subset, the validation subset, or the testing subset, the model would find it more difficult to get high accuracy on the validation since it would need to focus on the product characteristics/qualities
    Distribute stores randomly across subsets:
    • Training set: [A1, B1, C1, ...]
    • Testing set: [A2, C2, F2, ...]
    • Validation set: [D3, A3, C3, ...]
    Aggregate the data groups by the cities where the stores are allocated and distribute cities randomly across subsets

    You are building a DL model with Keras that looks as follows:

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(128, activation='relu', input_shape=(200,)))
    model.add(tf.keras.layers.Dropout(rate=0.25))
    model.add(tf.keras.layers.Dense(4, activation='relu'))
    model.add(tf.keras.layers.Dropout(rate=0.25))
    model.add(tf.keras.layers.Dense(2))

    How many trainable weights does this model have?
    200x128+128x4+4x2
    200x128+128x4+2
    200x128+129x4+5x2
    200x128x0.25+128x4x0.25+4x2
    200x128+128+128x4+4+4x2+2
    Trainable params: 26,254, which is 200x128+128 + 128x4+4 + 4x2+2 – the weights plus biases of the three Dense layers (a worked check follows).
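
    A worked check of that number in plain Python (Dropout layers add no trainable parameters):

    # Weights plus biases for each Dense layer in the model above:
    #   Dense(128) on a 200-feature input: 200*128 weights + 128 biases
    #   Dense(4):                          128*4  weights + 4   biases
    #   Dense(2):                          4*2    weights + 2   biases
    total = (200 * 128 + 128) + (128 * 4 + 4) + (4 * 2 + 2)
    print(total)  # 26254, matching "Trainable params: 26,254" in model.summary()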

    The data science team is building a DL model for a customer support center of a big ERP company, which has many ERP products and modules. The company receives over a million customer service calls every day and stores them in GCS. The call data must not leave the region in which the call originated and no PII can be stored/analyzed. The model will analyze calls for customer sentiments. How should you design a data pipeline for call processing, analyzing, and visualizing?
    GCS -> Speech2Text -> DLP -> BigQuery
    GCS -> Pub/Sub -> Speech2Text -> DLP -> Datastore
    GCS -> Speech2Text -> DLP -> BigTable
    GCS -> Speech2Text -> DLP -> Cloud SQL

    The data science team is building an ML model to monitor and detect defective products using the images from the assembly line of an automobile manufacturing company, which does not have reliable Wi-Fi near the assembly line. As a Google ML Engineer, you need to reduce the amount of time spent by quality control inspectors utilizing the model’s fast defect detection. Your company wants to implement the new ML model as soon as possible. Which model should you use?
    AutoML
    AutoML Edge mobile-versatile-1
    AutoML Edge mobile-low-latency-1
    The question asks for a quick inspection time and prioritizes latency reduction
    AutoML Edge mobile-high-accuracy-1

    A national hospital is leveraging Google Cloud and a cell phone app to build an ML model to forecast heart attacks based on age, gender, exercise, heart rate, blood pressure, and more. Since the health data is highly sensitive personal information and cannot be stored in cloud databases, how should you train and deploy the ML model?
    IoT with data encryption
    Federated learning
    With federated learning, the raw data stays on the decentralized edge devices (such as cell phones); the model is trained across those devices and only model updates are exchanged, never the data itself.
    Encrypted BQML
    DLP API