2. Visualize both statistics and schema, and manually fix anomalies in the dataset’s schema and values.
2. Visualize statistics for both datasets simultaneously to fix the datasets’ values, and fix the training dataset’s schema after displaying it together with anomalies in the evaluation dataset.
2. Use TFRecordWriter to convert the training dataset into a TFRecord.
3. Visualize both statistics and schema, and manually fix anomalies in the dataset’s schema and values.
2. Use TFRecordWriter to convert the training and evaluation datasets into TFRecords.
3. Visualize statistics for both datasets simultaneously to fix the datasets’ values, and fix the training dataset’s schema after displaying it together with anomalies in the evaluation dataset.
“The replica 0 ran out of memory with a non-zero status of 9.”
You want to fix this error without vertically increasing the memory of the replicas. What should you do?
2. Use Memorystore to perform feature lookup. Deploy the model as a custom prediction endpoint in Vertex AI, and enable automatic scaling.
2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Memorystore and performs inference with an in-memory model.
2. Use the online service API from Vertex AI Feature Store to perform feature lookup. Deploy the model as a custom prediction endpoint in Vertex AI, and enable automatic scaling.
2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Vertex AI Feature Store’s online serving API and performs inference with an in-memory model.
“Could not find matching concrete function to call loaded from the SavedModel. Got: Tensor("inputs:0", shape=(None,), dtype=string). Expected: TensorSpec(shape=(None, None), dtype=tf.int64, name='inputs')”.
You want to update the model’s code and fix the error while following Google-recommended best practices. What should you do?
2. Configure a scheduled recurring execution for the notebook.
3. Access data and model metadata in Vertex ML Metadata.
2. Configure a scheduled recurring execution for the notebook.
3. Access data and model metadata in Vertex ML Metadata.
2. Write a function that saves data and model metadata by using TensorFlow ML Metadata in one time-stamped subfolder per pipeline run.
3. Configure a scheduled recurring execution for the notebook.
4. Access data and model metadata in Cloud Storage.
2. Load the TFX pipeline in Vertex AI Pipelines, and configure the pipeline to use the same instance type configuration as the notebook.
3. Use Cloud Scheduler to configure a recurring execution for the pipeline.
4. Access data and model metadata in Vertex AI Pipelines.
Model training: BQML allows you to create and run machine learning models using standard SQL queries in BigQuery.
The 'auto_class_weights=TRUE' option balances the class labels in the training data. By default, the training data is not weighted. If the labels in the training data are imbalanced, the model may learn to predict the most frequent label classes disproportionately.
It is correct because it uses a moving average of the sensor data and balances the class weights using the BQML option AUTO_CLASS_WEIGHTS.
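As a minimal sketch of this idea (the dataset, table, and column names, as well as the 10-row window, are hypothetical and not from the original material), a BigQuery ML model with a moving average feature and balanced class weights could be created like this:

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and columns: adjust to your own schema.
query = """
CREATE OR REPLACE MODEL `mydataset.failure_model`
OPTIONS(
  model_type = 'logistic_reg',
  auto_class_weights = TRUE,          -- balance imbalanced labels
  input_label_cols = ['is_failure']
) AS
SELECT
  AVG(sensor_value) OVER (
    PARTITION BY machine_id
    ORDER BY reading_time
    ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
  ) AS sensor_moving_avg,             -- moving average of the sensor data
  is_failure
FROM `mydataset.sensor_readings`
"""

client.query(query).result()  # runs the training job inside BigQuery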
Feature Engineering is often one of the most valuable tasks a data scientist can do to improve model performance, for three main reasons:
1. You can isolate and highlight key information, which helps your algorithms "focus" on what’s important.
2. You can bring in your own domain expertise.
3. Once you understand the "vocabulary" of feature engineering, you can bring in other people’s domain expertise.
A ParDo acts on one item at a time (like a Map in MapReduce).
The Filter method can be carried out in parallel and autoscaled by the execution framework:
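For example, a minimal Apache Beam sketch (the element values and the filter predicate below are illustrative assumptions, not from the original material):

import apache_beam as beam

with beam.Pipeline() as pipeline:
    (
        pipeline
        | 'Create' >> beam.Create([1, 2, 3, 4, 5])
        | 'KeepEven' >> beam.Filter(lambda x: x % 2 == 0)   # Filter runs element by element, in parallel
        | 'Square' >> beam.Map(lambda x: x * x)             # Map / ParDo also act on one item at a time
        | 'Print' >> beam.Map(print)
    )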
We can create many different kinds of feature crosses.
For example:
• [A x B]: a feature cross formed by multiplying the values of two features.
• [A x B x C x D x E]: a feature cross formed by multiplying the values of five features.
• [A x A]: a feature cross formed by squaring a single feature.
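As a hedged illustration (the column names, vocabularies, and bucket size are assumptions), one way to build such a cross in TensorFlow is with the legacy tf.feature_column API:

import tensorflow as tf

# Hypothetical categorical inputs A and B.
a = tf.feature_column.categorical_column_with_vocabulary_list('A', ['a1', 'a2', 'a3'])
b = tf.feature_column.categorical_column_with_vocabulary_list('B', ['b1', 'b2'])

# [A x B]: cross the two columns into a single hashed categorical feature.
a_x_b = tf.feature_column.crossed_column([a, b], hash_bucket_size=100)

# Wrap it as an indicator (one-hot) column so a model can consume it.
a_x_b_indicator = tf.feature_column.indicator_column(a_x_b)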
In TensorFlow Playground, the data points (represented by small circles) are initially colored orange or blue, which correspond to negative one and positive one, respectively.
In TensorFlow Playground, orange and blue are used throughout the visualization in slightly different ways, but in general orange shows negative values while blue shows positive values.
In TensorFlow Playground, in the output layer, the dots are colored orange or blue depending on their original values. The background color shows what the network is predicting for a particular area. The intensity of the color shows how confident that prediction is.
One of the goals of tf.Transform is to provide a TensorFlow graph for preprocessing that can be incorporated into the serving graph (and, optionally, the training graph).
roomsPerPerson and housing price?
- The number of true positives will decrease or stay the same.
- The number of false negatives will increase or stay the same.
This is the best possible ROC curve, as it ranks all positives above all negatives. It has an AUC of 1.0.
In practice, if you have a "perfect" classifier with an AUC of 1.0, you should be suspicious, as it likely indicates a bug in your model. For example, you may have overfit to your training data, or the label data may be replicated in one of your features.
A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):
Adults
True Positives (TPs): 512 | False Positives (FPs): 51
False Negatives (FNs): 36 | True Negatives (TNs): 9401
$$\text{Precision} = \frac{TP}{TP+FP} = 0.909$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.934$$
Minors
True Positives (TPs): 2147 | False Positives (FPs): 96
False Negatives (FNs): 2177 | True Negatives (TNs): 5580
$$\text{Precision} = \frac{TP}{TP+FP} = 0.957$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.497$$
While the model achieves a slightly higher precision rate for minors than adults, the recall rate is substantially lower for minors, resulting in less reliable predictions for this group.
A sarcasm-detection model was trained on 80,000 text messages: 40,000 messages sent by adults (18 years and older) and 40,000 messages sent by minors (less than 18 years old). The model was then evaluated on a test set of 20,000 messages: 10,000 from adults and 10,000 from minors. The following confusion matrices show the results for each group (a positive prediction signifies a classification of "sarcastic"; a negative prediction signifies a classification of "not sarcastic"):
Adults
True Positives (TPs): 512 | False Positives (FPs): 51
False Negatives (FNs): 36 | True Negatives (TNs): 9401
$$\text{Precision} = \frac{TP}{TP+FP} = 0.909$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.934$$
Minors
True Positives (TPs): 2147 | False Positives (FPs): 96
False Negatives (FNs): 2177 | True Negatives (TNs): 5580
$$\text{Precision} = \frac{TP}{TP+FP} = 0.957$$
$$\text{Recall} = \frac{TP}{TP+FN} = 0.497$$
The model performs well on text messages from adults (with precision and recall rates both above 90%), so restricting its use to this group will sidestep the systematic errors in classifying minors' text messages.
The precision rate for text messages sent by minors is high, which means that when the model predicts "sarcastic" for this group, it is nearly always correct.
The problem is that recall is very low for minors; the model fails to identify sarcasm in approximately 50% of examples. Given that the model's negative predictions for minors are no better than random guesses, we can avoid these errors by not providing a prediction in these cases.
The systematic errors in this model are specific to text messages sent by minors. Restricting the model's use to the group more susceptible to error would not help.
Always predicting "sarcastic" for minors' text messages would increase the recall rate from 0.497 to 1.0, because the model would never miss a sarcastic message. However, this increase in recall would come at the expense of precision. All the true negatives would be changed to false positives:
True Positives (TPs): 4324 | False Positives (FPs): 5676 |
False Negatives (FNs): 0 | True Negatives (TNs): 0 |
which would decrease the precision rate from 0.957 to 0.432. So, adding this calibration would change the type of error but would not mitigate the magnitude of the error.
What do you think might be the problem, and how could you proceed to fix it?
Which technique or algorithm do you think is best to use?
For example, you must decide on the placement of nodes so that the result is the most economical and inclusive. An algorithm that does not require labeled data must be used.
Which of the following choices do you think is the most suitable?
The main RL algorithms are deep Q-network (DQN) and deep deterministic policy gradient (DDPG).
What kind of ML model do you think is best to use?
For example, the same visual signature may appear in the center or at the bottom right of an image: the surrounding object is different, but the signature itself is the same. A neural network that compares these derived features, and can therefore simplify the model, achieves the best results.
Which of the following technical specifications can't you use with CNN?
It has proved to be an important technique and is also used to introduce non-linearity to the model. We don't need it in our case.
Which of the following types of models and techniques should you focus on to obtain results quickly and with minimum effort?
- Various banking applications will send transactions to the new system in real-time and in standard/normalized format.
- The data will be stored in real-time with some statistical aggregations.
- An ML model will be periodically trained for outlier detection.
- The ML model will issue the probability of fraud for each transaction.
- It is preferable to have no labeling and as little software development as possible.
Which products would you choose?
- Various banking applications will send transactions to the new system in real-time and in standard/normalized format.
- The data will be stored in real-time with some statistical aggregations.
- An ML model will be periodically trained for outlier detection.
- The ML model will issue the probability of fraud for each transaction.
- It is preferable to have no labeling and as little software development as possible.
Which kinds of ML model could be used?
In ML, it is an unsupervised classification method widely used to detect unusual or outlier movements. For these reasons, it is one of the main methods for fraud detection.
But it is not the only method, because not all fraud is linked to anomalous movements; there may be other factors.
It is an open-source project, and this is the description from its GitHub page:
XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides a parallel tree boosting (also known as GBDT, GBM) that solves many data science problems in a fast and accurate way. The same code runs on major distributed environments (Kubernetes, Hadoop, SGE, MPI, Dask) and can solve problems beyond billions of examples.
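A minimal sketch of training an XGBoost classifier in Python (the data here is synthetic and the parameters are arbitrary, chosen only for illustration):

import numpy as np
import xgboost as xgb

# Synthetic data: 1,000 examples, 10 features, binary labels.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

model = xgb.XGBClassifier(
    n_estimators=100,     # number of boosted trees
    max_depth=4,          # depth of each tree
    learning_rate=0.1,
)
model.fit(X, y)

probabilities = model.predict_proba(X[:5])   # probability estimates for the first 5 rows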
Which solutions can you adopt?
Their main features:
- Using packaged templates in Docker images in a K8s environment
- Manage your various tests/experiments
- Simplifying the orchestration of ML pipelines
- Reuse components and pipelines
- Train an ML model both without code (AutoML) and with custom training
- Evaluate and tune a model
- Deploy models
- Manage prediction: Batch, Online and monitoring
- Manage model versions: workflows and retraining
- Manage the complete model maintenance cycle
So, they are two different platforms, even if Scikit Flow integrates the two.
Scikit-learn doesn't manage ML Pipelines.
You have recently expanded the range of your services and want to refine / update your model. You also want to activate procedures that automate these processes.
Which choices among the following do you prefer in the Cloud GCP?
Therefore, you need to continuously monitor the processes and retrain the model also on newer data, if you find that the frequency distributions of the data vary from the original configuration. It may also be necessary or desirable to create a new model.
Generally, a periodic schedule is adopted every month or week.
For this very reason, all the other answers are incorrect.
For this purpose, all written and voice communications with customers are recorded so that they can be classified and managed.
The problem is that clients often provide private information that cannot be distributed and disclosed.
Which of the following techniques can you use?
The company adopted Analytics-360. So, it can achieve a lot of data on the activities of its customers and on the interest of the various commercial initiatives of the websites, such as (from Google Analytics-360):
- Average bounce rate per dimension
- Average number of product page views by purchaser type
- Average number of transactions per purchaser
- Average amount of money spent per session
- Sequence of hits (pathing analysis)
- Multiple custom dimensions at hit or session level
- Average number of user interactions before purchase
Subsequently, further models will be created to incentivize the most interesting customers better and boost sales.
You have a lot of work to do and you want to start quickly.
What techniques do you use in this first phase?
We are in the field of unsupervised learning. BigQuery is already set up both for data acquisition and for training, validation and use of this kind of model.
The accuracy of the model is very high. But when it is deployed in production, the medical staff is very dissatisfied.
What is the most likely motivation?
It was decided to acquire structured information on projects, areas of expertise and customers through the analysis of these documents.
You're looking for ML methodologies that make this process quicker and easier.
What is the better choice in GCP?
All other answers are incorrect because their functions are already built into Document AI.
All its development processes follow CI / CD specifications and use Docker containers. The requirement is to classify users in various ways and update models frequently, based on new parameters entered into the platform by the users themselves.
So, the problem you are called to solve is how to optimize frequently re-trained operations with an optimized workflow system.
Which solution among these proposals can best solve your needs?
The main functions of Kubeflow Pipelines are:
- Using packaged templates in Docker images in a K8s environment
- Manage your various tests/experiments
- Simplifying the orchestration of ML pipelines
- Reuse components and pipelines
It is within the Kubeflow ecosystem, which is the machine learning toolkit for Kubernetes.
Vertex AI Model Monitoring is useful for detecting if the model is no longer suitable for your needs.
Creating ML workflows is possible with Vertex AI Pipelines.
The other answers may be partially correct but do not resolve all items or need to add more coding.
You have built a linear regression model that works well but whose performance you want to optimize.
Which of these techniques could you use?
You are doing Feature Engineering, and your focus is to minimize bias and increase accuracy. Your coordinator has told you that by doing so you risk having problems. He explained to you that, in addition to the bias, you must consider another factor to be optimized. Which one?
The bias-variance dilemma is an attempt to minimize both bias and variance.
The bias error comes from erroneous or overly simple assumptions in the learning algorithm. The higher it is, the more underfitting there is.
Variance is the sensitivity to differences in the training set. The higher it is, the more overfitting there is.
Periodically, the models are retrained and re-deployed, with a rather complex pipeline on VM clusters:
- New data is streamed from Dataflow
- Data is transformed through aggregations and normalizations (z-scores)
- The model is periodically retrained and evaluated
- New Docker images are created and stored
Which do you choose from the following services?
It's a managed but not a serverless service, especially for custom training.
It obviously has a rich set of features for managing ML pipelines.
They are fully serverless services that can process petabytes of data in public and private datasets and even data stored in files.
BigQuery works with standard SQL and has a CLI interface: bq.
You can use BigQuery jobs to automate and schedule tasks and operations.
With BigQuery ML, you can train models with a rich set of algorithms with data already stored in the Cloud. You may perform feature engineering and hyperparameter tuning and export a BigQuery ML model to a Docker image as required.
At this point, you want to automate the process using the Google Cloud environment.
Which of these solutions allows you to quickly reach your goal?
It, therefore, allows you to manage the entire life cycle seamlessly from modeling, training, and validation, up to production start-up and management of the inference service.
Vertex AI Pipelines can run pipelines built using TFX:
- You can configure a Cluster
- Select basic parameters and click create
- You get your Kubeflow and Kubernetes launched
All the other answers are correct, but not optimal for a quick and managed solution.
Which of these is the best technique?
The new variables are called principal components.
A linear model is assumed as a basis. Therefore, the new variables are uncorrelated with each other.
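A minimal sketch of extracting principal components with scikit-learn (the data below is synthetic and the number of components is arbitrary):

import numpy as np
from sklearn.decomposition import PCA

# Synthetic data: 100 examples with 5 features.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)

pca = PCA(n_components=2)          # keep the 2 directions with the most variance
X_reduced = pca.fit_transform(X)   # the new, uncorrelated variables (principal components)

print(pca.explained_variance_ratio_)  # share of variance captured by each component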
During maintenance services for vehicles produced by TerramEarth at the service centers, information relating to their use is downloaded. Every evening, this data flows into the data center, is consolidated and sent to the Cloud.
TerramEarth has an ML model that predicts component failures and optimizes the procurement of spare parts for service centers to offer customers the highest level of service. TerramEarth wants to automate the redevelopment and distribution process every time it receives a new file.
What is the best service to start the process?
So, we may start a Cloud Function that may activate any Cloud Service as soon as the file is received.
A Cloud Storage trigger may also send a Pub/Sub notification, which is just a little more complex.
It is the simplest and most direct solution of all the answers.
The goal is to carry out the automatic management of the required documents (certificates, origin documents, legal information) so that the practice can be built and verified automatically using the data and documents provided by customers and can be managed in a short time and with the minimum contribution of the scarce specialized personnel.
Which of these GCP services can you use?
It integrates natural language processing, OCR, and computer vision, and can create pre-trained templates aimed at intelligent document administration.
Which of the following features are not present in BigQuery ML natively?
BigQuery ML offers all other features except automatic deployment and serving.
BigQuery ML can simply export a model (packaged in a container image) to Cloud Storage.
Which GCP service can be valuable in this regard and in what way?
You decided on Recommendations AI.
What specific recommendation model type is not useful for new products?
It provides the list of products the user has recently viewed, starting with the most recently viewed.
You are looking for an environment that organizes and manages training, validation and tuning, and updating models with new data, distribution and monitoring in production.
Which of these do you think is the best solution?
Vertex AI integrates many GCP ML services, especially AutoML and custom training, and includes many different tools to help you in every step of the ML workflow.
So, Vertex AI offers two strategies for model training: AutoML and custom training.
Machine learning operations (MLOps) is the practice of using DevOps for machine learning (ML).
DevOps strategies automate the release of code changes and control of systems, resulting in greater security and less time to get systems up and running.
All the other solutions are suitable for production. But, given these requirements, Vertex AI, with the AutoML solution's strong inclusion, is the best and the most productive one.
Which of the following techniques should not be used?
You, therefore, have an extremely demanding audience with strong interests that can be of various types.
Users have a small set of articles that they can read for free every month. Then they need to sign up for a paid subscription.
You have been asked to prepare an ML training model that processes user readings and article preferences. You need to predict trends and topics that users will prefer.
But when you train your DNN with TensorFlow, your input data does not fit into memory.
What can you do in the simplest way?
It is designed to create efficient input pipelines and to iterate over the data for their processing.
These iterations happen in streaming. So, they work even if the input matrix is very large and doesn’t fit in memory.
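A minimal sketch of streaming training data from disk with tf.data (the file pattern is a placeholder, and the parsing step that a real pipeline would need is only noted in a comment):

import tensorflow as tf

# Hypothetical TFRecord files on disk (or in Cloud Storage).
files = tf.data.Dataset.list_files('data/train-*.tfrecord')

dataset = (
    tf.data.TFRecordDataset(files)        # records are streamed, never fully loaded in memory
    # (a map() step to parse each serialized example would normally go here)
    .shuffle(10_000)                      # shuffle within a bounded buffer
    .batch(128)
    .prefetch(tf.data.AUTOTUNE)           # overlap preprocessing and training
)

# model.fit(dataset, epochs=10)           # iterate over the batches in streaming fashion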
It is developing a series of ML models for different activities: manufacturing, procurement, logistics, marketing, customer service and vehicle tracking.
TerramEarth uses Google Vertex AI and wants to scale training and inference processes in a managed way.
It is necessary to forecast whether a vehicle, based on the data collected during the maintenance service, has risks of failures in the next six months in order to recommend an extraordinary service operation.
Which kind of technology/model should you advise using?
These networks do not have any loops or cycles. Information moves in one direction only: forward, from the input nodes, through the hidden nodes (if any), to the output nodes.
All the other techniques are more complex and suitable for different applications (images, NLP, recommendations).
What are the two best services you can use?
The architectures currently used (they are tried at the same time) are:
- Linear
- Feedforward deep neural network
- Gradient Boosted Decision Tree
- AdaNet
- Ensembles of various model architectures
- Normalization
- Encoding and embeddings for categorical features.
- Timestamp columns management (important in our case)
It is developing a series of ML models for different activities: manufacturing, procurement, logistics, marketing, customer service and vehicle tracking. TerramEarth uses Google Cloud Vertex AI and wants to scale training and inference processes in a managed way.
During the maintenance service, snapshots of the various components of the vehicle will be taken. Your new model should be able to determine both the degree of deterioration and any breakages or possible failures.
Which kind of technology/model should you advise using?
Which kind of technology/model should you advise using?
GAN can create new characters from the provided images.
It is also used with photographs and can generate new photos that look authentic.
It is a kind of model highly specialized for this task. So, it is the best solution.
Feedforward neural networks are the classic example of neural networks. In fact, they were the first and most elementary type of artificial neural network.
You aim to provide your audience with pointers to articles that they will indeed find of interest to themselves.
Which of these models can be useful to you?
So, exploiting the choices of other users, the recommendation system makes a guess and can advise people on items they have not yet rated.
You want to increase the power of your training quickly, but your management wants to keep costs down.
What solutions could you adopt?
TensorBoard is a visual tool for ML experimentation with TensorFlow.
Which service do you advise?
It has a visual interface that allows you to create codeless data pipelines as required.
Which of the following methods will not help you with this task?
Furthermore, the model must be able to determine whether the furniture is interesting and require it to be subject to a more detailed estimate. You want Google Cloud to help you reach this ambitious goal faster.
Which of the following services do you think is the most suitable?
Which solution is the best one?
Obviously, this creates a computational load. Therefore, it can be prohibitive in very large datasets, but it is great when you have small datasets.
Which of the following choices do you think is wrong?
You have a big problem: your training jobs last for weeks. You are not going to deliver your project in time.
Which is the best solution that you can adopt?
GCP documentation states that the use of TPUs is advisable with models that:
- use TensorFlow
- need training for weeks or months
- have huge matrix computations
- work with huge datasets and large effective batch sizes
Which techniques or algorithms did he advise to use?
What is the simplest configuration to indicate it?
Which of the following problems is not related to Data Validation?
For example, we may have duplicates when a program loops and creates the same data several times.
A senior Data Scientist asked you:
Which metric for classification models evaluation gives you the percentage of real spam email that was recognized correctly?
What was the exact answer to this question?
Which of the following techniques is not related to embeddings?
It measures how much a change in one variable is related to a change in another.
Which of the following services do you think is the most suitable?
All the other answers refer to Cloud solutions; so, they are wrong.
Which GCP product would you choose?
So, it is fully compliant with our requirements.
Recently there has been the migration of all the management computing systems to Google Cloud and management has requested that the files should be stored in Cloud Storage and that the tabular data should be stored in BigQuery and pre-processed with Dataflow.
Which of the following techniques is NOT suitable for accessing tabular data as required?
Your outcome model presented a good R-square - coefficient of determination, but the final results were poor.
When you asked for advice, your mentor laughed and said that you failed because of the Anscombe Quartet problem.
What are the other possible problems described by the famous Anscombe Quartet?
You want to increase performances. But you cannot use further resources.
You are afraid that you are not going to deliver your project in time.
Your mentor said to you that normalization could be a solution.
Which of the following choices do you think is not for data normalization?
The requirements are:
- Various banking applications will send transactions to the new system both in real-time and in batch in standard/normalized format
- The data will be stored in a repository
- Structured Data will be trained and retrained
- Labels are drawn from the data.
Which GCP Services could you use?
You have to follow the entire lifecycle: model development, design, and training, testing, deployment, and retraining.
You are looking for UI tools that can help you work and solve all issues faster.
Which solutions can you adopt?
- Profiling
- Monitoring metrics, weights, biases
- Examine model graph
- Working with embeddings
- Pipelines dashboards
- Hyperparameter tuning
- Artifact Store
- Jupyter notebooks
You work on this project. You need to deal with input data that is binary (images) and made by CSV files.
You are looking for the most convenient way to import and manage this type of data.
Which is the best solution that you can adopt?
You need to monitor the performance of your models and let them go faster.
Which is the best solution that you can adopt?
In TensorFlow 2, eager execution is the default. One-off operations are faster, but recurring ones may be slower, so you need to optimize the model.
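One common optimization (a minimal sketch; the model, data, and optimizer below are placeholders) is to wrap the training step in tf.function so that repeated calls run as a compiled graph instead of eagerly:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD()
loss_fn = tf.keras.losses.MeanSquaredError()

@tf.function  # traces the step into a graph, speeding up repeated calls
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
loss = train_step(x, y)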
The purpose of your current project is the automatic and smart acquisition of data from documents and modules of different types.
You work on big datasets with a lot of private information that cannot be distributed and disclosed.
You are asked to replace sensitive data with specific surrogate characters.
Which of the following techniques do you think is best to use?
For example, a 16-digit credit card number becomes another 16-digit number.
You use GCP managed services, specifically Vertex AI.
Suddenly, there is a noticeable degradation in the quality of the inferences. You perform various checks, but the model seems to be perfectly fine.
Finally, you control the input data and notice that the frequency distributions have changed for a specific feature.
Which GCP service can be helpful for you to manage features in a more organized way?
Lots of effort is spent in mapping categorical values in the best way: we have to convert strings to numeric values. We have to define a vocabulary of possible values, usually mapped to integer values.
Remember that in an ML model everything must be translated into numbers; therefore, it is easy to run into problems of this type.
Vertex Feature Store is a service to organize and store ML features through a central store.
This allows you to share and optimize ML features important for the specific environment and to reuse them at any time.
All this translates into faster creation of ML services. It also helps minimize problems such as training-serving skew, which occurs when the distribution of data in production is different from that of training, often due to errors in the organization of the features.
For example, Training-serving skew may happen when your training data uses a different unit of measure than prediction requests.
So, Training-serving skew happens when you generate your training data differently than you generate the data you use to request predictions. For example, if you use an average value, and for training purposes, you average over 10 days, but you average over the last month when you request prediction.
Which of the following methods could you use to avoid such problems?
To avoid this, it is necessary to monitor the quality of the forecasts continuously.
Vertex Model Monitoring has been designed just for this.
The main goal is to cope with feature skew and drift detection.
For skew detection, it compares the distribution of the feature's values in production with the distribution in the training data.
For drift detection, it compares the distribution of the feature's values in recent production data with an earlier production baseline.
It uses two main methods:
- Jensen-Shannon divergence for numerical features.
- L-infinity distance for categorical features.
You are using Compute Engine and GKE. You decided to use a library that lets you have more control over all processes, from development up to production.
Which tool is the best one for your needs?
- Metadata management
- Model validation
- Deployment
- Production execution.
- The libraries can also be used individually.
Vertex AI's main functions are:
- Train custom and AutoML models
- Evaluate and tune models
- Deploy models
- Manage prediction: Batch, Online and monitoring
- Manage model versions: workflows and retraining
Kubeflow Pipelines is an open-source platform designed specifically for creating and deploying ML workflows based on Docker containers.
Their main features:
- Using packaged templates in Docker images in a K8s environment
- Manage your various tests / experiments
- Simplifying the orchestration of ML pipelines
- Reuse components and pipelines
You also use BigTable and CloudSQL, and of course Cloud Storage. In many cases, the same data is used for multiple models and projects. And your data is continuously updated, sometimes in streaming mode.
Which is the best way to organize the input data?
- Datasets: data, metadata and annotations, structured or unstructured. For all kinds of libraries.
- Training pipelines to build an ML model
- ML models, imported or created in the environment
- Endpoints for inference
What command will you use for this operation?
When you have to save a model for resuming training, you have to record both models and updated buffers and parameters in a checkpoint.
A checkpoint is an intermediate dump of a model’s entire internal state (its weights, current learning rate, etc.) so that the framework can resume the training from that very point.
In other words, you train for a few iterations, then evaluate the model, checkpoint it, then fit some more. When you are done, save the model and deploy it as normal.
To save a checkpoint, you use torch.save() to serialize the dictionary of all your state data.
To reload it, the command is torch.load().
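A minimal sketch of saving and restoring a training checkpoint in PyTorch (the model, optimizer, epoch number, and file path are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
epoch = 5

# Save the full training state, not just the weights.
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pt')

# Later: reload the checkpoint and resume training from the same point.
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1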
What are you going to do?
Which of the following states is not correct?
You now have to work in the GCP Managed Platform for ML. You need to deploy a custom model with Vertex AI so that it will be available for online predictions.
Which is the correct procedure?
You create an "endpoint object" for your model and then you can deploy the various versions of your model.
Its main elements are given below:
Custom or Pre-built containers
Model
Vertex AI Prediction uses an architectural paradigm that is based on immutable instances of models and model versions.
Regional endpoint
Which technique or algorithm do you think is best to use?
The tf.data API provides these functions:
Prefetching
tf.data.Dataset.prefetch: while a training step is executing, the data for the next step is read.
Parallelizing data transformation
The tf.data API offers the map function for the tf.data.Dataset.map transformation.
This transformation can be parallelized across multiple cores with the num_parallel_calls option.
Sequential and parallel interleave
tf.data.Dataset.interleave offers the possibility of interleaving and allowing multiple datasets to execute in parallel (num_parallel_calls).
Caching
tf.data.Dataset.cache allows you to cache a dataset increasing performance.
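Putting these functions together, a hedged sketch of an optimized input pipeline (the file pattern and parsing function are placeholders, not from the original material):

import tensorflow as tf

def parse_fn(serialized):
    # Placeholder parsing logic; replace with tf.io.parse_single_example for real TFRecords.
    return serialized

files = tf.data.Dataset.list_files('data/shard-*.tfrecord')

dataset = (
    files.interleave(                                     # read several files in parallel
        tf.data.TFRecordDataset,
        num_parallel_calls=tf.data.AUTOTUNE)
    .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)   # parallel data transformation
    .cache()                                              # cache after the expensive steps
    .batch(64)
    .prefetch(tf.data.AUTOTUNE)                           # overlap producer and consumer
)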
You face 2 different sets of problems:
- Transform data to hide personal information you don't need
- Protect your work environment because certain combinations of personal data are useful for your model and you need to keep them
You are struggling with your model (learning rates, hidden layers and nodes selection) for optimizing processing and letting it converge in the fastest way.
What is your problem in ML language?
- Training data is also called examples or records. It is the main input for model configuration and, in supervised learning, includes labels, which are the correct answers based on past experience. Input data is used to build the model but will not be part of the model.
- Parameters are instead the variables to be found to solve the riddle. They are part of the final model and they make the difference among similar models of the same type.
- Hyperparameters are configuration variables that influence the training process itself: Learning rate, hidden layers number, number of epochs, regularization, batch size are all examples of hyperparameters.
The time required to train and test a model can depend upon the choice of its hyperparameters.
With Vertex AI you just need to prepare a simple YAML configuration without coding.
The big problem is that you don’t have the labels for all the data, but you have very little time to complete the task for only a subset of it.
Which of the following services could help you?
If you cannot have your data correctly labeled, you may request professional people to complete your training data.
GCP has a service for this: Vertex AI data labeling. Human labelers will prepare correct labels following your directions.
You have to set up a data labeling job with:
- The dataset
- A list, vocabulary of the possible labels
- An instructions document for the professional people
Your Manager realized that different teams in different projects used to deal with the same features based on the same data differently. The problem arose when models drifted unexpectedly over time.
You have to advise your Manager on the best strategy. Which of the following do you choose?
Which of the following methods are nonparametric?
You begin with data that is already classified. A new example will be set by looking at the k nearest classified points. Number k is the most important hyperparameter.
Which of the following are ensemble-type algorithms?
AdaBoost is built with multiple decision trees, too, with the following differences:
- It creates stumps, that is, trees with only one node and two leaves.
- Stumps with less error win.
- Ordering is built in such a way as to reduce errors.
- Leaf node pruning, that is, regularization in order to keep the best nodes for generalization
- Newton Boosting instead of gradient descent, so math-based and faster
- Reduction of the correlation between trees with an additional randomization parameter
- Optimized algorithm for tree penalization
- Trees instead of stumps
- It uses a loss function to minimize errors.
- Trees are selected to predict the difference from actual values
You would like engineer-to-engineer assistance from both Google Cloud and Google’s TensorFlow teams.
Which of the following services can be used to achieve the above requirement?
It is free but only for big enterprises with a lot of services in GCP. It is prepackaged and optimized for usage with containers and VMs.
It works in Google Cloud, from VM images to managed services like GKE and Vertex AI.
The TensorFlow Enterprise library is integrated in the following products:
- Deep Learning VM Images
- Deep Learning Containers
- Notebooks
- Vertex AI Training
It has a premium level of support from Google.
Which one of the following options do you follow?
So the integration will be immediate without any further costs or data transformations.
Apache Parquet is an open-source column-oriented data storage format born in the Apache Hadoop environment but supported in many tools and used for data analysis.
The problem is that you don’t know the meaning of lazy learning; so you looked for it.
Which of the following methods uses lazy learning?
You begin with data that is already classified. A new example will be set by looking at the k nearest classified points. Number k is the most important hyperparameter.
So you are using two types of tools. But you have been told that it is possible to have more levels of integration between traditional statistical methodologies and those more related to AI / ML processes.
Which tool is the best one for your needs?
TensorFlow Probability main features are:
- Probability distributions and differentiable and injective (one to one) functions.
- Tools for deep probabilistic models building.
- Inference and simulation methods support: Markov chain Monte Carlo.
- Optimizers such as Nelder-Mead, BFGS, and SGLD.
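A minimal sketch of using TensorFlow Probability distributions (the distribution and its parameters are arbitrary examples):

import tensorflow_probability as tfp

tfd = tfp.distributions

# A simple Normal distribution with arbitrary parameters.
dist = tfd.Normal(loc=0.0, scale=1.0)

samples = dist.sample(5)             # draw 5 random samples
log_probs = dist.log_prob(samples)   # evaluate the log-density at those samples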
Now you have the problem that you have to create a model for recognizing photographic images related to collaborators and consultants. You have to do it quickly, and it has to be an R-CNN model. You don't want to start from scratch. So you are looking for something that can help you and that can be optimal for the GCP platform.
Which of these tools do you think can help you?
It is available for reusing advanced trained models with minimal code.
The ML models are optimized for GCP.
Every day a small batch of data will be sent that will be collected and processed in order to provide customers with the management of their vehicle health and push notifications in case of important messages.
Which GCP products are the most suitable for this project?
Dataflow manages data pipelines: directed acyclic graphs (DAGs) of transformations (PTransforms) on data (PCollections).
The same pipeline can activate multiple PTransforms.
All the processing can be performed both in batch and in streaming mode.
So, in our case of streaming data, Dataflow can:
- Serialize input data
- Preprocess and transform data
- Call the inference function
- Get the results and postprocess them
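As a hedged sketch of this streaming pattern (the Pub/Sub topics, the JSON parsing, and the dummy model are placeholder assumptions, not from the original material):

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class Predict(beam.DoFn):
    def setup(self):
        # Placeholder: load or connect to your model once per worker.
        self.model = lambda features: {'score': 0.0}

    def process(self, element):
        features = json.loads(element)                     # preprocess and transform the data
        result = self.model(features)                      # call the inference function
        yield json.dumps(result).encode('utf-8')           # postprocess the result

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | 'Read' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/transactions')
        | 'Decode' >> beam.Map(lambda msg: msg.decode('utf-8'))   # deserialize the input data
        | 'Predict' >> beam.ParDo(Predict())
        | 'Publish' >> beam.io.WriteToPubSub(topic='projects/my-project/topics/predictions')
    )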
The idea is to use BigQuery ML. Therefore you are considering whether it can cover all the functionality you need. Various projects start with the design and set up various models using various techniques and algorithms in your company.
Which of these techniques/algorithms is not supported by BigQuery ML?
It is not supported because it is specialized for images.
You need to appropriately collect and transform data and then create and tune your ML models.
In a second moment, these procedures will be inserted in an MLOps flow and therefore will have to be automated and be as simple as possible.
What are the methodologies / services recommended by Google?
The main techniques are aimed at feature engineering (crossed_column, embedding_column, bucketized_column) and data transformation (the tf.Transform library).
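A minimal sketch of a tf.Transform preprocessing_fn (the feature names are hypothetical):

import tensorflow_transform as tft

def preprocessing_fn(inputs):
    # Callback invoked by tf.Transform to preprocess raw features.
    outputs = {}
    # Scale a numeric feature to zero mean and unit variance.
    outputs['income_scaled'] = tft.scale_to_z_score(inputs['income'])
    # Map a string feature to integer ids using a computed vocabulary.
    outputs['country_id'] = tft.compute_and_apply_vocabulary(inputs['country'])
    return outputs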
What are the parameters that you must indicate?
Everything that happens in these areas is filmed. Instead of having a physical surveillance service, the videos must be managed by a model capable of intercepting unauthorized people and vehicles, especially at particular times.
What are the GCP services that allow you to achieve all this with minimal effort?
In particular, AutoML object tracking allows you to identify and locate particular entities of interest to you with your specific tags.
It has a seasonal business and has collected a lot of sales data from its structured ERP and market trend databases.
It wants to predict the demand of its customers both to increase business and improve logistics processes.
What managed and fast-to-use GCP products can be used for these types of models?
Moreover, it can work in an environment outside GCP, which is a big advantage, but that is not in our requirements.
Kubeflow is a system for deploying, scaling and managing complex Tensorflow systems on Kubernetes.
Moreover, it can work in an environment outside GCP, which is a big advantage, but that is not in our requirements.
TFX is a platform that allows you to create scalable production ML pipelines for TensorFlow projects.
He asked: “How can I get the most from ML services and the least costs?”
What are the best practices recommended by Google in this regard?
You can configure an automatic shutdown routine when your instance is idle, saving money.
Preemptible VMs are far cheaper than normal instances and are OK for long-running (batch) large experiments.
You can set up the GPU metrics reporting script; it is important because GPU is expensive.
You have to prepare a demo for the Manager and Stakeholders. You are certain that they will ask you to explain the classification and regression mechanism. You'd like to show them an interactive demo with some cool inference.
Which of these tools is best for all of this?
It lets you see data point distributions with different shapes and colors and interactively try new inferences.
Moreover, it shows which features affect your model the most, together with many other characteristics.
All without code.
You recently prepared an NLP model that works well and is about to be rolled out in production.
You have to prepare a demo for the Manager and Stakeholders for your new system of text and sentiment interpretation. You are certain that they will ask you for explanations and understanding about how software may capture human feelings. You'd like to show them an interactive demo with some cool inference.
Which of these tools is best for all of this?
It is similar to the What-If tool, which instead targets classification and regression models with structured data.
It offers visual explanations of the model's predictions and analysis with metrics, tests and validations.
You recently prepared a DNN model for image recognition that works well and is about to be rolled out in production.
Your manager asked you to demonstrate the inner workings of the model.
It is a big problem for you because you know that it is working well but you don’t have the explainability of the model.
Which of these techniques could help you?
Integrated Gradients highlights feature importance. It computes the gradient of the model's prediction output with respect to its input features, without any modification to the original model.
The new features will all be independent of each other.
Which one do you choose?
EXCEPT gives all rows or fields on the left side except the one coming from the right side of the query.
Example:
SELECT
  * EXCEPT(mylabel),
  myvalue AS label
The environment is Vertex AI with AutoML, and your data is stored in a CSV file in Cloud Storage.
AutoML can perform transformations on the data to make the most of it.
Which of the following types of transformations are you not allowed, based on your requirements?
All the other kinds of data are also supported for CSV files, as stated in the referred documentation.
You are preparing data for a linear regression model for Demographic research. You need to choose and manage the correct feature.
Your input data is in BigQuery.
You know very well that you have to avoid multicollinearity and optimize categories. So you need to group some features together and create macro categories.
In particular, you have to join country and language in one variable and divide data between 5 income classes.
Which ones of the following options can you use?
Example: ML.FEATURE_CROSS(STRUCT(country, language)) AS origin
and ML.QUANTILE_BUCKETIZE → income_class
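A hedged sketch of how these functions might appear in a model's TRANSFORM clause (the dataset, table, and column names are hypothetical, and the exact options should be checked against the BigQuery ML documentation):

from google.cloud import bigquery

client = bigquery.Client()

query = """
CREATE OR REPLACE MODEL `mydataset.demographic_model`
TRANSFORM(
  ML.FEATURE_CROSS(STRUCT(country, language)) AS origin,      -- join country and language
  ML.QUANTILE_BUCKETIZE(income, 5) OVER() AS income_class,    -- divide income into 5 classes
  label
)
OPTIONS(model_type = 'linear_reg', input_label_cols = ['label']) AS
SELECT country, language, income, label
FROM `mydataset.demographics`
"""

client.query(query).result()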
You have been asked which activation function to use.
Which of the following do you choose?
The management has decided to store most of the data to be used for ML models in BigQuery.
The motivation is that BigQuery allows for preprocessing and transformations easily and with standard SQL. It is highly structured; so it offers efficiency, integration and security.
Your team must create and modify code to directly access BigQuery data for building models in different environments.
What are the tools you can use?
What other types of models can you implement with AutoML?
You are concerned that the final system is not efficient and scalable enough. You are looking for the simplest and most managed GCP solution.
Which of these can be the solution?
The service supports both online prediction and batch prediction.
You have updated an AutoML model and want to deploy it to production. But you want to maintain both the old and the new version at the same time. The new version should only serve a small portion of the traffic.
What can you do?
What types of storage should you avoid in the managed environment of GCP ML, such as Vertex AI?
You want to leverage Explainable AI to understand which are the most essential features and how they influence the model.
For what kind of model may you use Vertex Explainable AI?
You need to increase the performance of the training sessions and you already use caching and prefetching.
So now you want to use GPUs, but in a single machine, for cost reduction and experimentations.
Which of the following is the correct strategy?
tf.distribute.MirroredStrategy lets you use multiple GPUs in a single VM, with one replica per GPU.
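A minimal sketch of using MirroredStrategy (the model, layer sizes, and training data are placeholders):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()   # one replica per visible GPU on this machine
print('Number of replicas:', strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created inside the scope are mirrored across the GPUs.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer='adam', loss='mse')

# model.fit(dataset, epochs=5)   # training is then distributed automatically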
An experienced colleague of yours asked for the logs. You found out that there is no logging information available. What kind of logs do you need and how do you get them?
You want to find a metric that allows you to evaluate your model for how well it separates the two classes. You are interested in finding a method that is scale invariant and classification threshold invariant.
Which of the following is the optimal methodology?
It plots two different values and relates them to each other:
- TPR: true positives / all actual positives
- FPR: false positives / all actual negatives
It measures the separability between the classes, and it is independent of the chosen threshold value; in other words, it is threshold-invariant.
When the AUC is equal to 0.5, the model separates the two classes no better than chance, similar to what happens with heads and tails when tossing a coin.
What can you do?
Checking and updating models create additional difficulties. You are undecided whether to use Vertex Pipelines and Kubeflow Pipelines. You wonder if starting from Kubeflow, you can later switch to a more automated and managed system like Vertex AI.
Which of these answers are correct?
You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow, and BigQuery ML. You also use BigTable and CloudSQL, and Cloud Storage, of course. You need to use input tabular data in CSV format. You are working with Vertex AI.
How do you manage them in the best way?
You are now working on an international project with other partners. You need to use the Vertex AI. You are asking experts which the capabilities of this managed suite of services are.
Which elements are integrated into Vertex AI?
So, all the other answers are wrong because they cover only a subset of Vertex functionalities.
Which of the following is the optimal methodology?
The intuitive explanation is that when you want to emphasize the loss of bigger mistakes, you need to find a way to penalize such differences.
In this case, the squared loss is often used. But in the case of probabilistic values (between 0 and 1), squaring makes the values smaller; it does not make them bigger.
On the other hand, with a logarithmic transformation the effect is reversed: errors on values between 0 and 1 are amplified, so confident wrong predictions are penalized heavily.
In addition, logarithmic transformations are monotonic functions, so they do not change where the minimum and maximum are located.
These are some of the reasons why they are widely used in ML.
Pay attention to the difference between loss function and ROC/AUC, which is useful as a measure of how well the model can discriminate between two categories.
You may have two models with the same AUC but different losses.
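As a small illustrative computation (the numbers are arbitrary), compare squared error and log loss for a confident but wrong prediction:

import math

y_true = 1.0
p = 0.01                      # the model is confidently wrong

squared_error = (y_true - p) ** 2                     # ~0.98, bounded by 1
log_loss = -(y_true * math.log(p)
             + (1 - y_true) * math.log(1 - p))        # ~4.6, grows without bound as p approaches 0

print(squared_error, log_loss)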
You are starting to get interested in MLOps and are trying to understand the different processes involved.
You have prepared a checklist, but inside there is a service that has nothing to do with MLOps.
Which one?
MLOps covers all processes related to ML models: experimentation, preparation, testing, deployment and, above all, continuous integration and delivery.
The MLOps environment is designed to provide (some of) the following:
- Environment for testing and experimentation
- Source control, like Github
- CI/CD Continuous integration/continuous delivery
- Container registry: custom Docker images management
- Feature Stores
- Training services
- Metadata repository
- Artifacts repository
- ML pipelines orchestrators
- Data warehouse/ storage and scalable data processing for batch and streaming data.
- Prediction service both batch and online.
So, all the other answers describe MLOps functionalities.
You want to leverage Vertex Explainable AI to understand the most important features and how they influence the model.
Which three methods does Vertex AI leverage for feature attributions?
You work as a Data Scientist. You train and deploy several ML models.
Your manager just asked you to find a simple method to determine affinities between different products and categories to give sellers and applications a wider range of suitable offerings for customers.
The method should give good results even without a great amount of data.
Which of the following different techniques may help you better?
So, the problem is to find similar products as a first step.
You take two products and their characteristics (all transformed in numbers). So, you have two vectors.
You may compute differences between vectors in Euclidean space. Geometrically, that means that they have different lengths and different angles.
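A hedged sketch of comparing two product vectors with cosine similarity (the vectors are arbitrary examples standing in for numeric product characteristics):

import numpy as np

product_a = np.array([1.0, 3.0, 0.5])   # hypothetical numeric characteristics
product_b = np.array([0.9, 2.8, 0.7])

# Cosine similarity depends only on the angle between the vectors, not their lengths.
cosine_similarity = np.dot(product_a, product_b) / (
    np.linalg.norm(product_a) * np.linalg.norm(product_b))

# Euclidean distance also depends on the vectors' lengths.
euclidean_distance = np.linalg.norm(product_a - product_b)

print(cosine_similarity, euclidean_distance)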
You migrated to Google Cloud. Your models are developed with PyTorch, TensorFlow and BigQuery ML.
You are now working on an international project with other partners.
You need to let them use your Vertex AI dataset in Cloud Storage for a different organization.
What can you do?
You need to properly collect and transform data and then work on your ML models. You want to identify the services for data transformation that are most suitable for your needs. You need automatic procedures triggered before training.
What are the methodologies / services recommended by Google?
In particular, you have various fields that have no value or report NaN. Your expert colleague told you that you need to carry out a procedure that modifies them at the time of acquisition. What kind of functionalities do you need to provide?
The job they gave you is to perform Data cleaning and correction so that they will later be used in the best possible way for creating and updating ML models.
Data is stored in files of different formats.
Which GCP service is best to help you with this business?
It is completely serverless. You don’t need to write code or procedures.
Deep Learning VM images have GPU drivers pre-installed; if you don't use one of these images, you will need to install the drivers yourself.
L1 and L2 are two examples of regularization techniques.
L1 and L2 are two examples of regularization techniques.
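A minimal sketch of applying L1 and L2 regularization in Keras (the layer sizes and regularization coefficients are arbitrary):

import tensorflow as tf

model = tf.keras.Sequential([
    # L2 (ridge) penalty on this layer's weights.
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.01)),
    # L1 (lasso) penalty on this layer's weights.
    tf.keras.layers.Dense(32, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l1(0.01)),
    tf.keras.layers.Dense(1),
])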
For example, given the initial position and speed of an object, as well as its mass and the forces acting on it, we can precisely predict its position at any time. For this case, the mathematical model works much better than any ML model!
Due to the imbalanced classes, the model training is not working as desired. What’s the best way to resolve this issue?
Zeellow is using ML training to predict housing prices, and they retrain the models every month by integrating new data. The company does not want to write any code in the ML process. What method best suits their needs?
Zeellow wants to use ML to forecast future sales by leveraging their historical sales data. The historical data is stored in cloud storage. You want to rapidly experiment with all the available data. How should you build and train your model?
The team is working on a strategy for model retraining. What is your suggestion?
When data skew is detected, this means that data patterns are changing, and we need to retrain the model to capture these changes.
It also asks you to minimize false alarms (false positives), that is, to maximize precision (Precision = TruePositives / (TruePositives + FalsePositives)).
So, you want to maximize both precision and recall.
Depending on the granularity of the bins, this feature cross could learn city-specific housing effects.
- Training set: [A1, B2, F1, E2, ...]
- Testing set: [A2, C3, D2, F4, ...]
- Validation set: [B1, C1, D9, C2...]
- Training set: [A1, A2, A3, E1, E2, ...]
- Testing set: [B1, B2, C1, C2, C3, ...]
- Validation set: [D1, D2, F1, F2, F3, ...]
- Training set: [A1, B1, C1, ...]
- Testing set: [A2, C2, F2, ...]
- Validation set: [D3, A3, C3, ...]
import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128, activation='relu', input_shape=(200,)))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(4, activation='relu'))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(2))
How many trainable weights does this model have?