Top 100+ Predictive Modeling Interview Questions And Answers - Jun 01, 2020

Question 1. What Are The Essential Steps In A Predictive Modeling Project?

Answer :

It includes the following steps:

Establish business goal of a predictive model
Pull Historical Data - Internal and External
Select Observation and Performance Window
Create newly derived variables
Split Data into Training, Validation and Test Samples (see the sketch after this list)
Clean Data - Treatment of Missing Values and Outliers
Variable Reduction / Selection
Variable Transformation
Develop Model
Validate Model
Check Model Performance
Deploy Model
Monitor Model
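
As a quick illustration of the "Split Data" step, here is a minimal sketch of a 60/20/20 train/validation/test split with scikit-learn; the synthetic DataFrame, column names and proportions are illustrative assumptions.

```python
# A minimal sketch of splitting data into training, validation and test samples
# (60/20/20, stratified on the target); all names and sizes are illustrative.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=1_000), "target": rng.integers(0, 2, 1_000)})

# First carve off 40%, then split that 40% evenly into validation and test
train, temp = train_test_split(df, test_size=0.4, random_state=42, stratify=df["target"])
valid, test = train_test_split(temp, test_size=0.5, random_state=42, stratify=temp["target"])

print(len(train), len(valid), len(test))   # 600 / 200 / 200
```
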
Question 2. What Are The Applications Of Predictive Modeling?

Answer :

Predictive modeling is generally used in the following areas -

Acquisition - Cross Sell / Up Sell
Retention - Predictive Attrition Model
Customer Lifetime Value Model
Next Best Offer
Market Mix Model
Pricing Model
Campaign Response Model
Probability of Customers defaulting on loan
Segment customers based on their homogeneous attributes
Demand Forecasting
Usage Simulation
Underwriting
Optimization - Optimize Network
Question 3. Explain The Problem Statement Of Your Project. What Are The Financial Impacts Of It?

Answer :

Cover the goal or main objective of your predictive model. Compare the financial benefits of the predictive model vs. no model. Also highlight the non-financial benefits (if any).

Question 4. Difference Between Linear And Logistic Regression?

Answer :

Two major distinctions are as follows -

Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups), while binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.

Linear regression is based on least squares estimation, which says the regression coefficients should be chosen in such a way that they minimize the sum of the squared distances of each observed response to its fitted value, while logistic regression is based on Maximum Likelihood Estimation, which says the coefficients should be chosen in such a way that they maximize the probability of Y given X (the likelihood).
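
As a minimal sketch of this distinction, the snippet below fits both estimators with scikit-learn on synthetic data; all names and values are illustrative assumptions.

```python
# Least squares (LinearRegression) vs. maximum likelihood (LogisticRegression)
# on toy data; everything here is illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))                                        # three numeric predictors

y_cont = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)        # continuous target
y_bin = (X[:, 0] + X[:, 2] + rng.normal(size=200) > 0).astype(int)   # binary 0/1 target

lin = LinearRegression().fit(X, y_cont)      # minimizes the sum of squared residuals
logit = LogisticRegression().fit(X, y_bin)   # maximizes the likelihood P(y | X)

print("OLS coefficients:     ", lin.coef_)
print("Logistic coefficients:", logit.coef_)
```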

Question 5. How To Handle Missing Values?

Answer :

We fill/impute missing values using the following strategies, or treat missing values as a separate category (see the sketch after this list).

Mean Imputation for Continuous Variables (No Outlier)
Median Imputation for Continuous Variables (If Outlier)
Cluster Imputation for Continuous Variables
Imputation with a random value drawn between the minimum and maximum of the variable [Random value = min(x) + (max(x) - min(x)) * ranuni(SEED)]
Impute Continuous Variables with Zero (requires business knowledge)
Conditional Mean Imputation for Continuous Variables
Other Imputation Methods for Continuous Variables - Predictive mean matching, Bayesian linear regression, linear regression ignoring model error, etc.
WOE for missing values in categorical variables
Decision Tree, Random Forest, Logistic Regression for Categorical Variables
Decision Tree and Random Forest work for both Continuous and Categorical Variables
Multiple Imputation Method
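
As a minimal sketch of the mean/median imputation and "missing as a separate category" options above, the snippet below uses pandas and scikit-learn's SimpleImputer; the toy DataFrame and column names are illustrative assumptions.

```python
# A minimal sketch of mean/median imputation and of treating "missing" as its
# own category; the toy DataFrame and column names are illustrative.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age":    [25, 31, np.nan, 45, 52],                    # no outliers -> mean imputation
    "income": [42_000, np.nan, 58_000, 61_000, 950_000],   # outlier present -> median imputation
})

df["age"] = SimpleImputer(strategy="mean").fit_transform(df[["age"]]).ravel()
df["income"] = SimpleImputer(strategy="median").fit_transform(df[["income"]]).ravel()

# Missing values of a categorical variable kept as a separate category
grade = pd.Series(["A", None, "B", "A", None]).fillna("MISSING")
print(df)
print(grade.tolist())
```
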
Question 6. How To Treat Outliers?

Answer :

There are several methods to treat outliers (see the sketch after this list) -

Percentile Capping
Box-Plot Method
Mean plus minus 3 Standard Deviation
Weight of Evidence
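
As a minimal sketch of percentile capping, the snippet below winsorizes a synthetic series at the 1st and 99th percentiles; the thresholds and data are illustrative assumptions.

```python
# A minimal sketch of percentile capping (winsorizing at the 1st/99th
# percentiles) on a synthetic numeric series; thresholds are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=1_000)
x[:5] = [500, -400, 700, 650, -300]           # inject extreme outliers

lower, upper = np.percentile(x, [1, 99])
x_capped = np.clip(x, lower, upper)            # cap values outside the 1st-99th percentile band

print(f"before: min={x.min():.1f}, max={x.max():.1f}")
print(f"after : min={x_capped.min():.1f}, max={x_capped.max():.1f}")
```
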
Question 7. Explain Dimensionality / Variable Reduction Techniques?

Answer :

Unsupervised Method (No Dependent Variable)

Principal Component Analysis (PCA)
Hierarchical Variable Clustering (Proc Varclus in SAS)
Variance Inflation Factor (VIF)
Remove zero and near-zero variance predictors
Mean absolute correlation - removes the variable with the largest mean absolute correlation (see the sketch after the supervised lists below)
Supervised Method (With respect to Dependent Variable):

For Binary / Categorical Dependent Variable

Information Value
Wald Chi-Square
Random Forest Variable Importance
Gradient Boosting Variable Importance
Forward/Backward/Stepwise - Variable Significance (p-value)
AIC / BIC score
For Continuous Dependent Variable

Adjusted R-Square
Mallows' Cp Statistic
Random Forest Variable Importance
AIC / BIC score
Forward / Backward / Stepwise - Variable Significance
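
As a minimal sketch of two of the techniques above - PCA from the unsupervised list and random forest variable importance from the supervised lists - the snippet below uses scikit-learn on synthetic data; the 90% variance threshold and all names are illustrative assumptions.

```python
# PCA (unsupervised) and random forest variable importance (supervised)
# on synthetic data; everything here is illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(size=500) > 0).astype(int)

# Unsupervised: keep enough principal components to explain 90% of the variance
pca = PCA(n_components=0.90).fit(X)
print("Components kept:", pca.n_components_)

# Supervised: rank the original variables by random forest importance
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = np.argsort(rf.feature_importances_)[::-1]
print("Most important variable indices:", ranking[:3])
```
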
Question 8. What Is Multicollinearity And How To Deal With It?

Answer :

Multicollinearity implies high correlation between independent variables; its absence is one of the assumptions of linear and logistic regression. It can be identified by looking at the VIF score of the variables: VIF > 2.5 implies a moderate collinearity problem, and VIF > 5 is considered high collinearity.

It can be handled with an iterative process: first, remove the variable with the highest VIF and then check the VIF of the remaining variables. If the VIF of any remaining variable is greater than 2.5, repeat the same step until all VIFs are <= 2.5 (see the sketch below).
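
A minimal sketch of this iterative pruning, assuming a pandas DataFrame X of continuous predictors and using statsmodels' variance_inflation_factor; the 2.5 cut-off follows the answer above and everything else is illustrative.

```python
# Iteratively drop the predictor with the highest VIF until all VIFs <= threshold.
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

def prune_by_vif(X: pd.DataFrame, threshold: float = 2.5) -> pd.DataFrame:
    """Return a copy of X with high-VIF columns removed one at a time."""
    X = X.copy()
    while True:
        Xc = sm.add_constant(X)                       # VIF is computed with an intercept
        vifs = pd.Series(
            [variance_inflation_factor(Xc.values, i) for i in range(1, Xc.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            return X
        X = X.drop(columns=vifs.idxmax())             # remove the worst offender and repeat

# Usage (X being a DataFrame of continuous predictors): X_reduced = prune_by_vif(X)
```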

Question 9. How Is Vif Calculated And How Is It Interpreted?

Answer :

VIF measures how much the variance (the square of the estimate's standard deviation) of an estimated regression coefficient is inflated because of collinearity. If the VIF of a predictor variable were 9 (√9 = 3), the standard error for the coefficient of that predictor variable would be 3 times as large as it would be if that predictor variable were uncorrelated with the other predictor variables.

Steps for calculating VIF:

Run a linear regression in which one of the independent variables is treated as the target variable and all the other independent variables are used as predictors
Calculate the VIF of that variable: VIF = 1/(1 - R-squared), as shown in the sketch below
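
A minimal sketch of these two steps computed by hand with statsmodels; the DataFrame X and the column name are illustrative assumptions.

```python
# Compute the VIF of one predictor from first principles: regress it on the
# other predictors and apply VIF = 1 / (1 - R^2). Names are illustrative.
import pandas as pd
import statsmodels.api as sm

def vif_for(X: pd.DataFrame, col: str) -> float:
    """Regress `col` on the remaining predictors and return 1 / (1 - R^2)."""
    others = sm.add_constant(X.drop(columns=col))
    r_squared = sm.OLS(X[col], others).fit().rsquared
    return 1.0 / (1.0 - r_squared)

# Usage: vif_x1 = vif_for(X, "x1")
```
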
Question 10. Do We Remove Intercepts While Calculating Vif?

Answer :

No. VIF depends on the intercept because there is an intercept in the regression used to determine VIF. If the intercept is removed, R-squared is not meaningful because it can be negative, in which case you would get VIF < 1, implying that the standard error of a variable would go up if that independent variable were uncorrelated with the other predictors.

Question 11. What Is P-value And How It Is Used For Variable Selection?

Answer :

The p-value is the lowest level of significance at which you can reject the null hypothesis. In the case of independent variables, it indicates whether the coefficient of a variable is significantly different from zero (see the sketch below).
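
As a minimal sketch, the snippet below fits an ordinary least squares model with statsmodels and reads the coefficient p-values; the synthetic data and the 5% cut-off are illustrative assumptions.

```python
# Read coefficient p-values from a fitted model and keep significant variables.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(300, 3)), columns=["x1", "x2", "x3"])
y = 1.5 * X["x1"] + rng.normal(size=300)        # only x1 truly drives y

model = sm.OLS(y, sm.add_constant(X)).fit()
pvals = model.pvalues.drop("const")             # p-value per coefficient
print(pvals)
print("Variables kept at the 5% level:", list(pvals[pvals < 0.05].index))
```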

Question 12. Explain Important Model Performance Statistics?

Answer :

AUC > 0.7, with no significant difference between the AUC score of training vs. validation
KS should fall in the top three deciles and should be greater than 30
Rank ordering - no break in rank ordering
Same signs of parameter estimates in both training and validation (see the sketch after this list)
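
A minimal sketch of computing AUC and KS for a binary model's scores with scikit-learn; the synthetic labels and scores are illustrative, and KS is taken as max(TPR - FPR) along the ROC curve.

```python
# Compute AUC and the KS statistic for a set of model scores; data is synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(7)
y_true = rng.integers(0, 2, size=1_000)
y_score = np.clip(y_true * 0.3 + rng.uniform(size=1_000) * 0.7, 0, 1)   # noisy scores

auc = roc_auc_score(y_true, y_score)

# KS = maximum separation between the cumulative distributions of goods and bads,
# which equals max(TPR - FPR) along the ROC curve (often reported * 100)
fpr, tpr, _ = roc_curve(y_true, y_score)
ks = (tpr - fpr).max()

print(f"AUC = {auc:.3f}, KS = {100 * ks:.1f}")
```
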
Question 13. Explain Collinearity Between Continuous And Categorical Variables. Is Vif A Correct Method To Compute Collinearity In This Case?

Answer :

Collinearity between categorical and continuous variables is very common. The choice of reference category for dummy variables influences multicollinearity; changing the reference category of the dummy variables can avoid collinearity. Pick the reference category with the highest share of cases.

VIF is not a correct method in this case. VIFs should only be run for continuous variables. The t-test method can be used to check collinearity between a continuous variable and a dummy variable (see the sketch below).
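
A minimal sketch of that t-test check using scipy, comparing the mean of a continuous variable across the two levels of a dummy variable; the data and names are illustrative assumptions.

```python
# Two-sample t-test: does the continuous variable differ by dummy level?
# A significant difference suggests the two predictors are related.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
dummy = rng.integers(0, 2, size=500)
income = 40_000 + 8_000 * dummy + rng.normal(scale=5_000, size=500)   # depends on the dummy

t_stat, p_value = ttest_ind(income[dummy == 1], income[dummy == 0])
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")    # small p-value -> collinearity concern
```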



