Statistics Interview Questions and Answers

1. What is Statistics in Data Science?

Statistics is the study of collecting, analyzing, and interpreting data. In Data Science, it’s used for data analysis, creating models, and validating predictions.

2. What are Descriptive and Inferential Statistics?

  • Descriptive Statistics summarize data (mean, median, mode).
  • Inferential Statistics draw conclusions about a population from a sample, using tools such as hypothesis tests and confidence intervals.

3. What is the difference between Population and Sample?

  • Population includes all possible data points.
  • Sample is a subset of the population, used in inferential statistics.

4. What is Standard Deviation?

Standard Deviation measures data variability. A low value indicates data points close to the mean, while a high value shows greater spread.
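As a minimal sketch, the two made-up lists below share the same mean but differ in spread. The data and variable names are illustrative only.

```python
import statistics

tight = [9, 10, 10, 11, 10]   # values clustered near the mean
wide = [2, 18, 5, 15, 10]     # same mean, much wider spread

# pstdev treats the list as a whole population (divides by n);
# statistics.stdev would treat it as a sample (divides by n - 1).
tight_sd = statistics.pstdev(tight)
wide_sd = statistics.pstdev(wide)

print(f"mean={statistics.mean(tight)}, sd={tight_sd:.3f}")
print(f"mean={statistics.mean(wide)}, sd={wide_sd:.3f}")
```

Both lists average 10, yet the second has a far larger standard deviation because its points sit further from the mean.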

5. What is the importance of Probability in Data Science?

Probability quantifies uncertainty in data. It underpins algorithms such as Bayesian Networks and probabilistic classification models.

6. What is a p-value in hypothesis testing?

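The p-value is the probability of observing data at least as extreme as the data actually observed, assuming the null hypothesis is true. A small p-value (conventionally below 0.05) is taken as evidence against the null hypothesis. It is not the probability that the null hypothesis is true.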
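A minimal sketch of a two-sided p-value, assuming a one-sample z-test with a known population standard deviation. All numbers are hypothetical.

```python
from statistics import NormalDist

# Hypothetical setup: H0 says the population mean is 100, with known sigma 15.
mu0, sigma = 100.0, 15.0
sample_mean, n = 106.0, 36

# Test statistic: distance of the sample mean from mu0 in standard errors.
z = (sample_mean - mu0) / (sigma / n ** 0.5)

# Two-sided p-value: probability of a |z| at least this large under H0.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject_null = p_value < alpha
print(f"z={z:.2f}, p={p_value:.4f}, reject H0: {reject_null}")
```

When sigma is unknown and the sample is small, a t-test would normally be used instead.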

7. What is the difference between Type I and Type II Errors?

  • Type I Error: False positive (rejecting a true null hypothesis).
  • Type II Error: False negative (failing to reject a false null hypothesis).

8. Explain the Central Limit Theorem.

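The Central Limit Theorem states that, for independent draws from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution.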
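The theorem can be illustrated with a small simulation. This is a sketch that assumes an exponential population, chosen only because it is clearly non-normal; the sample size and repetition count are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    # Exponential(1) is strongly right-skewed, with mean 1 and standard deviation 1.
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

n = 50
means = [sample_mean(n) for _ in range(2000)]

center = statistics.mean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)  # should be close to 1 / sqrt(n)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
```

A histogram of the simulated means would look roughly bell-shaped even though the underlying data are skewed.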

9. What is a Confidence interval?

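A confidence interval is a range of values, computed from a sample, that is expected to contain the population parameter at a stated confidence level, typically 95% or 99%. A 95% level means that about 95% of intervals built this way from repeated samples would contain the true value.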
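A minimal sketch of a 95% interval for a mean, assuming the normal approximation. The measurements are made up; for a sample this small, a t critical value would give a somewhat wider interval.

```python
from statistics import NormalDist, mean, stdev

# Made-up measurements for illustration.
data = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]

n = len(data)
x_bar = mean(data)
std_err = stdev(data) / n ** 0.5          # standard error of the mean

z_crit = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% level
lower = x_bar - z_crit * std_err
upper = x_bar + z_crit * std_err

print(f"mean={x_bar:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```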

10. What is a Z-score?

A z-score indicates how many standard deviations a data point is from the mean, useful for normal distribution analysis.
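The formula is z = (x - mean) / standard deviation. A short sketch with made-up exam scores, treating the list as the whole population:

```python
from statistics import mean, pstdev

scores = [60, 70, 80, 90, 100]   # made-up exam scores

mu = mean(scores)        # 80
sigma = pstdev(scores)   # population standard deviation, sqrt(200)

def z_score(x):
    # Number of standard deviations that x lies above (+) or below (-) the mean.
    return (x - mu) / sigma

for x in (60, 80, 100):
    print(x, round(z_score(x), 3))
```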

11. What are Overfitting and Underfitting in Data Science?

  • Overfitting: The model fits the training data too closely, including its noise, and fails on new data.
  • Underfitting: The model is too simple to capture the patterns in the data.

12. What are Variance and Covariance?

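  • Variance measures the spread of a single variable: the average squared deviation from its mean.
  • Covariance shows the direction in which two variables move together (positive or negative). Its magnitude depends on the units of the variables.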

13. What is Correlation, and how is it different from Covariance?

  • Correlation standardizes covariance, giving values between -1 and 1.
  • It measures the strength and direction of a relationship.
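The sketch below computes sample covariance and Pearson correlation by hand on made-up data, then rescales one variable to show that covariance changes with units while correlation does not. The function and variable names are illustrative.

```python
from statistics import mean

def covariance(xs, ys):
    # Sample covariance (divides by n - 1).
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    # Pearson correlation: covariance divided by both standard deviations.
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
score_x100 = [s * 100 for s in score]   # same data in different units

print(covariance(hours, score), covariance(hours, score_x100))
print(correlation(hours, score), correlation(hours, score_x100))
```

The covariance grows by a factor of 100 after rescaling, while the correlation stays at about 0.99.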

14. What is Hypothesis testing?

Hypothesis testing evaluates an assumption about a population parameter using a null hypothesis and an alternative hypothesis.

15. What is the difference between Parametric and Non-parametric tests?

  • Parametric tests assume the data follow a specific distribution, most often the normal distribution (e.g., t-test).
  • Non-parametric tests don’t require such assumptions (e.g., Mann-Whitney U test).

16. What is Linear regression?

Linear regression models the relationship between dependent and independent variables using a straight-line equation.
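For one independent variable the line is y = intercept + slope * x. A minimal sketch using the ordinary least squares formulas, with made-up data and an illustrative function name:

```python
from statistics import mean

def fit_line(xs, ys):
    # Ordinary least squares for a single predictor.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]   # roughly y = 2x + 1 with some noise

slope, intercept = fit_line(xs, ys)
predicted = intercept + slope * 6   # prediction for a new x value
print(f"slope={slope:.2f}, intercept={intercept:.2f}, y(6)={predicted:.2f}")
```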

17. What is the role of statistics in machine learning?

Statistics helps you understand data distributions, validate models with hypothesis tests and confidence intervals, and fit models such as logistic regression.

18. What is Sampling? Name its types.

Sampling selects a data subset for analysis. Types:

  • Random Sampling
  • Stratified Sampling
  • Systematic Sampling
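The three types listed above can be sketched as follows. The population of 100 numbered records and the two strata are hypothetical.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical record IDs 1..100

# Random sampling: every record has the same chance of selection.
random_sample = random.sample(population, 10)

# Systematic sampling: a random start, then every k-th record.
k = len(population) // 10
start = random.randrange(k)
systematic_sample = population[start::k]

# Stratified sampling: split into groups (strata), then sample within each.
strata = {
    "low": [x for x in population if x <= 50],
    "high": [x for x in population if x > 50],
}
stratified_sample = [x for group in strata.values()
                     for x in random.sample(group, 5)]

print(random_sample, systematic_sample, stratified_sample, sep="\n")
```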

19. What is the difference between Mean, Median and Mode?

  • Mean: Average value.
  • Median: Middle value when sorted.
  • Mode: Most frequent value.
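A short sketch on made-up salary data with one extreme value, showing how the three measures can disagree:

```python
import statistics

# Made-up salaries in thousands; 200 is an extreme value.
salaries = [30, 32, 32, 35, 38, 40, 200]

mean_salary = statistics.mean(salaries)      # pulled upward by the extreme value
median_salary = statistics.median(salaries)  # middle value of the sorted list
mode_salary = statistics.mode(salaries)      # most frequent value

print(f"mean={mean_salary:.2f}, median={median_salary}, mode={mode_salary}")
```

The mean is about 58 while the median is 35, which is why the median is often preferred for skewed data.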

20. What is Bayesian Statistics?

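Bayesian Statistics treats probability as a degree of belief and uses Bayes’ theorem to update a prior belief into a posterior belief as new data arrive. Bayes’ theorem also underlies classifiers such as Naive Bayes.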
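A worked example of Bayes’ theorem, P(A|B) = P(B|A) * P(A) / P(B), using hypothetical numbers for a diagnostic test:

```python
# Hypothetical numbers for illustration.
prior = 0.01                 # P(disease) before seeing the test result
sensitivity = 0.95           # P(positive | disease)
false_positive_rate = 0.05   # P(positive | no disease)

# Total probability of a positive result.
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

# Updated belief after observing a positive result.
posterior = sensitivity * prior / evidence

print(f"P(disease | positive) = {posterior:.3f}")
```

Even with an accurate test, the posterior is only about 16% because the condition is rare to begin with.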

21. What is Multicollinearity, and how do you handle it?

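Multicollinearity occurs when independent variables are highly correlated with each other, which makes coefficient estimates unstable. It is detected with measures such as the Variance Inflation Factor (VIF) and addressed by dropping or combining correlated predictors, applying regularization, or using PCA.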
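In general, the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. With only two predictors, that R² equals their squared Pearson correlation, which is the simplified case sketched here on made-up data:

```python
from statistics import mean

def pearson_r(xs, ys):
    # Pearson correlation between two equal-length lists.
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Two made-up predictors; x2 is almost exactly 2 * x1.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
vif = 1 / (1 - r ** 2)   # two-predictor special case

print(f"r={r:.4f}, VIF={vif:.1f}")
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic multicollinearity.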

22. What are Outliers and how do you handle them?

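Outliers are data points that differ markedly from the rest of the data. They are detected with tools such as boxplots (the IQR rule) or standard deviation thresholds, and handled by investigating their cause, then correcting, removing, or capping them, or by using methods that are robust to them.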
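A minimal sketch of the IQR rule that a boxplot uses, applied to made-up data. The 1.5 multiplier is the usual convention, and quartile values can differ slightly between calculation methods.

```python
import statistics

data = [10, 12, 11, 13, 12, 11, 14, 12, 13, 95]   # 95 looks suspicious

# quantiles(..., n=4) returns the three quartile cut points (Python 3.8+).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(f"Q1={q1}, Q3={q3}, fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```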

23. What is Chi-Square test in statistics?

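The Chi-Square test of independence checks whether two categorical variables are associated by comparing the observed counts in a contingency table with the counts expected if the variables were independent.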
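A sketch of the statistic for a hypothetical 2x2 table, computed without a continuity correction. The p-value shortcut applies only to one degree of freedom, where a chi-square variable is the square of a standard normal.

```python
from statistics import NormalDist

# Hypothetical counts: rows are groups, columns are outcomes.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if rows and columns were independent.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

# Valid for 1 degree of freedom only (a 2x2 table).
p_value = 2 * (1 - NormalDist().cdf(chi2 ** 0.5))
print(f"chi2={chi2:.2f}, p={p_value:.4f}")
```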


24. What is the difference between R-Squared and Adjusted R-Squared?

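  • R-Squared is the proportion of variance in the dependent variable that the model explains. It never decreases when predictors are added.
  • Adjusted R-Squared penalizes the number of predictors, so it increases only when a new predictor improves the fit by more than would be expected by chance.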
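The adjustment is Adjusted R² = 1 - (1 - R²)(n - 1) / (n - p - 1), where n is the number of observations and p the number of predictors. A small sketch with hypothetical values:

```python
def adjusted_r2(r2, n, p):
    # n = number of observations, p = number of predictors.
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same R-squared of 0.85 on 50 observations, with 5 versus 20 predictors.
few = adjusted_r2(0.85, n=50, p=5)
many = adjusted_r2(0.85, n=50, p=20)

print(f"5 predictors: {few:.3f}, 20 predictors: {many:.3f}")
```

The same raw R-Squared looks less impressive once the larger number of predictors is taken into account.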

25. What is A/B testing?

A/B Testing compares two variants (A and B) to determine which performs better, widely used in marketing and product analysis.
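One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below uses hypothetical visitor and conversion counts; real experiments also need a sample size and stopping rule fixed in advance.

```python
from statistics import NormalDist

# Hypothetical results: conversions out of visitors for each variant.
conv_a, n_a = 200, 2000
conv_b, n_b = 250, 2000

rate_a = conv_a / n_a
rate_b = conv_b / n_b

# Pooled rate under the null hypothesis that both variants convert equally.
pooled = (conv_a + conv_b) / (n_a + n_b)
std_err = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5

z = (rate_b - rate_a) / std_err
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided

print(f"A={rate_a:.3f}, B={rate_b:.3f}, z={z:.2f}, p={p_value:.4f}")
```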

Copyrights © 2024 letsupdateskills All rights reserved