Tuesday, June 11, 2019

Crack Data Science (Must-know concepts)

Languages: Python or R
R vs Python

Get a good grip on the Python language. It makes the rest of the work easier in many ways.

I prefer Python because of its

  • Speed
  • Syntax
  • Ease of use in production systems

Data structures in Python

  • Arrays (NumPy)
  • Lists
  • DataFrames (pandas)
  • Dictionaries
  • Tuples
  • Sets

Key packages in Python

  • NumPy
  • pandas
  • Matplotlib
  • SciPy
  • scikit-learn
  • PyTorch (neural networks)
  • and others

Some must-know concepts

Broadcasting
Distributions - Normal, t-distribution, Bernoulli
Hypothesis testing
Central limit theorem
Supervised vs unsupervised learning
Loss functions
Bias / Variance tradeoff
Missing data analysis - Imputation methods
Linear Algebra
What reshape(-1, 1) does
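
Two of the items above, broadcasting and reshape(-1, 1), are easiest to see in a few lines of NumPy. This is only a small sketch; the array values and variable names are made up for illustration.

```python
import numpy as np

# A 1-D array with 4 elements: shape (4,)
x = np.array([1.0, 2.0, 3.0, 4.0])

# reshape(-1, 1): "-1" asks NumPy to infer that dimension, "1" fixes one column.
# The result is a column vector of shape (4, 1), the 2-D layout that
# scikit-learn estimators expect for a single feature.
col = x.reshape(-1, 1)
print(col.shape)  # (4, 1)

# Broadcasting: a (4, 1) column combined with a (3,) row behaves as if both
# operands were stretched to a common (4, 3) shape.
row = np.array([10.0, 20.0, 30.0])
grid = col + row
print(grid.shape)  # (4, 3)

# Broadcasting also covers the scalar case: subtract the mean from every element.
centered = x - x.mean()
print(centered)  # [-1.5 -0.5  0.5  1.5]
```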

Dot product
Validation techniques
k-fold cross-validation
Stratified k-fold
Stacking
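
To see why stratified k-fold matters, here is a rough sketch comparing it with plain k-fold on a tiny imbalanced label vector. The data is made up, and it assumes scikit-learn's KFold and StratifiedKFold with their default (unshuffled) behaviour.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy imbalanced labels: 8 samples of class 0 and 4 of class 1 (made-up data).
y = np.array([0] * 8 + [1] * 4)
X = np.arange(len(y)).reshape(-1, 1)

# Plain k-fold splits by position only, so a test fold can contain a single class.
kf = KFold(n_splits=4)
kfold_positives = [int(y[test_idx].sum()) for _, test_idx in kf.split(X)]

# Stratified k-fold keeps the class ratio roughly equal in every test fold.
skf = StratifiedKFold(n_splits=4)
strat_positives = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]

print(kfold_positives)  # [0, 0, 1, 3]
print(strat_positives)  # [1, 1, 1, 1]
```

Each number is the count of class-1 samples in a test fold. With plain k-fold the first two folds see no positives at all, while the stratified split puts one in each.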

Linear Regression
Logistic Regression
Tree based models

Adjusted R-squared
P-value explanation
Storing a model in a pickle file
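
Storing a model with pickle could look roughly like this. The "model" here is just a dictionary standing in for a fitted estimator, and the file name model.pkl is a hypothetical choice; a fitted scikit-learn object is dumped and loaded the same way.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: any picklable Python object works the same way.
# The keys and numbers here are hypothetical.
model = {"coef": [0.5, -1.2], "intercept": 3.0}

path = os.path.join(tempfile.gettempdir(), "model.pkl")

# Serialize the object to disk in binary mode.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, e.g. in a separate scoring script.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

Only unpickle files you trust, since loading a pickle can run arbitrary code.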


Hyperparameters
Model parameters

GridSearchCV
RandomizedSearchCV
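
The difference between hyperparameters and model parameters shows up nicely in a grid search. The sketch below assumes scikit-learn's bundled iris dataset and a logistic regression; the grid values for C are arbitrary picks for the example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# C is a hyperparameter: it is set before training, here chosen by search.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid,
    cv=5,                # 5-fold cross-validation for every candidate
    scoring="accuracy",
)
search.fit(X, y)

# Hyperparameter picked by cross-validation.
print(search.best_params_)
# Model parameters: the coefficients learned from the data during fit.
print(search.best_estimator_.coef_.shape)  # (3, 4)
```

RandomizedSearchCV has the same fit interface but samples a fixed number of candidates (n_iter) from the given distributions instead of trying every combination, which helps when the grid is large.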

Regularization
Lasso - L1
Ridge - L2
Elastic Net
ROC/AUC


Decision Trees
Entropy
Information gain
CART
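
Entropy and information gain can be worked out by hand for a single split. This is a minimal sketch in plain Python; the helper names entropy and information_gain and the yes/no labels are made up for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# A node with 5 "yes" and 5 "no" samples, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(entropy(parent))  # 1.0
print(round(information_gain(parent, [left, right]), 3))  # 0.278
```

A decision tree that splits on entropy picks, at each node, the split with the highest information gain.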

Ensemble models

Bagging - Random Forest
Boosting - XGBoost, LightGBM
Stacking

PCA
Factor analysis