Welcome to Yuming's Blog!

You are my warm gloves, my cold beer, my shirt that smells of sunshine, my dream day after day.

Java Learning Plan

2018-08-13

Java

Java
- Java basic
- Java unit test
  - Junit:
  - Mockito:
- Java web base:
  - JDBC
  - JSP
- Java Framework:
Java basic

Basic data structure and container:
- HashMap, ArrayList, HashSet etc
Design pattern
- Factory, DAO, API gateway. Resources:
  1. https://github.com/iluwatar/java-design-patterns
  2. http://tutorials.jenkov.com/java-persistence/dao-design-pattern.html
Error Handling
- Resources:
  1. https://stackify.com/best-practices-exceptions-java/?nabe=4582376289337344%3A0%2C5628017803264000%3A1%2C6190967756685312%3A2&nabr=10
  2. https://howtodoinjava.com/best-practices/java-exception-handling-best-practices/
XML

Java util

Resources:
1. https://www.geeksforgeeks.org/java-util-package-java/
2. https://www.tutorialspoint.com/java/util/index.htm
POJO
- Resources:
  1. https://www.jianshu.com/p/224489dfdec8
Java Annotation

Resource:
1. http://www.cnblogs.com/skywang12345/p/3344137.html
Java unit test

Junit:
1. https://www.tutorialspoint.com/junit/
Mockito:
1. https://www.tutorialspoint.com/mockito/
Java web base:

JDBC

JSP

Java Framework:

Coral Service

Follow wiki for Coral Service

Guice
1. https://www.tutorialspoint.com/guice/index.htm
Spring boot
1. https://www.tutorialspoint.com/spring/
Tomcat

Data Pre-Processing Missing Data

2018-05-02

Machine Learning

Sample DF
Analysis Missing Data in Pandas
Handle Missing Data
- Drop
- Impute
  - mean imputation
  - MICE
- transformer

Sample DF

	A	B	C	D
0	1	2	3	4
1	5	6	NaN	8
2	0	11	12	NaN

Analysis Missing Data in Pandas

df.isnull().sum()

Use df.values to access the underlying NumPy array.
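As a minimal sketch, the sample DataFrame above can be rebuilt and inspected as follows. The variable names (`missing_per_column`, `missing_per_row`, `raw`) are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the sample DataFrame shown above (NaN marks a missing value).
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, np.nan, 8.0],
     [0.0, 11.0, 12.0, np.nan]],
    columns=["A", "B", "C", "D"],
)

missing_per_column = df.isnull().sum()      # number of NaN values in each column
missing_per_row = df.isnull().sum(axis=1)   # number of NaN values in each row
raw = df.values                             # underlying NumPy array

print(missing_per_column)
print(missing_per_row)
print(raw.shape)
```

Counting per row as well as per column helps decide later whether dropping rows or dropping columns loses less data.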

Handle Missing Data

df.dropna()

Drop

Drop the rows or columns that contain NaN values.

Drop rows

# only drop rows where all columns are NaN
df.dropna(how='all')
# drop rows that have fewer than 4 non-NaN values
df.dropna(thresh=4)
# only drop rows where NaN appear in specific columns (here: 'C')
df.dropna(subset=['C'])

Drop columns

df.dropna(axis=1)
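To see what each variant keeps, here is a small sketch that applies the calls above to the sample DataFrame. The result names (`any_nan_rows_dropped` and so on) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample DataFrame from the start of the post.
df = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, np.nan, 8.0],
     [0.0, 11.0, 12.0, np.nan]],
    columns=["A", "B", "C", "D"],
)

any_nan_rows_dropped = df.dropna()              # drop rows with at least one NaN
all_nan_rows_dropped = df.dropna(how="all")     # drop rows where every value is NaN
thresh_rows = df.dropna(thresh=4)               # keep rows with at least 4 non-NaN values
subset_rows = df.dropna(subset=["C"])           # drop rows where column C is NaN
nan_columns_dropped = df.dropna(axis=1)         # drop columns with at least one NaN

print(any_nan_rows_dropped.index.tolist())
print(subset_rows.index.tolist())
print(nan_columns_dropped.columns.tolist())
```

None of these calls modify `df` in place; each returns a new DataFrame.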

Compare

drop rows vs drop columns:

Dropping rows may lead to overfitting, because it discards valuable training samples, while dropping columns may lead to underfitting, because it reduces the number of features.

Impute

Simply dropping NaN values may discard too much data, so instead we can estimate the missing values from the other training samples.

mean imputation

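We simply replace each missing value with the mean of its feature column. scikit-learn provides an imputer class for this: older releases shipped sklearn.preprocessing.Imputer, which was removed in scikit-learn 0.22 in favor of sklearn.impute.SimpleImputer.

import numpy as np
from sklearn.impute import SimpleImputer

# replace each NaN with the mean of its column
imr = SimpleImputer(missing_values=np.nan, strategy='mean')
imr = imr.fit(df.values)
imputed_data = imr.transform(df.values)
imputed_data

Options:

Change row or column axis to 0/1 (only in the legacy Imputer; SimpleImputer always imputes column by column)
Change impute algorithm - strategy
- mean
- most_frequent: mostly used for categorical feature values
- median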
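The strategies listed above can be compared side by side. This is a minimal sketch that assumes a scikit-learn version providing `sklearn.impute.SimpleImputer` (the successor to `Imputer`); the `results` dictionary is an illustrative name.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Sample data from the start of the post, as a NumPy array.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [5.0, 6.0, np.nan, 8.0],
              [0.0, 11.0, 12.0, np.nan]])

results = {}
for strategy in ("mean", "median", "most_frequent"):
    # Each imputer learns one fill value per column, then fills the NaNs.
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    results[strategy] = imputer.fit_transform(X)

print(results["mean"])
print(results["median"])
print(results["most_frequent"])
```

With only two observed values in columns C and D, the mean and median coincide; on larger, skewed columns the median is less sensitive to outliers.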

MICE

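MICE: Multivariate Imputation via Chained Equations. It assumes the data are missing at random (MAR), meaning the probability that a value is missing depends only on the other observed values, so the missing values can be estimated by prediction. It is a parametric method that applies a different regression (or other) model to each variable with missing values.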
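One way to try this idea in Python is scikit-learn's `IterativeImputer`, which is inspired by MICE but returns a single imputation rather than multiple ones, and is still marked experimental. The sketch below is an assumption-laden illustration: the synthetic data, the seed, and the choice of 10 iterations are all made up for the example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables the experimental API
from sklearn.impute import IterativeImputer

# Synthetic data (illustrative only): the third column depends on the first two,
# so its missing entries can be predicted from the observed values.
rng = np.random.default_rng(0)
X_full = rng.normal(size=(20, 3))
X_full[:, 2] = X_full[:, 0] + X_full[:, 1] + 0.1 * rng.normal(size=20)

X = X_full.copy()
X[[1, 5, 9], 2] = np.nan    # knock out a few values in the third column
X[[3, 7], 0] = np.nan       # and a few in the first column
observed = ~np.isnan(X)

# Each column with missing values is regressed on the other columns, in rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

print(X_filled[[1, 5, 9], 2])
```

Observed entries pass through unchanged; only the NaN positions receive model-based estimates.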

transformer

Used to transform data. A transformer has two essential methods: fit and transform.

fit: learn the parameters from the training data
transform: use those parameters to transform the data.
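The fit/transform split can be illustrated with a hand-written transformer. `ColumnMeanImputer` below is a hypothetical class written for this example, not a scikit-learn API; it learns column means in fit and reuses them in transform, including on new data.

```python
import numpy as np

class ColumnMeanImputer:
    """Hypothetical minimal transformer: fills NaNs with column means learned in fit."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Parameters learned from the training data: one mean per column, ignoring NaNs.
        self.means_ = np.nanmean(X, axis=0)
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # work on a copy, leave the input untouched
        rows, cols = np.where(np.isnan(X))
        X[rows, cols] = self.means_[cols]
        return X

train = np.array([[1.0, 2.0, 3.0, 4.0],
                  [5.0, 6.0, np.nan, 8.0],
                  [0.0, 11.0, 12.0, np.nan]])
new_data = np.array([[np.nan, 1.0, np.nan, 2.0]])

imputer = ColumnMeanImputer().fit(train)       # learn parameters from training data only
train_filled = imputer.transform(train)
new_filled = imputer.transform(new_data)       # reuse the same parameters on new data

print(new_filled)
```

Fitting on the training data and only transforming new data keeps information from the new samples out of the learned parameters.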


CS Books For 2018

2018-04-15

Todo

Todo

Ruby Class

2018-04-15

ruby

ruby

Luxury Goods Management Notes

2018-04-15

Miscellaneous Notes

Miscellaneous Notes

Git Commit style guide

2018-04-13

Git

Git
- Git commit message style
Git commit message style
