Evaluating and Improving Machine Learning Models

Methods for evaluating and improving machine learning models, and error analysis

stoAir

2024-02-14 23:43:29


ML strategy


Single number evaluation metric

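Precision: of the examples recognized as positive, n% actually are …

Recall: of the examples that actually are positive, n% were correctly recognized

F1 score (harmonic mean of P and R): $\frac{2}{\frac{1}{P}+\frac{1}{R}}$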
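A minimal sketch of these three metrics, assuming binary labels where 1 marks the positive class; the function name `precision_recall_f1` and the toy labels are hypothetical. Precision and recall come from counts of true positives, false positives and false negatives, and F1 combines them into one number:

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: 2 / (1/P + 1/R), written as 2PR / (P + R) to avoid dividing by zero
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical toy data
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.6 0.75 0.667
```

Here there are 3 true positives, 2 false positives and 1 false negative, so P = 3/5 and R = 3/4; F1 sits between them but closer to the smaller one.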

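Optimizing and satisficing metrics

One option is to fold several metrics into one number, e.g. cost = accuracy - 0.5 * running time

With N metrics: pick 1 as the optimizing metric and require the other N-1 to reach a threshold (satisficing)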
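A minimal sketch of the 1-optimizing / (N-1)-satisficing rule, with accuracy as the optimizing metric and running time as the only satisficing metric. The candidate models, their numbers and the 100 ms threshold are hypothetical:

```python
# Hypothetical candidates: (name, accuracy, running_time_ms)
candidates = [
    ("A", 0.90, 80),
    ("B", 0.92, 95),
    ("C", 0.95, 1500),
]

def pick_model(candidates, max_time_ms=100):
    """Maximize accuracy (optimizing) among models that meet the
    running-time threshold (satisficing); None if none qualifies."""
    feasible = [c for c in candidates if c[2] <= max_time_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c[1])

print(pick_model(candidates))  # ('B', 0.92, 95)
```

C has the best accuracy but fails the threshold, so it is excluded outright instead of being traded off against accuracy.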

If doing well on your metric and dev/test set does not correspond to doing well on your application, change your metric and/or your dev/test set.

Improving model performance

Two fundamental assumptions of supervised learning

You can fit the training set well
the training set performance generalizes pretty well to the dev/test set

Reduce bias and variance

human-level error <–> training error <–> dev error (the first gap is the avoidable bias, the second is the variance; human-level error serves as a proxy for Bayes error)

Avoidable bias

Train bigger model
Train longer/better optimization algorithms
NN architecture/hyperparameters search

Variance

More data
Regularization
NN architecture/hyperparameters search
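A minimal sketch of how the two gaps pick between the tactic lists above, assuming human-level error stands in for Bayes error. The name `diagnose` and the error values are hypothetical:

```python
# Tactics taken from the two lists above
AVOIDABLE_BIAS_TACTICS = [
    "Train bigger model",
    "Train longer/better optimization algorithms",
    "NN architecture/hyperparameters search",
]
VARIANCE_TACTICS = [
    "More data",
    "Regularization",
    "NN architecture/hyperparameters search",
]

def diagnose(human_error, train_error, dev_error):
    """Return (avoidable bias, variance, tactics aimed at the larger gap)."""
    avoidable_bias = train_error - human_error  # human-level error as a proxy for Bayes error
    variance = dev_error - train_error
    if avoidable_bias >= variance:
        return avoidable_bias, variance, AVOIDABLE_BIAS_TACTICS
    return avoidable_bias, variance, VARIANCE_TACTICS

# Hypothetical errors: 1% human level, 8% training, 10% dev
bias, var, tactics = diagnose(human_error=0.01, train_error=0.08, dev_error=0.10)
print(round(bias, 3), round(var, 3), tactics[0])  # 0.07 0.02 Train bigger model
```

With the same 8% training and 10% dev error but a human-level error of 7.5%, the avoidable bias shrinks to 0.5% and the variance tactics would be chosen instead.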

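Error analysis

Approach

Manually examine misclassified dev examples to evaluate an idea before working on it: if only 5 of 100 examined errors fall into the category the idea targets, fixing that category can at best lower dev error from 10% to 9.5%
Evaluate multiple ideas in parallel by tallying several error categories over the same examples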
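A minimal sketch of that tally, assuming each manually inspected error is tagged with every category that applies (one example may carry several tags, which is what lets several ideas be evaluated in parallel). The tags, counts and the name `error_analysis` are hypothetical:

```python
from collections import Counter

def error_analysis(misclassified_tags, dev_error):
    """For each category: (share of inspected errors, best-case dev error
    if every error in that category were fixed)."""
    n = len(misclassified_tags)
    counts = Counter(tag for tags in misclassified_tags for tag in tags)
    report = {}
    for tag, k in counts.most_common():  # most frequent category first
        share = k / n
        report[tag] = (share, dev_error * (1 - share))
    return report

# Hypothetical tags for 100 manually inspected misclassified dev examples
sample = [("dog",)] * 5 + [("great cat",)] * 43 + [("blurry",)] * 52
for tag, (share, ceiling) in error_analysis(sample, dev_error=0.10).items():
    print(f"{tag}: {share:.0%} of errors, dev error at best {ceiling:.1%}")
```

The `dog` row reports 5% of errors and a best-case dev error of 9.5%, the ceiling described above; the larger categories show where effort is likely to pay off more.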

Incorrectly labeled examples

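Different distributions for training and dev/test data

human-level error <–> training error <–> training-dev error <–> dev/test error

(the three gaps are avoidable bias, variance and data mismatch; the training-dev set has the same distribution as the training set but is not used for training)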
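A minimal sketch of carving out a training-dev set and reading off the three gaps. The helper names, the 5% hold-out fraction and the error values are hypothetical:

```python
import random

def split_train_dev(train_examples, frac=0.05, seed=0):
    """Hold out a random training-dev set drawn from the training distribution."""
    examples = list(train_examples)
    random.Random(seed).shuffle(examples)
    k = int(len(examples) * frac)
    return examples[k:], examples[:k]  # (examples to train on, training-dev set)

def decompose_errors(human_error, train_error, train_dev_error, dev_error):
    """Attribute each gap in the chain to avoidable bias, variance or data mismatch."""
    return {
        "avoidable bias": train_error - human_error,
        "variance": train_dev_error - train_error,
        "data mismatch": dev_error - train_dev_error,
    }

train, train_dev = split_train_dev(range(1000))
print(len(train), len(train_dev))  # 950 50

# Hypothetical errors measured on each set
gaps = decompose_errors(human_error=0.0, train_error=0.01,
                        train_dev_error=0.015, dev_error=0.10)
print(max(gaps, key=gaps.get))  # data mismatch
```

Because the model never trains on the training-dev set, a jump from training-dev error to dev error cannot be blamed on overfitting; it points to the difference between the two distributions.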

Addressing data mismatch

Understand the difference between the training and dev/test sets
Collect more data similar to the dev/test sets

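Transfer learning

Transferring from task A to task B makes sense when:

A and B have the same input x
There is a lot more data for A than for B
Low-level features learned from A could help with B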

The weights $w^{[l]}, b^{[l]}$ are first learned on A (pre-training, which initializes them) and then updated on B (fine-tuning; with a large amount of data for B, all layers can be retrained)
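A framework-free sketch of the weight-reuse step, under stated assumptions: a network is a plain list of layer dicts holding `w` and `b`, the helper names are hypothetical, and replacing A's output layer with a freshly initialized one sized for B is a common recipe that these notes do not spell out. The training loops themselves are omitted:

```python
import copy
import random

def init_layer(n_in, n_out, rng):
    """Layer with small random weights w (n_out x n_in) and zero bias b."""
    return {
        "w": [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }

def transfer(pretrained_layers, n_out_b, seed=0):
    """Copy the layers learned on task A and swap in a fresh output layer for task B."""
    rng = random.Random(seed)
    layers = copy.deepcopy(pretrained_layers[:-1])  # w^[l], b^[l] learned on A
    n_in = len(pretrained_layers[-1]["w"][0])       # fan-in of A's output layer
    layers.append(init_layer(n_in, n_out_b, rng))
    return layers

def trainable_flags(n_layers, retrain_all):
    """Which layers fine-tuning updates: all of them, or only the new output layer."""
    return [retrain_all or i == n_layers - 1 for i in range(n_layers)]

# Stand-in for a network pre-trained on A (4 inputs, two hidden layers, 10 outputs)
rng = random.Random(1)
model_a = [init_layer(4, 8, rng), init_layer(8, 8, rng), init_layer(8, 10, rng)]
model_b = transfer(model_a, n_out_b=2)  # task B has 2 outputs
print(trainable_flags(len(model_b), retrain_all=False))  # [False, False, True]
```

`retrain_all=True` corresponds to the case with plenty of data for B; with little data for B, updating only the new output layer is the more cautious choice.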

Multi-task learning

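The labels $y$ and predictions $\hat y$ change from scalars to vectors with one entry per task, so a single network predicts several labels at once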
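A minimal sketch of a multi-task loss, assuming every task is a binary label and the per-task logistic losses are summed and then averaged over the m examples. The name `multitask_loss` and the four-task example are hypothetical:

```python
import math

def multitask_loss(y_hat, y, eps=1e-12):
    """Mean over examples of the summed binary cross-entropy over tasks.

    y_hat, y: lists of m vectors, each with one entry per task.
    """
    m = len(y)
    total = 0.0
    for yh_i, y_i in zip(y_hat, y):
        for p, t in zip(yh_i, y_i):
            p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / m

# Hypothetical labels for 2 images x 4 tasks (e.g. pedestrian, car, stop sign, traffic light)
y = [[0, 1, 1, 0], [1, 0, 0, 0]]
y_hat = [[0.1, 0.8, 0.7, 0.2], [0.9, 0.2, 0.1, 0.1]]
print(multitask_loss(y_hat, y))
```

Unlike softmax, which assigns each example exactly one class, this lets one image have several labels set at once, because each task gets its own independent logistic loss.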

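End-to-end deep learning

Maps the input directly to the output without hand-designed intermediate stages; it needs a lot of data to learn well

Example: audio --> transcript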
