Introduction to
Care for data scientists and models

How to set up your organizations
to be informed by data
Joerg Rings 2024
What is a model?
- Takes input data and transforms it through a mathematical/statistical process into an output
Prominent Types of Models
- Supervised (with target): Regression/Classification
- Unsupervised: Clustering
- Reinforcement
- Generative: Large Language Models
- Forecasting
Example: HELOC Dataset
- Is a home loan too risky? (FICO dataset)
- Target: Will the loan be repaid within 2 years?
- Some features:
  - External estimate of risk
  - Months since first trade
  - Percentage of trades with a balance
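As a minimal sketch of what "model" means for this example, the function below maps the three features listed above to a probability of repayment. The function name, the weights, and the assumption that a higher external estimate means lower risk are all illustrative and not taken from the FICO data; a real model would learn its parameters during development.

```python
import math

def repayment_probability(external_risk_estimate, months_since_first_trade,
                          pct_trades_with_balance):
    """Toy logistic model: three inputs in, one probability out."""
    # Hypothetical hand-picked weights, NOT fitted to any data.
    intercept = -4.0
    score = (
        intercept
        + 0.06 * external_risk_estimate
        + 0.002 * months_since_first_trade
        - 0.01 * pct_trades_with_balance
    )
    # The logistic function squashes the score into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

# Made-up applicant, for illustration only.
p = repayment_probability(70, 200, 60)
print(f"Estimated probability of repayment: {p:.2f}")
```

The point is only the shape of the thing: data goes in, a mathematical transformation is applied, and a number comes out that a decision can be based on.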
Model Lifecycle
Design -- What is the problem?
Development -- Use math and coding to build a solution
Operation -- Automate and monitor
Good/bad news
Being data-driven means
everyone’s way of working will change.
Model Lifecycle - 1: Design
- Collaborate to find out:
  - What is the problem the organization wants to solve
  - What analysis have they done
  - What are the levers
  - What decisions will be changed and how are they prepared to do that
  - What data is available
- Is a model needed to improve decision making?
Model Lifecycle - 1: Design
- Thought Experiment:
  - If the data scientists were your superiors, how would you prove to them that you know how to change your ways based on data-driven decisions?
- Can you do a small test using a very simplified model?
- Can you set up A/B testing of a challenger model against the current baseline process?
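One way to evaluate such an A/B test, sketched under assumptions: if each case ends in a simple success or failure, the baseline and challenger groups can be compared with a two-proportion z-test. The function name and the counts below are made up for illustration, and the test itself is one common choice rather than something prescribed here.

```python
import math

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one success rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Illustrative counts only: baseline process (A) vs. challenger model (B).
z, p_value = two_proportion_z_test(400, 1000, 450, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The sketch assumes both groups are randomly assigned and large enough for the normal approximation; it does not guard against a pooled rate of exactly 0 or 1.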
Model Lifecycle - 2: Development
- Where all the math and engineering happens
- Extract and transform data, engineer features, fit algorithms
- Analyse and explain outcomes
- Define decision process
- Document so it can be reproduced
Building a model is advanced data analysis.
It lets your data tell stories.
Don't outsource them, don't rush them.
Example: HELOC Dataset
- Feature importance from an InterpretML Explainable Boosting Machine (EBM)

- Will relying on external risk evaluation make our model unstable?
Example: HELOC Dataset
SHAP measures the impact of each feature on the result, locally (for a single prediction) or globally (across many predictions)
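SHAP is built on Shapley values. To make the idea concrete, the sketch below computes exact Shapley values for a single prediction by brute force over all feature subsets. This is not the SHAP library's algorithm (which uses faster approximations and model-specific methods); the function name, the toy model, and the choice to represent an "absent" feature by a baseline value are assumptions for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a subset are replaced by their baseline value.
    Cost grows exponentially with the number of features.
    """
    n = len(instance)

    def value(subset):
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a subset of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to subset s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model with three features.
def toy_model(x):
    return 2 * x[0] + 3 * x[1] - x[2]

local = shapley_values(toy_model, instance=[1, 2, 3], baseline=[0, 0, 0])
print(local)
```

The local values add up to the difference between the prediction for the instance and the prediction for the baseline. A global view is commonly obtained by averaging the absolute local values over many instances.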

Model Lifecycle - 3: Operation
- Package model (scoring) code to production quality
- Has to run fully automated
- Document everything so it's reproducible
  - Especially, document all assumptions and risks
- Monitor execution, validity, and stability over the model's lifetime
  - Validity: Model parameters still work for the problem
  - Stability: Scored population still looks like training population
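One possible way to put a number on stability is the Population Stability Index (PSI), which compares how one feature is distributed in the training population and in the currently scored population. PSI is a common choice in credit scoring but is an assumption here, as are the function name, the equal-width binning, and the small floor used for empty bins.

```python
import math

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a training-time sample and a scoring-time sample of one feature."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the training range; fall back to 1.0 if all values are equal.
    width = (hi - lo) / n_bins or 1.0

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the training range go into the first or last bin.
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

# Synthetic data for illustration: the scored population has drifted upwards.
training = list(range(100))
scored = [v + 30 for v in training]
print(population_stability_index(training, training))
print(population_stability_index(training, scored))
```

A frequently quoted rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift, but the thresholds should be agreed on for each model and documented with its other assumptions.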
Model Risk

Model Risk - Sources of Errors
- Data
- Data
- Method
- Implementation
- Lack of Monitoring
- etc etc etc
Successful Data Science: Skills Needed
- Math
- Creativity
- Scientific Rigor
- Communication
- Software Engineering and DevOps
- Domain Knowledge
- Judgment of "good enough"
Takeaways
- Modeling is art and science
- Set everyone up to make new, informed decisions from model output
- Advanced data analysis opens up the stories data can tell
- Think deeply about the stories the data tells you
- There are no quick fixes, but know what is good enough