
Getting (more) comfortable with statistics

This summer, I decided to get better at statistics. My relationship with statistics was one where I could stare at it and maybe get an intuition of what was going on, but not much more than that. Anything that involved, for example, understanding how a Pearson correlation matrix is calculated and what the math behind it means gave me the chills, since my foundations were really weak.

So this summer I decided to join the Master in Analysis and Engineering of Big Data at FCT NOVA - at least partially, since working at Feedzai will still occupy most of my time. It started in mid-September and will end at the beginning of January.

The courses I’m taking are:

  1. Multivariate Stats

  2. Computational Stats

Multivariate Stats

You can find my online book for this course here: Multivariate Stats

The goal of this course is to familiarize students with inference on multivariate means and covariance matrices, as well as with Gaussian (population) linear models and dimensionality reduction techniques. This knowledge is then applied to data discrimination and classification.

So far, this has made me more comfortable with matrix operations, as well as with the normality assumptions behind many of these methods.
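For example, a quick, informal way to eyeball the multivariate normality assumption is to compare squared Mahalanobis distances against chi-square quantiles. Here's a minimal sketch in base R, using iris as stand-in data:

# If X is multivariate normal, the squared Mahalanobis distances
# approximately follow a chi-square distribution with p = ncol(X)
# degrees of freedom, so the QQ-plot should hug the y = x line.
X  <- as.matrix(iris[, 1:4])  # stand-in data: 4 numeric variables
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))
qqplot(qchisq(ppoints(nrow(X)), df = ncol(X)), d2,
       xlab = "Chi-square quantiles",
       ylab = "Squared Mahalanobis distances")
abline(0, 1)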

For example, let’s talk a little bit about the determinant of a matrix.

Matrix Determinant

It was while studying this subject that I finally got a grasp of what the determinant of a matrix means, semantically - kudos to 3Blue1Brown for his amazing job explaining that!
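In a nutshell: the determinant measures how much the linear transformation associated with a matrix scales areas (in 2D) or volumes (in higher dimensions), with a negative sign meaning the orientation gets flipped. A toy illustration in R:

# A maps the unit square (area 1) to a parallelogram of area |det(A)|
A <- matrix(c(2, 1,
              0, 3), nrow = 2, byrow = TRUE)
det(A)  # 6: areas get scaled by a factor of 6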

Let me now try to apply this to some data, with a few lines of code and plots.

Assume that we have the following data (we'll get to its covariance matrix right after):

library(ggplot2)

set.seed(1)
df <- data.frame(
  v1 = rnorm(20, mean = 4, sd = 2),
  v2 = rchisq(20, df = 2)
)

# 2D density estimate of (v1, v2) rendered as a heatmap,
# with the observations overlaid as white points
plot <- ggplot(df, aes(v1, v2)) +
  stat_density2d(geom = "tile", aes(fill = after_stat(density)), contour = FALSE) +
  geom_point(colour = "white")
plotly::ggplotly(plot)
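The covariance matrix mentioned above, and its determinant, can then be computed like this - the determinant of a covariance matrix is sometimes called the generalized variance, since it measures how much "area" (or volume, in higher dimensions) the data cloud spreads over:

# Sample covariance matrix of v1 and v2
S <- cov(df)
S

# Its determinant: the generalized variance of the data
det(S)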