The other day I found a post on the Domino Data Science blog about computing a PCA of a matrix with 1 million rows and 13,000 columns. That is big as far as PCA usually goes. They used a Spark cluster on a 16-core machine with 30 GB of RAM, and it took them 27 hours.
After reading up a bit on PCA, I realized you can compute it on large (several-billion-element) matrices much faster, and without Big Data tech like Spark, by using better algorithms and more RAM.
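One well-known family of "better algorithms" for this problem is randomized SVD (Halko, Martinsson and Tropp). The sketch below illustrates that technique under the assumption that it is a reasonable stand-in. It is not necessarily the method this post goes on to use. It centers the columns, projects the matrix onto a small random subspace, sharpens that subspace with a few power iterations, and then takes an exact SVD of the much smaller projected matrix. The function name `randomized_pca` and its parameters (`n_oversamples`, `n_iter`, `seed`) are hypothetical.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top principal components of X (rows are samples).

    Hypothetical helper: a randomized-SVD sketch, not a library API.
    Returns (components, explained_variance), where components has shape
    (n_components, n_features).
    """
    rng = np.random.default_rng(seed)
    # Center each column so the SVD of Xc corresponds to PCA of X.
    # Note: this makes a full copy of X.
    Xc = X - X.mean(axis=0)
    n_samples, n_features = Xc.shape
    k = n_components + n_oversamples

    # Project onto a random k-dimensional subspace of the column space.
    omega = rng.standard_normal((n_features, k))
    Q, _ = np.linalg.qr(Xc @ omega)

    # Power iterations help when singular values decay slowly; re-orthonormalize
    # at every step to keep the basis numerically stable.
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)

    # Exact SVD of the small (k x n_features) matrix.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)

    components = Vt[:n_components]
    explained_variance = s[:n_components] ** 2 / (n_samples - 1)
    return components, explained_variance
```

The expensive work is a handful of matrix products with a thin random matrix plus an SVD of a `k x n_features` matrix, instead of a full decomposition of the original. The trade-off is that the whole matrix (and, in this sketch, a centered copy of it) has to fit in RAM.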
This post is from my old blog. It’s about a weekend project in which I downloaded a bunch of football match data and did some light analysis on it. I had planned to go further, use it for “Machine Learning,” and try my hand at a prediction engine, but I didn’t get that far. Sadly, I couldn’t recover the pictures. It’s a good snapshot of where I was 4 years ago and reminds me of the progress I’ve made.
We have advanced models of tech; we need advanced models of society to match.