Here you can find all my discoveries on GitHub: projects I have starred and liked. You can also visit my personal GitHub profile.
vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
Check on GitHub
deequ
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Check on GitHub