This is the companion repository to the Medium post "Doing cool data science in Java: how 3 DataFrame libraries stack up".
The data was extracted from Eurostat at the beginning of September 2018. I opened the extracted CSV in LibreOffice and saved it again because the Eurostat output contained some invalid UTF-8 characters that some CSV importers couldn't handle directly.
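If you would rather not round-trip the file through LibreOffice, the same cleanup can probably be done in code. The following is a minimal sketch, not what was used for this repository: it decodes the raw bytes as UTF-8 and substitutes the replacement character U+FFFD for any malformed sequence. The class and method names (`Utf8Cleaner`, `decodeLenient`, `clean`) are made up for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of this repository.
final class Utf8Cleaner {

    // Decode bytes as UTF-8, replacing malformed sequences with U+FFFD
    // instead of failing on them.
    static String decodeLenient(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    // Read a file leniently and write it back out as valid UTF-8.
    static void clean(Path in, Path out) throws IOException {
        String text = decodeLenient(Files.readAllBytes(in));
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }
}
```

Note that this keeps a visible marker (U+FFFD) where the bad bytes were, so affected city names would still need a manual look.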
| Library | Maintained | Version | Time (ms) |
|---|---|---|---|
| DuckDB | Y | 1.3.0 | 93 |
| DFLib | Y | 1.3.0 | 226 |
| Kotlin DataFrame | Y | 1.0-beta2 | 816 |
| Tablesaw | Y | 0.44.1 | 820 |
| Joinery | N | 1.9 | 1,478 |
| Krangl | N | 0.18.4 | 1,796 |
| Morpheus | N | 0.9.23 | * |
\* Morpheus is no longer maintained and doesn't seem to work on later Java versions (it fails with an error related to accessing `sun.util.calendar.ZoneInfo`).
The code for the libraries is in the `Test{libraryname}.java` files. They all use `CheckResult.java` to do a basic correctness check for the top-growing cities.
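To give an idea of what such a check can look like, here is a minimal sketch. It is not the actual content of `CheckResult.java`: it assumes each library's result can be reduced to a list of city names ordered by growth, and the class name `TopCitiesCheck` and method `matchesTop` are hypothetical.

```java
import java.util.List;

// Hypothetical sketch, not the real CheckResult.java.
final class TopCitiesCheck {

    // Returns true when the first expected.size() entries of `actual`
    // equal `expected`, in the same order.
    static boolean matchesTop(List<String> expected, List<String> actual) {
        int n = expected.size();
        if (actual.size() < n) {
            return false; // result is too short to contain the expected top entries
        }
        return actual.subList(0, n).equals(expected);
    }
}
```

Each test class could then call `matchesTop` with the same expected list, so that a library producing a different ordering is caught regardless of how fast it ran.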
As described in the Medium post, I couldn't find a good way to do the pivot step in DataVec, but I included the code I wrote up to that point.