About Miller johnkerl.org
"Miller is like sed, awk, cut, join, and sort for name-indexed data such as CSV."

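To make "name-indexed" concrete, here is a minimal Python sketch of the idea, not Miller's actual implementation: each CSV row becomes an insertion-ordered mapping from field name to value, and fields are selected by name rather than by column position. The sample data and the helper names `read_records` and `cut_fields` are hypothetical, chosen only for illustration.

```python
import csv
import io

# Hypothetical sample data; the column names echo the examples in this note.
SAMPLE = """hostname,status,uptime
alpha,up,1200
beta,down,30
"""

def read_records(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def cut_fields(records, names):
    """Keep only the named fields, preserving each record's own field order."""
    wanted = set(names)
    return [{k: v for k, v in rec.items() if k in wanted} for rec in records]

records = read_records(SAMPLE)
# Fields are requested by name; no column positions are counted anywhere.
selected = cut_fields(records, ["uptime", "hostname"])
for rec in selected:
    print(rec)
```

Because Python dicts preserve insertion order, each record keeps its fields in the order they appeared in the input, regardless of the order in which the names were requested.
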
"With Miller you get to use named fields without needing to count positional indices. For example: % mlr --csv cut -f hostname,uptime mydata.csv % mlr --csv --rs lf filter '$status != "down" && $upsec >= 10000' *.csv % mlr --nidx put '$sum = $7 + 2.1*$8' *.dat % grep -v '^#' /etc/group | mlr --ifs : --nidx --opprint label group,pass,gid,member then sort -f group % mlr join -j account_id -f accounts.dat then group-by account_name balances.dat % mlr put '$attr = sub($attr, "([0-9]+)_([0-9]+)_.*", "\1:\2")' data/* % mlr stats1 -a min,mean,max,p10,p50,p90 -f flag,u,v data/* % mlr stats2 -a linreg-pca -f u,v -g shape data/* This is something the Unix toolkit always could have done, and arguably always should have done. It operates on key-value-pair data while the familiar Unix tools operate on integer-indexed fields: if the natural data structure for the latter is the array, then Miller’s natural data structure is the insertion-ordered hash map. This encompasses a variety of data formats, including but not limited to the familiar CSV. (Miller can handle positionally-indexed data as a special case.)" #!TO_TAG_bit #homebrew_formulas #cli #dev #open_source #language:C #pub