Ever have a .csv file that's multiple terabytes in size? And have someone run an awk command on it to change a few things? AND have that awk command remove all of the t's from the file?
This Clojure script was designed as a backstop for those sorts of strange/interesting file problems: it checks the input for the existence of a given set of character classes and logs any characters that were never found.
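The core idea can be sketched in a few lines of JVM code: stream the input once, mark each target character the first time it appears, then report whichever targets were never seen. This is a hypothetical illustration of the approach, not the actual implementation; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class CharCheck {
    // Returns the characters from `targets` (assumed distinct) that
    // never appear anywhere in the input stream.
    static List<Character> missingChars(InputStream in, String targets) throws IOException {
        boolean[] seen = new boolean[targets.length()];
        int c;
        while ((c = in.read()) != -1) {
            int idx = targets.indexOf((char) c);
            if (idx >= 0) seen[idx] = true;   // mark this target as found
        }
        List<Character> missing = new ArrayList<>();
        for (int i = 0; i < targets.length(); i++) {
            if (!seen[i]) missing.add(targets.charAt(i));
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        // Example: check for all lowercase letters on standard input.
        List<Character> missing = missingChars(System.in, "abcdefghijklmnopqrstuvwxyz");
        if (missing.isEmpty()) System.out.println("All characters present.");
        else System.out.println("Missing: " + missing);
    }
}
```

Because the input is read as a single pass over a stream, the check works on files far larger than memory.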
Requirements:
- Java v1.7+
Usage: java -jar char-check.jar [OPTION]... [FILE]
Check for the existence of character classes in FILE or standard input.
-u, --upper Check for uppercase letters [A-Z]
-l, --lower Check for lowercase letters [a-z]
-n, --number Check for numbers [0-9]
-6, --hex Check for hexadecimal digits [0-9a-f]
-p, --punctuation Check for common punctuation [.,?!&-'";:]
-s, --symbol Check for symbols [`~!@#$%^&_-+*/=(){}[]|\:;"'<,>.?]
-h, --help Display this help and exit
With no FILE, read standard input.
Examples:
java -jar char-check.jar -l test_file.txt
echo "abcdefghijklmnopqrstuvwxy" | java -jar char-check.jar -l
Note that these options can be combined. To check for the existence of uppercase letters, lowercase letters, and digits in a file named testdata.txt, the command line would look like:
java -jar char-check.jar -uln testdata.txt
Alternatively, to stream from stdin:
cat testdata.txt | java -jar char-check.jar -uln