Messy data is like rusted iron—it’s hard to work with until you clean the surface. Most CSVs come with headers like Student Name , ID #, or Fee (USD). Those spaces and symbols are ‘slag’ that causes errors in our scripts.
The Sed Cleaning Ritual
We can use a single sed strike to transform the first line of any file:
- Remove Symbols: Delete things like # or ().
- Replace Spaces: Turn spaces into underscores _.
- Lowercase Everything: Consistency is the key to sovereignty.
The Command
head -n 1 raw_data.csv | sed -E 's/[()#]//g; s/ *, */,/g; s/^ +| +$//g; s/ +/_/g; s/.*/\L&/'
The Result
Student Name (#) becomes student_name.
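The command above only prints the cleaned header; it leaves the file itself untouched. As a rough sketch of how the same strike could be applied to a whole CSV, the helper below (clean_header is a made-up name, not a standard tool) rewrites line 1 and passes every data row through unchanged. It assumes the header contains no quoted commas, and it swaps the GNU-only \L for sed's portable y command so it should also run on BSD and macOS sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a CSV with its header row cleaned.
# Assumption: the header has no quoted fields containing commas.
clean_header() {
  sed -E '1{
    s/[()#]//g
    s/ *, */,/g
    s/^ +| +$//g
    s/ +/_/g
    y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
  }' "$1"
}
# Steps, in order: drop symbols, tidy spaces around commas,
# trim the ends, turn remaining spaces into underscores, lowercase.
```

A typical call would be clean_header raw_data.csv > clean_data.csv, which keeps the raw file as a backup in case the header needs a different treatment later.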
Now, whether you are a mechanic tracking spare parts or a student organizing a thesis, your data is predictable. You’ve used the sculptor’s chisel to make the iron smooth.
Forged in the terminal. Refined under the anvil.