12 Dec

Script to Convert Windows-1252 files to UTF-8

I had over 1000 HTML files in a directory. Unfortunately they were encoded in Windows-1252, and I wanted them all converted to UTF-8. I was not willing to open the files one by one or feed their names to a script (there are too many), so I needed a script that would operate on the whole directory and spit out the converted files in one fell swoop.

If you’re not familiar with encodings, the visible symptom is that Firefox displays little black diamonds with question marks inside them for characters it doesn’t understand (I think they’re mostly tabs, spaces, and em-dashes in this case).
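To make the mismatch concrete, here is a small sketch that looks at just the em-dash, assuming iconv and od are installed. Windows-1252 stores the em-dash as the single byte 0x97, which is not valid on its own in UTF-8, hence the diamond. After conversion it becomes a three-byte sequence. The function name em_dash_utf8_hex is made up for this illustration.

```shell
# Emit the Windows-1252 byte for an em-dash (0x97, octal 227),
# convert it to UTF-8, and print the resulting bytes as hex.
em_dash_utf8_hex() {
    printf '\227' | iconv -f WINDOWS-1252 -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

em_dash_utf8_hex   # prints e28094
echo
```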

With help from friends and the internet, I learned about the GNU/Linux command-line tool iconv, which handled this perfectly. Here’s the bash script I used to run it on the entire directory at once:

LIST=`ls *.html`
for i in $LIST;
do iconv -f WINDOWS-1252 -t UTF-8 $i -o $i.utf8;
mv $i.utf8 $i;
done

It seems that iconv requires a new name for the output file, so the above script temporarily names them *.utf8 and then moves them back over the original .html files. Hopefully this helps someone else.
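A more defensive variant of the same idea might look like the sketch below. It is an illustration, not the script used above: the function name convert_dir and the .utf8.tmp suffix are invented here. It matches files with a glob instead of parsing the output of ls, quotes every expansion, and overwrites the original only if iconv reported success.

```shell
# Convert every .html file in a directory from Windows-1252 to UTF-8, in place.
# convert_dir and the .utf8.tmp suffix are illustrative names, not from the post.
convert_dir() {
    dir=${1:-.}
    for f in "$dir"/*.html; do
        [ -f "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        tmp="$f.utf8.tmp"
        # Write to a temporary file first; replace the original only on success.
        if iconv -f WINDOWS-1252 -t UTF-8 "$f" > "$tmp"; then
            mv -- "$tmp" "$f"
        else
            rm -f -- "$tmp"
            echo "conversion failed, left unchanged: $f" >&2
        fi
    done
}

# Usage: convert_dir /path/to/html/files
```

Redirecting standard output to the temporary file is used here instead of -o only because it does not depend on which iconv implementation is installed. Note that running any Windows-1252 to UTF-8 conversion a second time on files that are already UTF-8 will garble them or fail, so it is worth keeping a backup of the directory first.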

3 Responses to “Script to Convert Windows-1252 files to UTF-8”

  1. Bernhard says:

    that’s exactly what I was looking for, thanks a lot :)

    I have a little suggestion: If you change the code to this:

    for file in "$@"; do
        iconv -f "WINDOWS-1252" -t "UTF-8" "$file" -o "$file.utf8_TEMP_CONV_FILE"
        mv "$file.utf8_TEMP_CONV_FILE" "$file"
    done

    Then you can call the script and pass the filenames as arguments. E.g. if you save that under the name “win2dos.sh”, you can invoke the script like

    win2dos.sh *.htm

    or

    win2dos.sh file1.txt file2.txt



  2. \ \ says:

    This script works fine for MOST cases, but not ALL: What if a filename contains a blank space?

  3. Yasser says:

    can you please edit this script to convert files in subfolders too, thanks
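
Regarding the last two comments (blank spaces in filenames, and subfolders), one possible approach is to let find walk the directory tree and hand each path to a small shell snippet. This is only a sketch, assuming find and iconv are available. The function name convert_tree and the .utf8.tmp suffix are invented for illustration, and the -name pattern can be changed to match other extensions.

```shell
# Recursively convert *.html files under a directory from Windows-1252 to UTF-8.
# convert_tree and the .utf8.tmp suffix are illustrative names, not from the post.
convert_tree() {
    # find passes each path as a separate argument, so names containing
    # spaces reach the inner loop intact.
    find "${1:-.}" -type f -name '*.html' -exec sh -c '
        for f do
            # Replace the original only if the conversion succeeded.
            iconv -f WINDOWS-1252 -t UTF-8 "$f" > "$f.utf8.tmp" &&
                mv -- "$f.utf8.tmp" "$f"
        done
    ' sh {} +
}

# Usage: convert_tree /path/to/site
```

If iconv fails on a file, the original is left untouched, but a partial .utf8.tmp file may remain next to it and can be deleted by hand.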