How can I easily convert HTML special entities from a standard input stream in Linux?

Mike B asked:

CentOS

Is there an easy way to convert HTML special entities from a data stream? I’m passing data to a bash script and sometimes that data includes special entities. For example:

&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *

I’m not sure why some characters show up fine and others don’t, but unfortunately I don’t have control over the data coming in.

I’m thinking I might be able to use sed here, but that seems like it would be cumbersome and possibly prone to false positives. Is there a Linux command I could pipe to that specializes in decoding this type of data?

My answer:


PHP is well suited to this. This example requires PHP 5 or later, whose command-line interpreter supports the -R option:

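# -R runs the code once per input line; $argn holds that line without its newline, so PHP_EOL puts it back
cat file.html | php -R 'echo html_entity_decode($argn, ENT_QUOTES, "UTF-8"), PHP_EOL;'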
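If PHP isn't installed on the box, a similar filter can be sketched with Python 3's standard html.unescape function. That is a swapped-in alternative, not part of the original answer, and it assumes python3 (3.4 or later) is on the PATH. The decode_entities name is hypothetical. The function reads standard input line by line, so it can sit in a pipeline the same way the PHP one-liner does, and it decodes both named entities such as &quot; and numeric ones such as &#36;.

```shell
#!/bin/bash
# decode_entities: hypothetical helper name, not an existing command.
# Assumes python3 >= 3.4 is installed (html.unescape was added in 3.4).
decode_entities() {
    python3 -c '
import html, sys
# Process the stream line by line instead of reading it all into memory.
for line in sys.stdin:
    sys.stdout.write(html.unescape(line))
'
}

# Example: decode a sample line like the one in the question.
printf '%s\n' '&quot;test&quot; &amp; test $test ! test @ # $ % ^ &amp; *' | decode_entities
```

Inside a script, some_command | decode_entities then behaves like piping to the PHP version.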

View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.