ORacle SQL: Removing HTML Tags

Problem


You have some text stored in a fields as HTML, and you want to remove it.

select '<HTML><HEAD>THIS IS HEAD.</HEAD><BODY>THIS IS BODY.<P>THIS IS P.</P><H1>THIS IS H1.</H1></BODY></HTML>' STR from dual;

Solution

Easier than I thought. I read some extremely complicated methods, and then stumbled on this beauty:

SELECT REGEXP_REPLACE(STR,'<.*?>') from (
select '<HTML><HEAD>THIS IS HEAD.</HEAD><BODY>THIS IS BODY.<P>THIS IS P.</P><H1>THIS IS H1.</H1></BODY></HTML>' STR from dual);

i.e
 REGEXP_REPLACE(STR,'<.*?>')

Result
THIS IS HEAD.THIS IS BODY.THIS IS P.THIS IS H1.

Acknowledgement

http://nimishgarg.blogspot.co.uk/2011/09/extracting-text-between-html-tags.html


Comments