Find and replace HTML tags in a database
I recently had a client who wanted to change all h2 tags to h1 tags in a Refinery CMS app I managed. They had several hundred pages, so doing it manually did not sound good, and using regular expressions to parse HTML sounded like trouble.
Then I thought of Nokogiri. Nokogiri does a good job of parsing HTML, right? But how do I change the HTML?
I ended up writing a rake task that searched the body parts (the main content body in Refinery) for h2 tags and then used Nokogiri to change the tags.
To change the tags, you can just set the node's name: h2.name = "h1". Another problem I ran into was that I was using
Nokogiri::HTML(page.body)
But when I called to_html on it, I'd get the doctype, html, and body tags along with it. Then I found you can use fragment:
Nokogiri::HTML.fragment(page.body)
And that returned exactly what I wanted.
Check it out:
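desc "Change body tags"
task :change_body_tags => :environment do
  puts "Changing Body Tags"
  # "Body" is the main content part of a Refinery page.
  body_parts = Refinery::PagePart.where(:title => "Body")
  body_parts.each do |body_part|
    # Each translation record carries its own copy of the body HTML.
    body_part.translations.each do |page|
      # Parse as a fragment so to_html doesn't add doctype or body tags.
      doc = Nokogiri::HTML.fragment(page.body)
      # Renaming the node changes both the opening and closing tag.
      doc.css("h2").each { |h2| h2.name = "h1" }
      page.body = doc.to_html
      page.save
      puts "Updated id # #{page.id}"
    end
  end
end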