Exporting WordPress Posts
Blog post created on 2020-11-03
At dodify we are currently working on migrating Olbia.it from WordPress to our home grown CMS. And of course the first challenge is migrating all the content. There are many plugins and options available that will export the WordPress posts in various formats but it immediately felt that converting from the XML (or similar) output of sed plugins to the format we require was more work than just being able to get access to the raw data! So SQL to the rescue. After a little fiddling and some help from Google we get the following query:
SELECT post.id AS `ID`, -- Author user.user_login AS `Author Username`, user.user_email AS `Author Email`, user.display_name AS `Author Name`, -- Post post.post_date AS `Date`, post.post_title AS `Title`, post.post_excerpt AS `Short Title`, post.post_name AS `URL`, post.post_content AS `Content`, -- Post meta (SELECT thumb.guid FROM wp_posts thumb LEFT JOIN wp_postmeta meta ON meta.meta_value = thumb.id WHERE meta.meta_key = '_thumbnail_id' AND meta.post_id = post.id LIMIT 1) AS `Thumbnail`, (SELECT GROUP_CONCAT(wp_terms.name SEPARATOR ', ') FROM wp_terms INNER JOIN wp_term_taxonomy ON wp_terms.term_id = wp_term_taxonomy.term_id INNER JOIN wp_term_relationships wpr ON wpr.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id WHERE taxonomy = 'category' AND post.id = wpr.object_id) AS `Categories`, (SELECT GROUP_CONCAT(wp_terms.name SEPARATOR ', ') FROM wp_terms INNER JOIN wp_term_taxonomy ON wp_terms.term_id = wp_term_taxonomy.term_id INNER JOIN wp_term_relationships wpr ON wpr.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id WHERE taxonomy = 'post_tag' AND post.id = wpr.object_id) AS `Tags` FROM wp_posts post LEFT JOIN wp_users user ON post.post_author = user.id WHERE post.post_type = 'post' AND post.post_status = 'publish';
The result will likely have most of the information you would care about from each WordPress post, namely:
- the post ID;
- the author username;
- the author email;
- the author name;
- the publishing date;
- the title;
- the excerpt if available;
- the URL. Note this will normally be directly on the domain/path WordPress is hosted under and can be used as the source of any redirects if the new system has different links;
- the HTML content;
- the thumbnail link;
- the categories as a comma separated list
- the tags as a comma separated list.
To get all the information the query joins on/uses quite a few tables: wp_posts
, wp_terms
, wp_postmeta
, wp_term_taxonomy
, wp_term_relationships
and wp_users
.
With the results from the query parsing the HTML content and downloading all linked images locally as well as the thumbnail images quickly becomes the next challenge. Hopefully it can be helpful to someone else on the web!