in tutorial sql ~ read.

Migrating from Joomla to Ghost

After yet another Joomla security issue and the rise of more simple programming using Node.js I thought it was time to port my old Joomla 2.5.x blog to a new platform: Ghost.

Unfortunately I couldn't find many resources online on how to do the data migration but with a bit of SQL and some regex I managed to script out the data from my blog and load into Ghost quite easily.

This method uses the REGEXP_REPLACE function which exists in MariaDB but not MySQL (but you should be using MariaDB instead of MySQL anyway). You need to replace @prefix with your table prefix which was set up when you installed Joomla.

The first script extracts the posts from the _content table. The regex will find any images in your existing posts and rename them into the Ghost structure (/content/images/[year]/[month]/[filename]). It uses the created date of the post to apply the naming convention to the images contained in the post. This worked well for me as my photos could easily be exported into this naming convention using the EXIF data. When Ghost loads the data it must parse the markdown field and generate the html data which is why it can be left null. Nested replace calls may not be the most elegant code but it will work well enough.

-- posts
SELECT  
   id as 'id'
  ,null as 'uuid'
  ,`title` as 'title'
  ,`alias` as 'slug'
  ,REPLACE(
    REPLACE(
      REPLACE(
        REPLACE(
          REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  REGEXP_REPLACE(
                    REGEXP_REPLACE(`fulltext`
                    ,'(<img.*?src="(?:[^"]*?([^/\/\]+(?:jpg|gif|png|JPG)))".*?alt="([^"]*)".*\/>)'
                    ,CONCAT('![\\3](/content/images/',DATE_FORMAT(`created`,'%Y/%m/'),'\\2)'))
                    ,'(<p>(.*)<\/p>)'
                  ,CONCAT('\\2','\r\n'))
                ,'&nbsp;',' ')
              ,'JPG','jpg')
            ,'<strong>','**')
          ,'</strong>','**')
        ,'<h4>','####')
      ,'</h4>','')
    ,'<em>','*')
  ,'</em>','*')      
  as 'markdown'
  ,null as 'html'
  ,null as 'image'
  ,0 as 'featured'
  ,0 as 'page'
  ,"published" as 'status'
  ,"en_US" as 'language'
  ,null as 'meta_title'
  ,null as 'meta_description'
  ,1 as 'author_id'
  ,UNIX_TIMESTAMP(`created`)*1000 as 'created_at'
  ,1 as 'created_by'
  ,UNIX_TIMESTAMP(`modified`)*1000 as 'updated_at'
  ,1 as 'updated_by'
  ,UNIX_TIMESTAMP(`publish_up`)*1000 as 'published_at'
  ,1 as 'published_by'
FROM @prefix_content  
WHERE state=1; -- only published records  

The second script extracts creates tags from the _categories table. I am eliminating many categories I didn't want ported so you can choose if you want to do that.

--tags
SELECT  
   id as 'id'
  ,null as 'uuid'
  ,title as 'name'
  ,alias as 'slug'
  ,REPLACE(REPLACE(description,'<p>',''),'</p>','') as 'description'
  ,null as 'image'
  ,0 as 'hidden'
  ,parent_id as 'parent_id'
  ,null as 'meta_title'
  ,null as 'meta_description'
  ,UNIX_TIMESTAMP(`created_time`)*1000 as 'created_at'
  ,1 as 'created_by'
  ,UNIX_TIMESTAMP(`modified_time`)*1000 as 'updated_at'
  ,1 as 'updated_by'
FROM @prefix_categories  
WHERE alias NOT IN ('uncategorised','home','root','2008','2009','2010','2011','2012','2013','2014','2015');  

Finally, because MariaDB does not play nicely with Common Table Expressions to do recursion, I am manually recursing up the _categories hierarchy to create posts_tags by linking each _content record to all its _categories. This one recurses up 5 levels but you could quickly add more with this pattern.

SELECT  
   @i:=@i+1 AS 'id'
  ,_content.id AS 'post_id'
  ,tag_id as 'tag_id'
  ,0 as 'sort_order'
FROM  
@prefix_content _content
INNER JOIN  
(
    SELECT l1.id, l1.id tag_id, l1.alias
    FROM @prefix_categories l1
    WHERE l1.id NOT IN (SELECT parent_id FROM @prefix_categories WHERE parent_id IS NOT NULL)
    UNION
    SELECT l1.id, l2.id tag_id,l2.alias
    FROM @prefix_categories l1
    INNER JOIN @prefix_categories l2 ON l2.id = l1.parent_id
    WHERE l1.id NOT IN (SELECT parent_id FROM @prefix_categories WHERE parent_id IS NOT NULL)
    UNION
    SELECT l1.id, l3.id tag_id,l3.alias
    FROM @prefix_categories l1
    INNER JOIN @prefix_categories l2 ON l2.id = l1.parent_id
    INNER JOIN @prefix_categories l3 ON l3.id = l2.parent_id
    WHERE l1.id NOT IN (SELECT parent_id FROM @prefix_categories WHERE parent_id IS NOT NULL)
    UNION
    SELECT l1.id, l4.id tag_id,l4.alias
    FROM @prefix_categories l1
    INNER JOIN @prefix_categories l2 ON l2.id = l1.parent_id
    INNER JOIN @prefix_categories l3 ON l3.id = l2.parent_id
    INNER JOIN @prefix_categories l4 ON l4.id = l3.parent_id
    WHERE l1.id NOT IN (SELECT parent_id FROM @prefix_categories WHERE parent_id IS NOT NULL)
    UNION
    SELECT l1.id, l5.id tag_id,l5.alias
    FROM @prefix_categories l1
    INNER JOIN @prefix_categories l2 ON l2.id = l1.parent_id
    INNER JOIN @prefix_categories l3 ON l3.id = l2.parent_id
    INNER JOIN @prefix_categories l4 ON l4.id = l3.parent_id
    INNER JOIN @prefix_categories l5 ON l5.id = l4.parent_id
    WHERE l1.id NOT IN (SELECT parent_id FROM @prefix_categories WHERE parent_id IS NOT NULL)
) _categories ON _content.catid = _categories.id 
AND _categories.alias NOT IN ('uncategorised','home','root','2008','2009','2010','2011','2012','2013','2014','2015')  
,(SELECT @i:=0) AS variables
WHERE _content.state=1;  -- only published records

I executed these scripts with MySQL Workbench which will allow you to manually export the result sets as JSON files. I had to open these exports with a text editor and replace NULL to null as the MySQL Workbench JSON extract doesn't conform to JSON standards.

Starting with a new Ghost installation you should then export the default schema from the Labs section, override the posts,tags and posts_tags sections with the output from these extracts. You can then reload your exported file and you should be good to go.

comments powered by Disqus