WordPress

WordPress: get all posts with no audio files attached

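A useful SQL query for WordPress (assuming the default wp_ table prefix). An audio file counts as attached to a post when its post_parent is that post’s ID: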

SELECT p.post_title, p.guid
FROM wp_posts p
WHERE p.post_type = 'post'
  AND NOT EXISTS (SELECT 1
                  FROM wp_posts a
                  WHERE a.post_parent = p.ID
                    AND a.post_type = 'attachment'
                    AND a.post_mime_type LIKE 'audio/%');
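To sanity-check what the query returns, the same anti-join can be mirrored in plain PHP over rows you have already fetched. This is a sketch only: posts_without_audio() is a hypothetical name. The rows are assumed to be associative arrays with the wp_posts columns ID, post_type, post_parent, post_mime_type and post_title. Audio attachments are assumed to be rows with post_type 'attachment' and a MIME type starting with audio/.

```php
<?php
// Hypothetical helper mirroring the NOT EXISTS query: $rows are wp_posts-like
// associative arrays with ID, post_type, post_parent, post_mime_type, post_title.
function posts_without_audio(array $rows): array
{
    // First pass: remember every post ID that has an audio attachment
    $has_audio = [];
    foreach ($rows as $row) {
        if ($row['post_type'] === 'attachment'
            && strpos($row['post_mime_type'], 'audio/') === 0) {
            $has_audio[$row['post_parent']] = true;
        }
    }

    // Second pass: keep the posts that never appeared as such a parent
    $titles = [];
    foreach ($rows as $row) {
        if ($row['post_type'] === 'post' && !isset($has_audio[$row['ID']])) {
            $titles[] = $row['post_title'];
        }
    }
    return $titles;
}

$rows = [
    ['ID' => 1, 'post_type' => 'post', 'post_parent' => 0, 'post_mime_type' => '', 'post_title' => 'Podcast episode'],
    ['ID' => 2, 'post_type' => 'post', 'post_parent' => 0, 'post_mime_type' => '', 'post_title' => 'Text only'],
    ['ID' => 3, 'post_type' => 'attachment', 'post_parent' => 1, 'post_mime_type' => 'audio/mpeg', 'post_title' => 'episode.mp3'],
    ['ID' => 4, 'post_type' => 'attachment', 'post_parent' => 2, 'post_mime_type' => 'image/jpeg', 'post_title' => 'photo.jpg'],
];
echo implode(', ', posts_without_audio($rows)), "\n"; // prints "Text only"
```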

Generating thumbnails on the fly with WordPress

One advantage of Drupal’s image resizing over WordPress’s: WordPress generates resized images only at upload time, whereas Drupal generates a missing thumbnail automatically on page load.

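You could put this function into your WP template. Given an image URL $image_url, it checks whether its thumbnail $thumb_url exists. If it doesn’t, it generates a cropped thumbnail of $xdim x $ydim pixels and saves it at the thumbnail’s location. Both URLs are assumed to point to files on this server, under the document root.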

function maybe_generate_userarticlethumb($thumb_url, $image_url, $xdim, $ydim) {

  /* Map both URLs on this site to file paths under the document root */
  $imagepath = $_SERVER["DOCUMENT_ROOT"] . parse_url($image_url, PHP_URL_PATH);
  $thumbpath = $_SERVER["DOCUMENT_ROOT"] . parse_url($thumb_url, PHP_URL_PATH);

  /* Checking the file system is much quicker than fetching the thumbnail over HTTP */
  if (!file_exists($thumbpath)) {
    $image = wp_get_image_editor($imagepath);
    if ( ! is_wp_error( $image ) ) {
       /* Use any of WP's image manipulation functions here */
       $image->resize( $xdim, $ydim, true ); // true = crop rather than scale proportionally
       $image->save($thumbpath);
    }
  }
}

Then just call maybe_generate_userarticlethumb($thumb_url, $image_url, $xdim, $ydim) in your template, and print $thumb_url where you need to.
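If you don’t already have $thumb_url, one option is to derive it from $image_url the way WordPress names its own resized copies: insert -{width}x{height} before the file extension, so photo.jpg becomes photo-150x150.jpg. The helper below is a hypothetical sketch of that. It assumes the URL has no query string or fragment. Since the function above saves the thumbnail at whatever path $thumb_url maps to, the naming scheme only has to be consistent.

```php
<?php
// Hypothetical helper: build a thumbnail URL by inserting "-{w}x{h}" before
// the file extension, e.g. photo.jpg -> photo-150x150.jpg.
// Assumes the URL has no query string or fragment.
function userarticle_thumb_url(string $image_url, int $xdim, int $ydim): string
{
    // Match the final ".ext" (no dots or slashes inside it) at the end of the URL
    return preg_replace('/(\.[^.\/]+)$/', '-' . $xdim . 'x' . $ydim . '${1}', $image_url);
}

$image_url = 'http://www.example.com/wp-content/uploads/2014/01/photo.jpg';
$thumb_url = userarticle_thumb_url($image_url, 150, 150);
echo $thumb_url, "\n"; // http://www.example.com/wp-content/uploads/2014/01/photo-150x150.jpg
```

In a template you would then call maybe_generate_userarticlethumb($thumb_url, $image_url, 150, 150) before printing $thumb_url.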

Migrating static HTML sites into a WordPress multisite

1. Create a new site in your multisite network for the site you are about to import

We set up our multisite using subdomains. This is useful for us because some of our sites there really are subdomains of the main site. If you use subfolders, some of the steps below might be different; unfortunately we haven’t tested them for that case.
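For reference, a subdomain multisite is switched on by constants along these lines in wp-config.php, added above the “That’s all, stop editing!” comment. Treat this as a sketch: example.org is a placeholder. WordPress’s Network Setup screen (enabled by WP_ALLOW_MULTISITE) generates the exact block for your own install, and that is the one to copy.

```php
// Enables Tools > Network Setup
define('WP_ALLOW_MULTISITE', true);

// Block generated by Network Setup for a subdomain install (placeholder values)
define('MULTISITE', true);
define('SUBDOMAIN_INSTALL', true); // false for a subfolder install
define('DOMAIN_CURRENT_SITE', 'example.org');
define('PATH_CURRENT_SITE', '/');
define('SITE_ID_CURRENT_SITE', 1);
define('BLOG_ID_CURRENT_SITE', 1);
```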

2. Import the site

We are using the HTML Import plugin. You can import blog posts and pages in separate import runs. For pages, it keeps the hierarchy.

3. Clean up superfluous HTML

Sometimes there is unneeded content that appears on every page. For example, our HTML sites were scraped from Plone sites, so there was a lot of template cruft at the beginning and end of every post. The title was also inside the element whose CSS class served as our ‘body’ class, so it ended up in the main content too. Here’s a little PHP script you can modify for your own needs. You will also need to download the htmLawed script and place it in the same folder as the script below. Note that the script talks to the database with the old mysql_* functions. These were removed in PHP 7, so it needs PHP 5 (or porting to mysqli).

 2,
'tidy' => 1,
'elements' => $elements,
'cdata' => 1,
'comment' => 1,
'deny_attribute' => 'align'
);

mysql_select_db($dbname);

$doc = new DOMDocument();

$query = mysql_query("SELECT ID, post_date, post_title, post_name, post_content, post_type, post_parent FROM " . POSTTABLE . " WHERE post_type IN ('post', 'page') AND post_status = 'publish'");

while($post = mysql_fetch_object($query)) {

if(TESTINGSINGLE == false || $post->ID == TESTID) {
print "\n\n" . '**opening ' . $post->ID . ' - name: ' . $post->post_name . "\n";
}

$oldpost = $post->post_content;
$doc->loadHTML($oldpost);

// Work on the loaded document directly and query it with XPath
$newtabledom = $doc;
$pagepath = new DOMXPath($newtabledom);

// Remove content inside certain classes inside page
// The below elements are leftover elements from Plone
// See XPath documentation for more details of how to make queries
$toremove= $pagepath->query("//h1[@class='documentFirstHeading'] | //p[@class='documentDescription'] | //div[@class='documentDescription'] | //div[@class='documentByLine'] | //div[@class='documentActions'] | //div[@id='relatedItems'] | //div[@class='discussion'] | //a[@id='documentContent'] | //a[@class='link-parent']");

foreach ($toremove as $entry) {
$entry->parentNode->removeChild($entry);
}

// We try and change classes of images to use the WordPress floated classes
// The script detects classes (or parent div classes) that have the words
// left right or center and renames them
$imgtags = $doc->getElementsByTagName('img');
foreach($imgtags as $child) {
// getAttribute() returns '' when the attribute is missing
$linkclass = $child->getAttribute('class');
$alignclass = $child->getAttribute('align');
$linkfile = $child->getAttribute('src');

if(strpos($linkclass, 'left') !== false || strpos($alignclass, 'left') !== false) {
$child->setAttribute( 'class' , 'alignleft' );
}
else if (strpos($linkclass, 'right') !== false || strpos($alignclass, 'right') !== false) {
$child->setAttribute( 'class' , 'alignright' );

}
else if (strpos($linkclass, 'centre') !== false || strpos($linkclass, 'center') !== false
|| strpos($alignclass, 'centre') !== false || strpos($alignclass, 'center') !==false ) {
$imageinfo = @getimagesize(SITEROOT . $linkfile);
$child->setAttribute( 'class' , 'aligncenter' );
}
else {

// get parent and grandparent (either may be missing, or not an element)
$parent = $child->parentNode;
$grandparent = $parent ? $parent->parentNode : null;
$parentclass = ($parent instanceof DOMElement) ? $parent->getAttribute('class') : '';
$parentalign = ($parent instanceof DOMElement) ? $parent->getAttribute('align') : '';
$grandparentclass = ($grandparent instanceof DOMElement) ? $grandparent->getAttribute('class') : '';
$grandparentalign = ($grandparent instanceof DOMElement) ? $grandparent->getAttribute('align') : '';

if (strpos($parentclass, 'left') !== false || strpos($parentalign, 'left') !== false) {
print "\n\n" . '** left parent ' . $linkfile . " - class: " . $parentclass . " - align: " . $parentalign. "\n";
$child->setAttribute( 'class' , 'alignleft' );
}
else if (strpos($parentclass, 'right') !== false || strpos($parentalign, 'right') !== false) {
print "\n\n" . '** right parent ' . $linkfile . " - class: " . $parentclass . " - align: " . $parentalign. "\n";
$child->setAttribute( 'class' , 'alignright' );
}
else if (strpos($parentclass, 'centre') !== false || strpos($parentclass, 'center') !== false
|| strpos($parentalign, 'centre') !== false || strpos($parentalign, 'center') !== false) {
print "\n\n" . '** centred parent ' . $linkfile . " - class: " . $parentclass . " - align: " . $parentalign. "\n";
$imageinfo = @getimagesize($linkfile);
$child->setAttribute( 'class' , 'aligncenter' );
}
else if (strpos($grandparentclass, 'left') !== false || strpos($grandparentalign, 'left') !== false) {
print "\n\n" . '** left grandparent ' . $linkfile . " - class: " . $grandparentclass . " - align: " . $grandparentalign. "\n";
$child->setAttribute( 'class' , 'alignleft' );
}
else if (strpos($grandparentclass, 'right') !== false || strpos($grandparentalign, 'right') !== false) {
print "\n\n" . '** right grandparent ' . $linkfile . " - class: " . $grandparentclass . " - align: " . $grandparentalign. "\n";
$child->setAttribute( 'class' , 'alignright' );
}
else if (strpos($grandparentclass, 'centre') !== false || strpos($grandparentclass, 'center') !== false
|| strpos($grandparentalign, 'centre') !== false || strpos($grandparentalign, 'center') !== false) {
print "\n\n" . '** centred grandparent ' . $linkfile . " - class: " . $grandparentclass . " - align: " . $grandparentalign. "\n";
$imageinfo = @getimagesize($linkfile);
$child->setAttribute( 'class' , 'aligncenter' );
}
}
}

// Replace underscores with dashes inside relative links
// we are excluding ../ links for now - too complicated
$atags = $doc->getElementsByTagName('a');
foreach($atags as $child) {

$linkhref = $child->getAttribute('href');

if (!(substr($linkhref, 0, 4) == 'http' || substr($linkhref, 0, 1) == '/' || substr($linkhref, 0, 3) == '../')) {

// Look up the name of this post's parent page (if any)
$parentrow = mysql_fetch_object(mysql_query("SELECT post_name FROM " . POSTTABLE . " WHERE ID = " . (int) $post->post_parent));
$parentname = $parentrow ? $parentrow->post_name : '';

// print "\n\n" . 'parent name: ' . $parentname . "\n";
if(strpos($linkhref, '_') !== FALSE && (strpos($post->post_name, '-') !== FALSE || strpos($parentname, '-') !== FALSE) ) {
print "\n\n" . 'link: ' . $linkhref . "\n";

$linkhref = str_replace('_', '-', $linkhref);
$linkhref = preg_replace('/--+/', '-', $linkhref);

print "\n\n" . 'changed internal link: ' . $linkhref . "\n";

$child->setAttribute( 'href' , $linkhref);
}
}

}

// Output HTML from query documents
$newtablehtml = $newtabledom->saveHTML();

// Text rewriting
// This can be modified to your needs
$newtablehtml = str_replace('[...]', '', $newtablehtml);
$newtablehtml = str_replace('/index.html"', '"', $newtablehtml);
$newtablehtml = str_replace('/"', '"', $newtablehtml);
$newtablehtml = str_replace('https://my.', 'http://www.', $newtablehtml);

// Sometimes there are encoding issues which need dealing with
$newtablehtml = str_replace(' ', '', $newtablehtml);
$newtablehtml = str_replace('Â', '', $newtablehtml);
$newtablehtml = str_replace('„', '', $newtablehtml);
$newtablehtml = str_replace('â€&#8482', "'", $newtablehtml);
$newtablehtml = str_replace("‘", "'", $newtablehtml);
$newtablehtml = str_replace("’", "'", $newtablehtml);
$newtablehtml = str_replace("“", "'", $newtablehtml);
$newtablehtml = str_replace("”", "'", $newtablehtml);
$newtablehtml = str_replace("–", " - ", $newtablehtml);
$newtablehtml = str_replace("—", " - ", $newtablehtml);
$newtablehtml = str_replace("’", "'", $newtablehtml);
$newtablehtml = str_replace("“", "", $newtablehtml);
$newtablehtml = str_replace("”", "", $newtablehtml);

// Sometimes the old page still contains html doctype
// inside the content tag
if (strpos($newtablehtml, '') !== 0) {
$newtablehtml = str_replace('', '', $newtablehtml);
}

// Remove empty paragraphs
$newtablehtml = preg_replace("#<p[^>]*>(\s|&nbsp;?)*</p>#", '', $newtablehtml);

// Now run htmLawed to clean up
$newtablehtml = htmLawed($newtablehtml, $config);

// Normalise post titles in all caps
if (strtoupper($post->post_title) == $post->post_title) {
$post->post_title = ucwords(strtolower($post->post_title));
}

if(strlen($newtablehtml) > 30) {
// Post name exists - save new post content only
if(strlen(trim($post->post_name)) > 0) {
if(TESTINGSINGLE == false || $post->ID == TESTID) {
$query2 = "UPDATE " . POSTTABLE . " SET post_content = '" . mysql_real_escape_string($newtablehtml) . "', post_title = '" . mysql_real_escape_string($post->post_title) . "' WHERE ID = ". $post->ID;
mysql_query($query2);
}
}
// Need to generate post name from title
else {
$postname = strtolower(sanitize_file_name($post->post_title));
if(TESTINGSINGLE == false || $post->ID == TESTID) {
$query2 = "UPDATE " . POSTTABLE . " SET post_content = '" . mysql_real_escape_string($newtablehtml) . "', post_name = '" . mysql_real_escape_string($postname) . "', post_title = '" . mysql_real_escape_string($post->post_title) . "' WHERE ID = ". $post->ID;
mysql_query($query2);
}
}
}
else {
// Delete posts with very little or no content, unless they have child pages
$query3 = mysql_query("SELECT ID FROM " . POSTTABLE . " WHERE post_type IN ('post', 'page') AND post_status = 'publish' AND post_parent = " . $post->ID);
if(!mysql_fetch_object($query3)) {
if(TESTINGSINGLE == false || $post->ID == TESTID) {
mysql_query("DELETE FROM " . POSTTABLE . " WHERE ID = ". $post->ID);
}
}
}
}

// Taken from the WP function
function sanitize_file_name( $filename ) {
$filename_raw = $filename;
$special_chars = array("?", "[", "]", "/", "\\", "=", "<", ">", ":", ";", ",", "'", "\"", "&", "$", "#", "*", "(", ")", "|", "~", "`", "!", "{", "}", "–", "—","—", chr(0));
$filename = str_replace($special_chars, '', $filename);
$filename = preg_replace('/[\s-]+/', '-', $filename);
$filename = trim($filename, '.-_');
$entities = array("%e2", "%80", "%9c", "%9d", "%94", "%a0", "%93", "%99");
$filename = str_replace($entities, '', $filename);

// Append -0, -1, ... to the base name until no post or page uses it
$base = $filename;
$unique = 0;
$i = 0;
while (!$unique) {
$query = mysql_query("SELECT post_name FROM " . POSTTABLE . " WHERE post_type IN ('post', 'page') AND post_name = '" . mysql_real_escape_string($filename) . "'");
if(mysql_fetch_object($query)) {
print('***not unique - ' . $filename);
$filename = $base . '-' . $i;
$i = $i + 1;
}
else {
$unique = 1;
}
}
return $filename;
}
?>

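The core of the script is a DOMDocument + DOMXPath pattern: load the post HTML, query for the unwanted nodes, remove them, and write the HTML back out. Below is a minimal standalone sketch of that pattern. strip_plone_cruft() is a hypothetical name, and the XPath expression reuses two of the Plone classes from the script. The sketch also shows one common way to keep UTF-8 intact. loadHTML() assumes ISO-8859-1 unless the markup declares a charset, which may be where some of the stray Â characters the script strips out come from. Prefixing a charset <meta> tag is a workaround, not something the script above does.

```php
<?php
// Hypothetical helper: remove the nodes matched by an XPath query from a
// fragment of post HTML and return the cleaned fragment.
function strip_plone_cruft(string $html, string $xpath_query): string
{
    $doc = new DOMDocument();
    // Scraped markup is rarely valid: collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    // Without a charset hint, loadHTML() reads the input as ISO-8859-1,
    // so UTF-8 text gets misread (é becomes Ã©)
    $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // query() returns a static list, so removing nodes while looping is safe
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query($xpath_query) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialise only the children of <body>, not the doctype/<html>/<head> wrapper
    $out = '';
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $doc->saveHTML($child);
        }
    }
    return $out;
}

$html = '<h1 class="documentFirstHeading">Title</h1><p>Café – menu</p>'
      . '<div class="documentByLine">by admin</div>';
echo strip_plone_cruft($html, "//h1[@class='documentFirstHeading'] | //div[@class='documentByLine']"), "\n";
```

Serialising only the children of <body> also avoids the <!DOCTYPE>, <html> and <body> wrapper that saveHTML() adds when you save a whole document.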
4. Create redirections from one site to another

The HTML Import plugin also generates a very nice .htaccess file we can use as the basis for our redirects. Unfortunately we can’t use it directly in a WordPress multisite, as the same .htaccess file is shared by all sites. Fortunately we can use the Redirection plugin, which lets WordPress handle the redirects instead of Apache.

For static sites, we often need to cater for URLs ending in / as well as /index.html, so we need to rewrite our redirects a little. Here’s a little shell script you can run: copy the generated .htaccess file to your desktop as htaccess and run

# specify your original domain here, i.e. the one that occurs first in each .htaccess rule
DOMAIN=http://www.domain.com
# Replace tabs with spaces - much easier to deal with
expand -t1 htaccess > htaccess1
# this replaces your domain with ^/ - makes it much easier to target the remaining rules
# (double quotes so that $DOMAIN is expanded; | as delimiter because URLs contain /)
sed "s|$DOMAIN/|^/|g" htaccess1 > htaccess2
# Use RedirectMatch
sed 's/Redirect/RedirectMatch 301/g' htaccess2 > htaccess3
# Remove mod_rewrite flags such as [R=301,NC,L], since RedirectMatch doesn't use them
# (the brackets must be escaped, otherwise sed treats them as a character class)
sed 's/\[R=301,NC,L\]//g' htaccess3 > htaccess4
# This tacks on a regex that will handle / and index.html at the end of a URL
# note: this is for the case where your static URLs in .htaccess don't end in
# either / or /index.html
sed 's- http://-(?:/index.html|/)?$ http://-g' htaccess4 > htaccess_new

# comment the above line and uncomment one of these if your static URLs
# end in / (1st one) or /index.html (2nd one)
# sed 's-/ http://-(?:/index.html|/)?$ http://-g' htaccess4 > htaccess_new
# sed 's-/index.html-(?:/index.html|/)?$-g' htaccess4 > htaccess_new

Test the rewritten rules on one line first, and check that the redirect works before adding the rest.
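If the sed quoting gets awkward, the same rewrite can be done line by line in PHP. This is a hedged sketch: to_redirectmatch() is a hypothetical helper. It assumes each input line has the form Redirect [status] <old URL> <new URL>, separated by spaces or tabs, so check that against the file your import actually generates. Unlike the three sed variants, it first strips any trailing / or /index.html from the old path, so one function covers all three cases. Anything after the new URL (such as [R=301,NC,L] flags) is dropped.

```php
<?php
// Hypothetical helper: rewrite one "Redirect <old URL> <new URL>" line as a
// RedirectMatch rule that also matches a trailing "/" or "/index.html".
// Returns null for lines that don't look like a Redirect rule.
function to_redirectmatch(string $line, string $domain): ?string
{
    // Split on any run of tabs or spaces
    $fields = preg_split('/\s+/', trim($line));
    if ($fields[0] !== 'Redirect') {
        return null;
    }
    // Drop an optional status code such as 301
    if (isset($fields[1]) && preg_match('/^\d{3}$/', $fields[1])) {
        array_splice($fields, 1, 1);
    }
    if (count($fields) < 3) {
        return null;
    }
    // Anything after the target URL (e.g. mod_rewrite flags) is ignored
    [, $from, $to] = $fields;

    // Turn the old absolute URL into a path and drop a trailing /index.html or /
    if (strpos($from, $domain) === 0) {
        $from = substr($from, strlen($domain));
    }
    $from = preg_replace('#(/index\.html|/)$#', '', $from);

    // Escape regex metacharacters in the path, then anchor it at both ends
    return 'RedirectMatch 301 ^' . preg_quote($from) . '(?:/index\.html|/)?$ ' . $to;
}

$line = "Redirect\thttp://www.domain.com/about/team/index.html\thttp://www.addon.org/about/team/";
echo to_redirectmatch($line, 'http://www.domain.com'), "\n";
// prints: RedirectMatch 301 ^/about/team(?:/index\.html|/)?$ http://www.addon.org/about/team/
```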

Separating a WordPress multisite into individual installations

We had 2 large sites on the same WordPress multisite, but then we decided to split it up for the following reasons:

  • From a sysadmin point of view, it’s always better to keep large sites in their own account
  • Easier from a debugging point of view: the simpler the setup, the easier it is to find issues
  • A lot of plugins (e.g. Broken Link Checker) don’t work with multisites. Also, in our experience, many other plugins (e.g. caching) work less well on a multisite than on their own installation.

So we decided to split a WP multisite with two sites into separate installations: www.default.org (the default site) and www.addon.org, which had been added as the multisite subdomain addon.default.org and then mapped to www.addon.org using the Domain Mapping plugin.

Extracting the database for the addon site:

  • Save a copy of your db somewhere, because we will first extract the db for www.addon.org and then return to the original db to extract www.default.org. To extract the db for www.addon.org, we edited the tables using phpMyAdmin.
  • Empty any tables that can be re-indexed (e.g. those used by the Relevanssi or Broken Link Checker plugins).
  • Remove all wp_ tables (these store the data for www.default.org) that have a matching wp_2_ table (these store the data for www.addon.org). Be very careful here: a few tables, e.g. wp_users and wp_usermeta, were shared by both sites, have no wp_2_ equivalent and must be kept.
  • Then you can rename all wp_2_ tables to wp_: for each table, select ‘Structure’ and then ‘Operations’ (there may be a faster way to do this with a script, but this took me only 20 minutes).
  • Edit wp-config.php and comment out the multisite constants and the SUNRISE constant used by the Domain Mapping plugin.
  • In the wp-content folder, remove the sunrise.php file. It is probably also a good idea to remove any cache files in this directory and temporarily disable any caching plugins.
  • In the newly renamed wp_options table, change the siteurl and home options from addon.default.org to www.addon.org, and change the upload_path option from wp-content/blogs.dir/2/files to wp-content/uploads.
  • Now we need to do some find and replace in the newly renamed wp_posts table: change all instances of wp-content/blogs.dir/2/files to wp-content/uploads, and all instances of addon.default.org to www.addon.org. In the SQL tab you can run the following commands:
    update wp_posts set post_content = replace(post_content, 'wp-content/blogs.dir/2/files', 'wp-content/uploads');
    update wp_posts set post_content = replace(post_content, 'addon.default.org', 'www.addon.org');
    It does no harm to also change the guid column, just for the sake of cleanliness:
    update wp_posts set guid = replace(guid, 'wp-content/blogs.dir/2/files', 'wp-content/uploads');
    update wp_posts set guid = replace(guid, 'addon.default.org', 'www.addon.org');
    You might also need to do a search and replace in other database tables added by plugins, e.g. the Redirection plugin.
  • Finally, delete the uploads folder (again, make sure it is backed up somewhere) and rename the wp-content/blogs.dir/2/files folder to wp-content/uploads
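The table shuffling in phpMyAdmin can also be scripted. Below is a rough sketch; promote_subsite_tables() is a hypothetical helper. It takes the list of table names (e.g. from SHOW TABLES) and builds the SQL for the two steps above: drop each wp_ table that has a wp_2_ counterpart, then rename the wp_2_ tables to wp_. Shared tables such as wp_users and wp_usermeta have no wp_2_ twin, so they are left alone. Review the statements before running them, and only run them against a backed-up database, since DROP TABLE cannot be undone.

```php
<?php
// Hypothetical helper: build the SQL that replaces the default site's wp_ tables
// with the sub-site's wp_2_ tables. Tables without a wp_2_ twin (e.g. wp_users)
// are left untouched.
function promote_subsite_tables(array $tables, string $site_prefix = 'wp_2_', string $base_prefix = 'wp_'): array
{
    $statements = [];
    $renames = [];
    foreach ($tables as $table) {
        if (strpos($table, $site_prefix) !== 0) {
            continue; // not one of the sub-site's tables
        }
        $target = $base_prefix . substr($table, strlen($site_prefix));
        // Drop the default site's copy first so the name is free
        if (in_array($target, $tables, true)) {
            $statements[] = "DROP TABLE `$target`;";
        }
        $renames[] = "`$table` TO `$target`";
    }
    if ($renames) {
        // A single RENAME TABLE statement can rename several tables at once
        $statements[] = 'RENAME TABLE ' . implode(', ', $renames) . ';';
    }
    return $statements;
}

// In practice the list would come from SHOW TABLES
$tables = ['wp_options', 'wp_posts', 'wp_users', 'wp_usermeta', 'wp_2_options', 'wp_2_posts'];
foreach (promote_subsite_tables($tables) as $sql) {
    echo $sql, "\n";
}
```

One thing the sketch does not cover: user roles are stored under prefixed keys (the wp_2_user_roles option, and wp_2_capabilities and wp_2_user_level in wp_usermeta), which will probably need to become their wp_ equivalents as well.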