Ever since my first steps in WordPress many years ago, I have been bothered when copying texts out of emails or clients’ old websites. I never understood why the WordPress editor, aka TinyMCE, doesn’t clean the pasted content of all the classes, styles and tag attributes that we certainly don’t want carried over, where they interfere with our own.

The TinyMCE editor did very well when copying directly from Microsoft Word, at least removing most of the crap, but this handy feature was not implemented for any other source. This is especially tragic when it comes to preventing clients from messing up their sites by pasting texts from all kinds of sources. Many people nowadays are not even using Microsoft Word but free alternatives such as OpenOffice or LibreOffice, which produce a horrible chaos of HTML code behind the scenes. The tricky thing is that such dirty code is not visible to the layman’s eye, and most often it does not interfere with the proper display. Yet it can happen that a client asks you why parts of his content have a different color, font size or even font that he cannot get back to normal no matter which buttons he uses. Even the eraser (the “Clear formatting” button) fails to clear some elements, and the “Paste as text” function removes all our formatting. A look into the HTML source code then reveals the impurities: all kinds of classes, styles and attributes sitting in the HTML tags.

My ideal pasting process

I was pondering what my ideal cleaning process would look like, and it is quite simple:

  • keep all formatting-relevant tags such as headings, paragraphs, strong and italic (this is why “paste as text” is not an option)
  • remove EVERYTHING inside HTML tags directly during the pasting process (catching classes, styles and all other expected or unexpected attributes, which in this way will never even appear)
  • with the exception of the A tag, which should not lose its href, id and target information
  • remove images completely to avoid hotlinking (after all, the images should be uploaded into your own WordPress installation)
  • remove certain elements completely (such as iframe, nav, article, footer etc.; TinyMCE does this by default, but not for all structural elements)
  • replace br tag with p tag (sometimes we would like to keep line breaks, but much more often copying from some sources delivers all paragraphs as br which we don’t want to correct manually)
  • replace div tag with p tag (because we don’t want divs but sometimes they are used by badly programmed websites as paragraphs)
  • and then in the end, because we might end up with some double or triple empty paragraphs, we remove them
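
To make the order of these steps concrete, here is a rough sketch of the rules as a plain string function in TypeScript. It is not the plugin code itself: the names cleanPastedHtml, DROP_WITH_CONTENT and KEEP_ON_LINKS are made up for illustration, and I am assuming that the unwanted elements are removed together with their content. It works with simple regular expressions, so nested elements of the same kind or a ">" inside an attribute value would trip it up.

```typescript
// Illustrative sketch only; all names here are hypothetical, not from the plugin.

// Elements dropped together with everything inside them (assumption).
const DROP_WITH_CONTENT: string[] = ["iframe", "nav", "article", "footer"];

// Attributes that survive on <a>; every other attribute is stripped.
const KEEP_ON_LINKS: string[] = ["href", "id", "target"];

function cleanPastedHtml(html: string, dropTags: string[] = DROP_WITH_CONTENT): string {
  let out = html;

  // 1. Remove unwanted elements including their content.
  for (const tag of dropTags) {
    out = out.replace(new RegExp(`<${tag}\\b[^>]*>[\\s\\S]*?</${tag}>`, "gi"), "");
  }

  // 2. Remove images completely to avoid hotlinking.
  out = out.replace(/<img\b[^>]*>/gi, "");

  // 3. Turn div into p, and every br into a paragraph break.
  out = out.replace(/<(\/?)div\b[^>]*>/gi, "<$1p>");
  out = out.replace(/<br\b[^>]*>/gi, "</p><p>");

  // 4. Strip all attributes from opening tags, except the whitelist on <a>.
  out = out.replace(/<([a-z][a-z0-9]*)\b([^>]*)>/gi, (_m: string, name: string, attrs: string): string => {
    const tag = name.toLowerCase();
    if (tag !== "a") {
      return `<${tag}>`;
    }
    // Only quoted attribute values are recognised in this sketch.
    const pairs: string[] = attrs.match(/[\w-]+\s*=\s*("[^"]*"|'[^']*')/g) || [];
    const kept = pairs.filter(
      (pair) => KEEP_ON_LINKS.indexOf(pair.split("=")[0].trim().toLowerCase()) !== -1
    );
    return kept.length > 0 ? `<a ${kept.join(" ")}>` : "<a>";
  });

  // 5. Remove the empty paragraphs the earlier steps may have left behind.
  out = out.replace(/<p>(\s|&nbsp;)*<\/p>/gi, "");

  return out;
}
```

Fed with `<p class="MsoNormal">Hello<br><br>World<img src="x.png"></p>`, this returns `<p>Hello</p><p>World</p>`. The empty paragraphs are removed last on purpose, because the br and div replacements are exactly what creates them. In TinyMCE, a function like this could be called from the paste plugin’s paste_preprocess callback, where the pasted markup is available as args.content. Passing a shorter list as the second argument leaves those elements in place, although step 4 would still strip their attributes.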

With the help of two programmers I made my dream a reality, and I am amazed that a few lines of code achieve a result that makes my own life and that of all my clients so much better. Clean code, no unexpected hassle, and we can even copy a whole website into the editor and get just the plain content in a perfectly formatted form. Here and there we might still need to manually repair one or the other format, but that is no comparison to how poorly it worked before.

As an example, this was the default result when copying the content from a website:

And this is the purified result, which really should be the standard:

Of course, if you don’t share my needs and want to keep iframes, you just remove the iframe tag from the list of processed tag elements. This is the code that does the magic, using the paste_preprocess option of TinyMCE:

Just copy it into the functions.php of your child theme, or download it directly as a plugin (no warranty, no update guarantee, but very enjoyable as it is). Let me know if you have further improvements.

Gutenberg Editor

One caveat is appearing on the horizon: the newly developed Gutenberg editor, which will soon become the new core editor in WordPress, offers a “Classical Text” module that continues the familiar experience for all those who are not so keen on the block-based approach and just want to carry on with proper HTML code. Yet so far, in dev version 0.8 of Gutenberg, the paste process completely ceases to work when the Gutenberg plugin is activated. I hope that the developers share my dream of a clean pasting experience and make it possible for this to keep working, or even build it into the core. Then, no matter whether users prefer blocks or the classic TinyMCE, pasting text from any source will from now on be a professional experience for web designers as well as for not so tech-savvy clients.