TruncString Bugs

joomla
hack

There is an annoying bug in the Joomla core JHtmlString truncate function. There is also an even more annoying (to me) flaw in the way it works.

Fixing these problems requires modifying a core file, libraries/cms/html/string.php.

The blog post describes the problem in more detail; the replacement code for the truncate function is at the end of this page, after some examples.

Here are some examples using the current core code. The first case has the text "test without tags" and is set to truncate at 15 chars, which is the 'a' in 'tags'. We should expect an ellipsis after 'without', with the text truncated at the word break before char 15.

test1:

before: test without tags
truncate at char 15 - the a in tags
after: test without...

So that works OK. Now let's wrap the text in a <b> tag and truncate at the 18th character, which is the space after 'bold'. (We are using a bold tag rather than a para tag because para tags get reformatted elsewhere in the system.)

test2:

before: <b>test with bold tag</b>
truncate at char 18 - the space after bold
after: <b>test with</b>...

Well, that's odd. Firstly, the ellipsis has been placed outside the closing bold tag when logically (to my way of thinking) it should come before the closing tag, since it is the bolded text that is being truncated. Secondly, the 18th char is the space after a word, so the last word 'bold' should have been included. To illustrate the first point, consider this version where we use an <h4> tag.

test3:

before: <h4>test with h4 tag</h4>
truncate at char 16 - the 4 in h4
after: <h4>test with</h4>...
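
The lost word in test2 and test3 can be reproduced outside Joomla. The following is only a minimal sketch: it assumes the stock function trims both ends of the cut string before looking for the last space (which is what the comments in the fix further down suggest), it uses plain mb_* functions rather than Joomla's StringHelper, and cutAtWord() is a made-up name for illustration.

```php
<?php
// Hypothetical helper, not part of Joomla: cut $text at $length characters,
// then back up to the last space. $trimRight switches between trimming both
// ends of the cut string (assumed stock behaviour) and trimming only the left.
function cutAtWord(string $text, int $length, bool $trimRight): string
{
    $tmp = mb_substr($text, 0, $length);
    $tmp = $trimRight ? trim($tmp) : ltrim($tmp);

    $offset = mb_strrpos($tmp, ' ');

    if ($offset === false) {
        return '';
    }

    // Keep everything before the last space.
    return mb_substr($tmp, 0, $offset);
}

$text = '<b>test with bold tag</b>';

// Trimming first removes the space at char 18, so the last space found
// is the one before 'bold' and the word is dropped.
echo cutAtWord($text, 18, true), "\n";  // <b>test with

// Keeping the trailing space means 'bold' survives.
echo cutAtWord($text, 18, false), "\n"; // <b>test with bold
```

So the word goes missing purely because of the order of the trim and the search for the last space, not because of anything to do with the tags.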

test4:

before: <a href="link">text with anchor</a> tag
truncate at char 18 - the x in text
after: <a...

Whoops, we've got a nasty unclosed <a tag with none of the text. We should get just the ellipsis. This only occurs if the <a...> is at the beginning of the string and the truncate point is in the first word of the text. If the truncate point is within the <a...> then you get just an ellipsis; if there is a word before the <a...> then the tag gets closed correctly. The problem occurs because there is no space after the first > (as there doesn't have to be), so 'href="link">text' is seen as all one word.
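
To see why 'href="link">text' counts as one word, here is a small standalone sketch of the cut-then-back-up step applied to the test4 string. It uses plain mb_* functions rather than Joomla's StringHelper and is only meant to show where the last space falls, not to reproduce the whole function.

```php
<?php
$text = '<a href="link">text with anchor</a> tag';

// Cut at char 18, the 'x' in 'text'.
$tmp = mb_substr($text, 0, 18);
echo $tmp, "\n"; // <a href="link">tex

// The only space in the cut string is the one inside the tag.
$offset = mb_strrpos($tmp, ' ');
echo $offset, "\n"; // 2

// Backing up to that space leaves an unclosed tag fragment.
$fragment = mb_substr($tmp, 0, $offset);
echo $fragment, "\n"; // <a
```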

Now let's run the same tests with the fixed code:

test1:

before: test without tags
truncate at char 15 - the a in tags
after: test without...

test2:

before: <b>test with bold tag</b>
truncate at char 18 - the space after bold
after: <b>test with bold...</b>

test3:

before: <h4>test with h4 tag</h4>
truncate at char 16 - the 4 in h4
after: <h4>test with h4...</h4>

test4:

before: <a href="link">text with anchor</a> tag
truncate at char 18 - the x in text
after: ...

So those all seem better. There may be other quirks, and the truncate function could be much improved by counting only the characters that actually get displayed. That is what the string.truncateComplex() function is supposed to do, but it is also buggy and not widely used as far as I can see.
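
As a rough illustration of what "counting only displayed characters" would mean, the sketch below measures the visible length of a string by stripping tags and decoding entities first. visibleLength() is a hypothetical helper of my own, not part of Joomla and not how truncateComplex() is implemented; it covers only the counting step, not the truncation itself.

```php
<?php
// Hypothetical helper, not part of Joomla: length of the text a reader
// actually sees, ignoring tags and counting each entity as one character.
function visibleLength(string $html): int
{
    $plain = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    return mb_strlen($plain, 'UTF-8');
}

$text = '<a href="link">text with anchor</a> tag';

echo strlen($text), "\n";                       // 39
echo visibleLength($text), "\n";                // 20
echo visibleLength('fish &amp; chips'), "\n";   // 12
```

With the test4 string, a limit of 18 lands inside the first visible word when raw characters are counted, but would fall near the end of the visible text if only displayed characters counted.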

Since fixing these requires a mod to a core library file, there is no way to do it as an override. The buggy version will be restored at every Joomla update, so the fix below will need to be reinstated after any Joomla core update until such time as it gets sorted. This code needs to replace the existing truncate function in the file. NB use at your own risk: this has only been tested by me, on sites I run and with browsers I use. It needs to be checked against the full Joomla unit tests, but I am not competent to do that and don't intend to learn...

    public static function truncate($text, $length = 0, $noSplit = true, $allowHtml = true)
    {
        // Assume a lone open tag is invalid HTML.
        if ($length === 1 && $text[0] === '<')
        {
            return '...';
        }

        // Check if HTML tags are allowed.
        if (!$allowHtml)
        {
            // Deal with spacing issues in the input.
            $text = str_replace('>', '> ', $text);
            $text = str_replace(array('&nbsp;', '&#160;'), ' ', $text);
            $text = StringHelper::trim(preg_replace('#\s+#mui', ' ', $text));

            // Strip the tags from the input and decode entities.
            $text = strip_tags($text);
            $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

            // Remove remaining extra spaces.
            $text = str_replace('&nbsp;', ' ', $text);
            $text = StringHelper::trim(preg_replace('#\s+#mui', ' ', $text));
        }

        // Whether or not allowing HTML, truncate the item text if it is too long.
        if ($length > 0 && StringHelper::strlen($text) > $length)
        {
            // Test if the next character is a space; if it is, include it so we don't lose the word.
            // StringHelper::substr() is used rather than $text[$length] so the check is multibyte safe.
            if (StringHelper::substr($text, $length, 1) === ' ')
            {
                ++$length;
            }

            // Trim leading spaces, leave trailing ones so as not to lose the last word.
            $tmp = ltrim(StringHelper::substr($text, 0, $length));

            // Test if all we have is an incomplete tag.
            if (strpos($tmp, '<') === 0 && strpos($tmp, '>') === false)
            {
                return '...';
            }

            // $noSplit true means that we do not allow splitting of words.
            if ($noSplit)
            {
                // Find the position of the last space within the allowed length.
                $offset = StringHelper::strrpos($tmp, ' ');

                // If there are no spaces and the string is longer than the maximum
                // we need to just use the ellipsis. In that case we are done.
                if ($offset === false && strlen($text) > $length)
                {
                    return '...';
                }

                $tmp = StringHelper::substr($tmp, 0, $offset + 1);
            }

            if ($allowHtml)
            {
                // Put all opened tags into an array
                preg_match_all("#<([a-z][a-z0-9]*)\b.*?(?!/)>#i", $tmp, $result);
                $openedTags = $result[1];

                // Some tags self close so they do not need a separate close tag.
                $openedTags = array_diff($openedTags, array('img', 'hr', 'br'));
                $openedTags = array_values($openedTags);

                // Put all closed tags into an array
                preg_match_all("#</([a-z][a-z0-9]*)\b(?:[^>]*?)>#iU", $tmp, $result);
                $closedTags = $result[1];

                $numOpened = count($openedTags);

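                // Check if we end inside a tag; if we do, remove it to get rid of the fragment.
                // Compare explicitly against false: strrpos() returns false when nothing is found,
                // and in PHP 0 > false is false, so a fragment at position 0 would slip through.
                $lastOpen  = StringHelper::strrpos($tmp, '<');
                $lastClose = StringHelper::strrpos($tmp, '>');

                if ($lastOpen !== false && ($lastClose === false || $lastOpen > $lastClose))
                {
                    $tmp = StringHelper::trim(StringHelper::substr($tmp, 0, $lastOpen));
                }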
                //now we can add the ellipsis
                $tmp .= '...';

                // Not all tags are closed so close them and finish.
                if (count($closedTags) !== $numOpened)
                {
                    // Closing tags need to be in the reverse order of opening tags.
                    $openedTags = array_reverse($openedTags);

                    // Close tags
                    for ($i = 0; $i < $numOpened; $i++)
                    {
                        if (!in_array($openedTags[$i], $closedTags))
                        {
                            $tmp .= '</' . $openedTags[$i] . '>';
                        } else {
                            unset($closedTags[array_search($openedTags[$i], $closedTags)]);
                        }
                    }
                }
            } else {
                // $allowHtml==false so just add an ellipsis
                $tmp .= '...';
            }

            if ($tmp === false || strlen($text) > strlen($tmp))
            {
                $text = trim($tmp); 
            }
        }

        // Clean up any internal spaces created by the processing.
        $text = str_replace(' </', '</', $text);
        $text = str_replace(' ...', '...', $text);

        return $text;
    }
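
The tag-closing part of the function above can be tried on its own, which makes it easier to see why the ellipsis now ends up inside the closing tag. This is a simplified sketch that reuses the two regular expressions from the function; closeOpenTags() is a hypothetical wrapper name, and unlike the real function it does not truncate anything.

```php
<?php
// Hypothetical helper, not part of Joomla: append closing tags for any
// tags opened but not closed in $html, innermost first.
function closeOpenTags(string $html): string
{
    // Collect opening tags, skipping ones that never need a close tag.
    preg_match_all('#<([a-z][a-z0-9]*)\b.*?(?!/)>#i', $html, $result);
    $openedTags = array_values(array_diff($result[1], array('img', 'hr', 'br')));

    // Collect closing tags.
    preg_match_all('#</([a-z][a-z0-9]*)\b(?:[^>]*?)>#iU', $html, $result);
    $closedTags = $result[1];

    // Walk the opened tags in reverse so the innermost closes first.
    foreach (array_reverse($openedTags) as $tag) {
        $found = array_search($tag, $closedTags);

        if ($found === false) {
            $html .= '</' . $tag . '>';
        } else {
            // This opening tag already has its close; don't match it twice.
            unset($closedTags[$found]);
        }
    }

    return $html;
}

echo closeOpenTags('<b>test with bold...'), "\n";   // <b>test with bold...</b>
echo closeOpenTags('<h4>test <i>with...'), "\n";    // <h4>test <i>with...</i></h4>
echo closeOpenTags('<b>done</b> <i>open...'), "\n"; // <b>done</b> <i>open...</i>
```

Because the ellipsis is appended to the text before the closing tags are added, it stays inside the element that was cut short.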

CrOsborne Software
