SmartString: Open-source string manipulation library with automatic HTML encoding and fluent chainable interface


By Dave - August 27 - edited: August 27

Hi All!

We're thrilled to announce the release of our latest open-source project: SmartString.

SmartString is a powerful PHP library designed to simplify string manipulation while prioritizing web security. If you're familiar with how ZenDB fields work in CMSB, you'll find SmartString similar but with expanded features and standalone functionality.

Essentially, instead of writing code like this:

// Show last login date formatted as: Aug 27th, 2024
$timestamp = strtotime($user['lastLogin']);
$formattedDate = date('M jS, Y', $timestamp);
echo 'Last login: ' . htmlencode($formattedDate) . "\n";

You can write code like this: 

echo "Last login: {$user->lastLogin->dateFormat('M jS, Y')}\n"; // Aug 27th, 2024

Here are the key features:

  • Automatic XSS protection with HTML encoding
  • Fluent, chainable interface for expressive code
  • Flexible encoding options (HTML, URL, JSON)
  • Support for non-string data types and type conversion
  • Built-in formatting for dates and numbers
  • Conditional operations for cleaner code
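To make the first two features concrete, here is a minimal sketch of the general idea: a wrapper object that HTML-encodes itself whenever it is used as a string, with chainable methods that return new wrapped values. The MiniString class and its methods are hypothetical and written only for illustration; this is not SmartString's actual implementation (see the GitHub repository for that). It assumes PHP 8.0 or later.

```php
<?php
// Hypothetical MiniString class: an illustration only, NOT SmartString's real code.
final class MiniString
{
    public function __construct(private string|int|float|null $value) {}

    // Return the raw, unencoded value.
    public function value(): string|int|float|null
    {
        return $this->value;
    }

    // Chainable: format a date string and return a new wrapped value.
    public function dateFormat(string $format): self
    {
        $timestamp = strtotime((string) $this->value);
        return new self($timestamp === false ? null : date($format, $timestamp));
    }

    // HTML-encode automatically whenever the object is used as a string.
    public function __toString(): string
    {
        return htmlspecialchars((string) $this->value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}

$lastLogin = new MiniString('2024-08-27 09:15:00');
echo "Last login: {$lastLogin->dateFormat('M jS, Y')}\n"; // Last login: Aug 27th, 2024

$comment = new MiniString('<script>alert(1)</script>');
echo $comment, "\n"; // &lt;script&gt;alert(1)&lt;/script&gt;
```

Because encoding happens in __toString(), the safe path is the default one, and the raw value is only available through an explicit value() call.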

You can read more and find the code on GitHub: https://github.com/interactivetools-com/SmartString/

We've been using this library internally, and it has greatly sped up our development, so we're excited to add it to CMSB over time and to keep adding features.

If you can think of any common tasks in your frontend viewer code that previously required custom PHP code, let us know, and we may be able to automate them.  One feature on our roadmap is a maxWords() function for generating text snippets or article summaries.

Please feel free to post suggestions, bugs, or feature requests below.  Thank you for your continued support and happy coding!

Dave Edis - Senior Developer
interactivetools.com

I have three examples of functions we use on all our websites. All three are essential for displaying text, phone numbers, schema markup, and more just the way we like it, and for a better user experience. I'd love to see what else can be done here to make our development process more efficient.

Also, if anyone sees anything I should be doing differently or could improve in these functions, please let me know! I haven't had any issues with them, but there may be something I am missing.
 

Format Phone
This function, format_phone(), is designed to format phone numbers into a more readable format. In this case, we like our numbers to appear as (888) 999-0000 for the most part. In the CMS we want them to be 888-999-0000 for easier schema markup, but in plain text, we want the parentheses.

	function format_phone( $phone, $default_country_code = '1' ) {
		// Remove all non-digit characters
		$phone = preg_replace( "/[^0-9]/", "", $phone );

		// Check if it's an international number (more than 10 digits)
		if ( strlen( $phone ) > 10 ) {
			// Format as international number
			return '+' . substr( $phone, 0, strlen( $phone ) - 10 ) . ' ' .
			'(' . substr( $phone, -10, 3 ) . ') ' .
			substr( $phone, -7, 3 ) . '-' .
			substr( $phone, -4 );
		}

		// Fewer than 10 digits cannot fill the (XXX) XXX-XXXX pattern, so return the digits unformatted
		// ($default_country_code stays in the signature so existing calls keep working)
		if ( strlen( $phone ) < 10 ) {
			return $phone;
		}

		// Format as (XXX) XXX-XXXX
		return '(' . substr( $phone, -10, 3 ) . ') ' .
		substr( $phone, -7, 3 ) . '-' .
		substr( $phone, -4 );
	}
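Since the post describes two target formats (dashed for the CMS and schema markup, parentheses for plain text), here is a small sketch that produces both from one input. The function name phone_formats() and the array keys are assumptions made for this example, and it only handles 10-digit North American numbers, optionally prefixed with a country code of 1.

```php
<?php
// Hypothetical helper: returns both formats for a 10-digit North American number.
function phone_formats(string $phone): ?array
{
    // Keep digits only
    $digits = preg_replace('/\D/', '', $phone);

    // Drop a leading country code of 1 if present (assumption: NANP numbers only)
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    // Anything that is not exactly 10 digits cannot be formatted reliably
    if (strlen($digits) !== 10) {
        return null;
    }

    $area   = substr($digits, 0, 3);
    $prefix = substr($digits, 3, 3);
    $line   = substr($digits, 6);

    return [
        'display' => "({$area}) {$prefix}-{$line}", // plain text: (888) 999-0000
        'dashed'  => "{$area}-{$prefix}-{$line}",   // CMS / schema: 888-999-0000
    ];
}

print_r(phone_formats('888.999.0000'));
```

Returning null for anything that is not 10 digits leaves it to the caller to decide whether to show the raw value or nothing at all.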

maxWords and textLimit

Then we have maxWords(), which truncates content to a set number of words wherever you need a snippet. Because it decodes HTML entities, encode the result before output: <?= htmlencode(maxWords($record['content'], 100)) ?>... Read More

	function maxWords($textOrHtml, $maxWords, $appendEllipsis = true) {
		// Remove HTML tags and decode HTML entities
		$text = html_entity_decode(strip_tags($textOrHtml), ENT_QUOTES, 'UTF-8');

		// Split into words, respecting Unicode
		$words = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

		// Slice to max words
		$limitedWords = array_slice($words, 0, $maxWords);

		// Join words and optionally append ellipsis
		$result = implode(' ', $limitedWords);
		if ($appendEllipsis && count($words) > $maxWords) {
			$result .= '...';
		}

		return $result;
	}

and textLimit(), which limits the number of characters instead of words. We use it for meta descriptions and other places that need a character limit rather than a word limit. Example: $meta_description = htmlencode(textLimit($description, 160)); cuts the description to at most 160 characters plus the '...' replacer, even when the CMS field holds more text.

	function textLimit($string, $length, $replacer = '...', $breakWords = false) {
		// Convert to UTF-8 if not already
		$encoding = mb_detect_encoding($string, 'UTF-8, ISO-8859-1', true);
		$string = mb_convert_encoding($string, 'UTF-8', $encoding);

		if (mb_strlen($string, 'UTF-8') <= $length) {
			return $string;
		}

		if ($breakWords) {
			$limited = mb_substr($string, 0, $length, 'UTF-8');
		} else {
			$limited = mb_substr($string, 0, $length, 'UTF-8');
			$lastSpace = mb_strrpos($limited, ' ', 0, 'UTF-8');
			if ($lastSpace !== false) {
				$limited = mb_substr($limited, 0, $lastSpace, 'UTF-8');
			}
		}

		return $limited . $replacer;
	}

The differences between the two are:

  1. Unit of limitation:
    • maxWords() limits by number of words
    • textLimit() limits by number of characters
  2. HTML handling:
    • maxWords() strips HTML tags and decodes HTML entities
    • textLimit() doesn't handle HTML specifically; it treats HTML as part of the text
  3. Unicode support:
    • maxWords() uses preg_split() with Unicode support for word splitting
    • textLimit() uses multibyte string functions for proper Unicode character handling
  4. Truncation behavior:
    • maxWords() always cuts off at word boundaries
    • textLimit() has an option ($breakWords) to cut off mid-word or at word boundaries
  5. Ellipsis handling:
    • maxWords() has an option to append an ellipsis only if words were truncated
    • textLimit() always appends the replacer (default '...') if truncation occurs
  6. Input encoding:
    • maxWords() assumes UTF-8 input
    • textLimit() detects input encoding and converts to UTF-8 if necessary
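The first two differences are easiest to see side by side. The snippet below is an illustration only: it uses compact inline stand-ins for the two approaches rather than the functions above, and the sample string is made up.

```php
<?php
// Illustration only: compact stand-ins for the two approaches, not the functions above.
$html = '<p>Caf&eacute; owners love <b>fresh</b> croissants every morning</p>';

// Word-based: strip tags, decode entities, keep the first 4 words
$text    = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
$words   = preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
$byWords = implode(' ', array_slice($words, 0, 4));

// Character-based: keep the first 20 characters of the raw input, tags included
$byChars = mb_substr($html, 0, 20, 'UTF-8');

echo $byWords, "\n"; // Café owners love fresh
echo $byChars, "\n"; // <p>Caf&eacute; owner
```

The character-based cut spends 11 of its 20 characters on the opening tag and the entity, which is why HTML input usually needs to be converted to plain text before a character limit is applied.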

Hi KennyH,

From what I understand, you can use the apply() method to use your functions:

// Example of usage
$text = "<p>This is a sample text with more than ten words to demonstrate the maxWords function.</p>";
$smartString = SmartString::new($text);

// Using apply() with the maxWords function
$result = $smartString->apply('maxWords', 5, true);

// Display the result
echo $result->value();  // Output: This is a sample text...

// Example of usage
$phoneNumber = "123-456-7890"; // Phone number to format
$smartString = SmartString::new($phoneNumber);

// Using apply() with the format_phone function
$result = $smartString->apply('format_phone', '1'); // Default country code is '1'

// Display the result
echo $result->value(); // Output: (123) 456-7890

But, Dave should be able to confirm.

Thanks,
Djulia

By Dave - August 30 - edited: August 30

Hi All, 

Djulia: Yes, that's absolutely right, you'll be able to use the functions as is. Great examples, thanks!  We also want to add common ones to the library for people who don't already have them.  If you have any helper functions you use often, feel free to share them.

Kenny: Why doesn't textLimit() strip out HTML?  Is the assumption that maxWords will be taking HTML content and textLimit will be taking text only content? 

Some possible methods I'm thinking about:

->toText()       // strip tags and convert HTML entities to chars
->maxWords()     // limit max words
->maxChars()     // limit max chars

// Then we could do: 
$product->description->toText()->maxWords(15, "...")
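As a rough sketch of how such a chain could behave, the hypothetical TextChain class below implements the three proposed methods on a plain wrapper. It is not SmartString code and does not claim to match the API the library will eventually ship; the default ellipsis and the whitespace handling are assumptions. It assumes PHP 8.0 or later with the mbstring extension.

```php
<?php
// Hypothetical TextChain class: a sketch of the proposed chain, not SmartString code.
final class TextChain
{
    public function __construct(private string $value) {}

    // Strip tags and convert HTML entities to characters
    public function toText(): self
    {
        return new self(html_entity_decode(strip_tags($this->value), ENT_QUOTES, 'UTF-8'));
    }

    // Keep at most $max words, appending $ellipsis only when words were dropped
    public function maxWords(int $max, string $ellipsis = '...'): self
    {
        $words = preg_split('/\s+/u', $this->value, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        if (count($words) <= $max) {
            return new self(implode(' ', $words));
        }
        return new self(implode(' ', array_slice($words, 0, $max)) . $ellipsis);
    }

    // Keep at most $max characters (multibyte-safe), appending $ellipsis when cut
    public function maxChars(int $max, string $ellipsis = '...'): self
    {
        if (mb_strlen($this->value, 'UTF-8') <= $max) {
            return $this;
        }
        return new self(rtrim(mb_substr($this->value, 0, $max, 'UTF-8')) . $ellipsis);
    }

    // HTML-encode on output
    public function __toString(): string
    {
        return htmlspecialchars($this->value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}

$description = new TextChain('<p>Hand-made <em>oak</em> table &amp; four matching chairs</p>');
echo $description->toText()->maxWords(4), "\n"; // Hand-made oak table &amp;...
```

Keeping toText() as a separate step means maxWords() and maxChars() never have to guess whether their input is HTML or plain text.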

Let me know any thoughts or feedback.

Dave Edis - Senior Developer
interactivetools.com

Why doesn't textLimit() strip out HTML?  Is the assumption that maxWords will be taking HTML content and textLimit will be taking text only content? 

Good question. I might have gotten it backwards, because I do need the HTML stripped out of textLimit, since I only use it for meta descriptions. I probably should use a custom function. I think this would work (I'll do some more testing):

function metaDescriptionLimit($string, $length = 160, $replacer = '...', $breakWords = false) {
    // Strip HTML tags
    $string = strip_tags($string);
    
    // Convert to UTF-8 if not already
    $encoding = mb_detect_encoding($string, 'UTF-8, ISO-8859-1', true);
    $string = mb_convert_encoding($string, 'UTF-8', $encoding);

    // Trim whitespace
    $string = trim($string);

    if (mb_strlen($string, 'UTF-8') <= $length) {
        return $string;
    }

    if ($breakWords) {
        $limited = mb_substr($string, 0, $length, 'UTF-8');
    } else {
        $limited = mb_substr($string, 0, $length, 'UTF-8');
        $lastSpace = mb_strrpos($limited, ' ', 0, 'UTF-8');
        if ($lastSpace !== false) {
            $limited = mb_substr($limited, 0, $lastSpace, 'UTF-8');
        }
    }

    // Trim trailing whitespace and punctuation left at the cut point
    // (the word-boundary handling above already removed any partial word)
    $limited = rtrim($limited, " \t\n\r!,.-");

    return $limited . $replacer;
}
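One thing to watch with strip_tags() on its own is that it leaves entities such as &amp; in place, so each one counts as several characters toward the limit. Below is a minimal sketch of a variant that decodes entities and collapses whitespace before counting. The function name metaText() is made up for this example, and htmlspecialchars() stands in for CMSB's htmlencode() so the snippet runs on its own.

```php
<?php
// Hypothetical helper: plain-text meta description capped at $length characters.
function metaText(string $html, int $length = 160): string
{
    // Remove tags, then turn entities such as &amp; back into single characters
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace (including line breaks) into single spaces
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    if (mb_strlen($text, 'UTF-8') <= $length) {
        return $text;
    }

    // Cut to the limit, then back up to the last space so no word is split
    $cut       = mb_substr($text, 0, $length, 'UTF-8');
    $lastSpace = mb_strrpos($cut, ' ', 0, 'UTF-8');
    if ($lastSpace !== false) {
        $cut = mb_substr($cut, 0, $lastSpace, 'UTF-8');
    }

    return rtrim($cut, " !,.-") . '...';
}

$description = "<p>Fish &amp; chips,\n served   daily from the harbour kitchen</p>";

// Encode once, at output time
echo htmlspecialchars(metaText($description, 30), ENT_QUOTES, 'UTF-8'), "\n";
// Fish &amp; chips, served daily...
```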

My apologies, as I had not heard of the SmartString library until now, and I'm not the most proficient coder.  Does this library have to be called, initialized, or loaded somewhere before its functions can be used, or is it ready to use on .php pages?  I need to limit a field called last_name (in a CMSB section): the client wants only the first letter of last_name to display on the public PHP page (testimonialDetails.php). My current display code, with no limit on the number of characters, is:

<?php if ($testimonialsRecord['last_name']): ?>
  <br><br>
  <span class="pointtexti">
   <?php echo htmlencode($testimonialsRecord['last_name']) ?>
  </span>
<?php endif ?>

How would I specifically recode the above to limit the display of last_name to just the first letter of the person's last name?

By Djulia - September 4 - edited: September 4

Hi Codee,

I don't think SmartString can help you right now, but you can use PHP's substr() function to extract the first character of last_name. Here's how you can update your code:

<?php if (!empty($testimonialsRecord['last_name'])): ?>
  <br><br>
  <span class="pointtexti">
    <?php 
      // Extract the first letter of the last name and encode it
      echo htmlencode(substr($testimonialsRecord['last_name'], 0, 1)); 
    ?>
  </span>
<?php endif ?>

https://www.php.net/manual/fr/function.substr.php
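One caveat: substr() counts bytes, so a last name that starts with an accented letter would be cut in the middle of a character. If that can happen in your data, a multibyte-safe variant might look like the sketch below. The firstLetter() helper is hypothetical, it requires the mbstring extension, and htmlspecialchars() stands in for CMSB's htmlencode() so the snippet runs on its own.

```php
<?php
// Hypothetical helper: first character of a UTF-8 string, HTML-encoded for output.
function firstLetter(string $name): string
{
    // mb_substr() counts characters, not bytes, so "É" stays intact
    $initial = mb_substr(trim($name), 0, 1, 'UTF-8');
    return htmlspecialchars($initial, ENT_QUOTES, 'UTF-8');
}

echo firstLetter('Édouard'), "\n"; // É
echo firstLetter('Smith'), "\n";   // S
```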

--
In the latest beta version, the SmartString library is available without needing to be installed.

You can simply copy and paste a code snippet like this to test it, for example:

$dateString = SmartString::new("2024-09-01");
echo $dateString->dateFormat('d/m/Y'); // 01/09/2024

$number = SmartString::new(1234567.89);
echo $number->numberFormat(2, ',', ' '); // 1 234 567,89

Thanks,
Djulia

Hi Djulia,

Thank you!  That looks a bit cleaner than the method I was already using, which is to place the function before the last_name field line as:

<?php
function textLimit($string, $length, $replacer = '...')
{
    if (strlen($string) > $length) {
        return (preg_match('/^(.*)\W.*$/', substr($string, 0, $length + 1), $matches) ? $matches[1] : substr($string, 0, $length)) . $replacer;
    }

    return $string;
}
?>

and then the php field line:

<?PHP echo textLimit($record['last_name'], 1) ?>

which works, but I'm going to try yours out. Thank you kindly!

Hi Codee,

Here’s a function I use across several projects.

/**
 * Shortens a full name by keeping only the initial of the last name.
 *
 * This function takes a full name (consisting of a first name and last name, 
 * possibly with middle names) and returns the name with the initial of the last name 
 * followed by a period.
 *
 * @param string $name The full name to be shortened.
 * @return string The shortened name with the last name's initial.
 *
 * @example
 * echo nameWithLastInitial("John Doe"); // Outputs "John D."
 * echo nameWithLastInitial("Jane Mary Smith"); // Outputs "Jane Mary S."
 */
function nameWithLastInitial($name) {
    // Split the full name into an array of words, ignoring extra whitespace.
    $names = preg_split('/\s+/', trim($name), -1, PREG_SPLIT_NO_EMPTY);

    // A single-word (or empty) name has no separate last name to shorten.
    if (count($names) < 2) {
        return trim($name);
    }

    // Retrieve the last element from the array (the last name) and extract the first letter (multibyte-safe).
    $last_initial = mb_substr(array_pop($names), 0, 1, 'UTF-8');

    // Reconstruct the name with the last name's initial followed by a period.
    return implode(' ', $names) . ' ' . $last_initial . '.';
}

Thanks,
Djulia