Installation
Quick Take
Examples
Idea
This library takes a string and removes widow words, by replacing last space in the paragraph with non-breaking space.
- Not just adds but if want, removes widow word prevention measures
- Tackles both paragraphs and single lines
- Recognises existing measures and if found, skips operation
- Option to encode for HTML, CSS or JS strings or put a raw non-breaking space
- Does not mangle the line endings (Mac
LF
, Old styleCR
or Windows-styleCR LF
) - A customisable minimum amount of words per line/paragraph to trigger widow word removal
- Can be used in different stages of the workflow: before HTML/CSS/JS-encoding or after
- Optionally replaces spaces with non-breaking spaces in front of all kinds of dashes
- Optionally replaces spaces with non-breaking spaces within UK postcodes
- Optionally it can skip content between templating tags, for example, Nunjucks
{{
and}}
— presets are given for Jinja, Nunjucks, Liquid, Hexo and Hugo
API - removeWidows()
The main function removeWidows()
is imported like this:
It's a function which takes three input arguments:
Input argument | Key value's type | Obligatory? | Description |
---|---|---|---|
str | String | yes | String which we will process |
opts | Plain object | no | Put options here |
The Optional Options Object has the following shape:
Options Object's key | The type of its value | Default | Description |
---|---|---|---|
removeWidowPreventionMeasures | boolean | false | If it's true , it will replace all widow word nbsp locations, with a single space |
convertEntities | boolean | true | If it's false , raw non-breaking space is inserted. If true , encoded in particular language (default HTML) |
targetLanguage | string | html | Choose out of html , css or js — non-breaking spaces will be encoded in this language |
UKPostcodes | boolean | false | If enabled, every whitespace between two parts of UK postcodes will be replaced with non-breaking space |
hyphens | boolean | true | Whitespace in front of dashes (- ), n-dashes (– ) or m-dashes (— ) will be replaced with a non-breaking space |
minWordCount | natural number, 0 (disables feature), falsy thing (disables feature) | 4 | Minimum word count on a paragraph to trigger widow removal |
minCharCount | natural number, 0 (disables feature), falsy thing (disables feature) | 20 | Minimum non-whitespace character count on a paragraph to trigger widow removal |
ignore | array of zero or more strings OR string | [] | List templating languages whose heads/tails will be recognised and skipped |
reportProgressFunc | function or null | null | If function is given, it will be pinged a natural number, for each percentage-done (in its first input argument) |
reportProgressFuncFrom | natural number or 0 | 0 | Normally reportProgressFunc() reports percentages starting from zero, but you can set it to a custom value |
reportProgressFuncTo | natural number | 100 | Normally reportProgressFunc() reports percentages up to 100 , but you can set it to a custom value |
tagRanges | array of zero or more arrays | [] | If you know where the HTML tags are, provide string index ranges here |
Here are all defaults in one place for copying:
The function will return a plain object (Res
type above):
Key in a returned object | Key value's type | Description |
---|---|---|
res | String | Processed string |
ranges | Null or Array of one or more Ranges (arrays) | Same Ranges used to produce the res |
log | Plain object | See its format below |
whatWasDone | Plain object | Was it widow removal or just decoding performed ? |
for example, here's how the output could look like:
{
res: "Lorem ipsum dolor sit amet",
ranges: [
[21, 27, " "]
],
log: {
timeTakenInMilliseconds: 42
},
whatWasDone: {
removeWidows: true,
convertEntities: false
}
}
API - defaults
You can import defaults
:
It's a plain object:
The main function calculates the options to be used by merging the options you passed with these defaults.
API - version
You can import version
:
opts.targetLanguage
Not all text ends up in HTML. As you know, you can inject the content via CSS pseudo attributes and also text might be prepared to be pasted into JSON.
This program allows you to customise the target encoding for chosen language: html
, css
or js
.
Here's an HTML with HTML-encoded non-breaking space:
Some raw text in a very long line.
Here's CSS analogue:
span:before {
content: "Some raw text in a very long\00A0line.";
}
Here's JavaScript analogue:
alert("Some raw text in a very long\u00A0line.");
For example, a minimal application would look like this:
import { removeWidows } from "string-remove-widows";
// second input argument is a plain object, the Optional Options Object:
const result = removeWidows("Some raw text in a very long line.", {
targetLanguage: "css",
});
// now the widow words will be prevented considering that content will go to CSS content:
console.log(result);
// => "Some raw text in a very long\00A0line."
opts.ignore
Very often text already contains templating language literals.
For example, this Nunjucks snippet:
Hi{% if data.firstName %} data.firstName{% endif %}!
We intend to either say Hi John!
to customer John or just Hi!
if we don't know the customer's name.
But if we run widow words removal on this piece of text, we don't want
inserted into the middle of endif
:
Hi{% if data.firstName %} data.firstName{% endif %}!
^^^^^^
That's where opts.ignore
comes in. You can list heads/tails (chunks from which to start ignoring/where to stop) manually:
import { removeWidows } from "string-remove-widows";
const result = removeWidows("Here is a very long line of text", {
targetLanguage: "html",
ignore: [
{
heads: "{{",
tails: "}}",
},
{
heads: ["{% if", "{%- if"],
tails: ["{% endif", "{%- endif"],
},
],
});
or you can just pick a template:
all
jinja
nunjucks
liquid
hugo
hexo
for example:
import { removeWidows } from "string-remove-widows";
const result = removeWidows("Here is a very long line of text", {
targetLanguage: "html",
ignore: "jinja",
});
If you want widest support of literals, all languages at once, put "all".
opts.tagRanges
Sometimes input string can contain HTML tags. We didn't go that far as to code up full HTML tag recognition, more so that such thing would duplicate already existing libraries, namely, string-strip-html
.
opts.tagRanges
accepts known HTML tag ranges (or, in fact, any "black spots" to skip):
import { stripHtml } from "string-strip-html";
import { removeWidows } from "string-remove-widows";
const input = `something in front here <a style="display: block;">x</a> <b style="display: block;">y</b>`;
// first, gung-ho approach - no tag locations provided:
const res1 = removeWidows(input).res;
console.log(res1);
// => something in front here <a style="display: block;">x</a> <b style="display: block;">y</b>
// ^^^^^^
// notice how non-breaking space is wrongly put inside the tag
//
// but, if you provide the tag ranges, program works correctly:
const tagRanges = stripHtml(input, { returnRangesOnly: true });
console.log(JSON.stringify(knownHTMLTagRanges, null, 4));
// => [[24, 51], [52, 56], [57, 84], [85, 89]]
// now, plug the tag ranges into opts.tagRanges:
const res2 = removeWidows(input, { tagRanges }).res;
console.log(res2);
// => something in front here <a style="display: block;">x</a> <b style="display: block;">y</b>
Compared to competition
This program, string-remove-widows | widow-js | @simmo/widower | |
---|---|---|---|
Can both add and remove nbsp s | ✅ | ❌ | ❌ |
Option to choose between raw, HTML, CSS or JS-encoded nbsp s | ✅ | ❌ | ❌ |
Can replace spaces in front of hyphens, n- and m-dashes | ✅ | ❌ | ❌ |
Can prepare UK postcodes | ✅ | ❌ | ❌ |
Does not mangle different types of line endings (LF , CRLF , CR ) | ✅ | ✅ | ✅ |
Customisable minimal word count threshold | ✅ | ✅ | ❌ |
Customisable minimal character count threshold | ✅ | ❌ | ❌ |
Progress reporting function for web worker web apps | ✅ | ❌ | ❌ |
Reports string index ranges of what was done | ✅ | ❌ | ❌ |
Non-breaking space location's whitespace does not necessarily have to be a single space | ✅ | ❌ | ❌ |
Presets for Jinja, Nunjucks, Liquid, Hugo and Hexo templating languages | ✅ | ❌ | ❌ |
Decoupled API^ | ✅ | ❌ | ✅ |
CommonJS build | ✅ | ❌ | ✅ |
ES Modules build | ✅ | ❌ | ❌ |
UMD build for browser | ✅ | ✅ | ❌ |
Can process live DOM of a web page | ❌ | ✅ | ❌ |
Licence | MIT | ISC | MIT |
^ A decoupled API means that at its core, the program is a function "string-in, string-out" and is not coupled with DOM, file I/O, network or other unrelated operations. Such API makes it easier to test and create many different applications on top of a decoupled API.
For example, our competitor widow.js has two coupled parts: 1. API which does string-in, string-out, and 2. DOM processing functions. It could have been two npm libraries. In the end, people who don't need DOM operations can't use it.
One decoupled, "string-in, string-out" library like string-remove-widows
might power all these at once:
- Web page DOM-manipulation library
- a CLI application to process files or piped streams
- an Express REST endpoint on a server,
- a serverless lambda on AWS,
- an Electron desktop program