Solving Palindromes
Published on Apr 17, 2025
Every single leetcode-esque / learn-programming online course seems to reference palindrome-solving, and at this point, I'm wondering… what is it I'm missing? Clearly there's some hidden meaning behind the logic used to satisfy this problem. There must be more than meets the eye here. Programming a solution that determines whether a sequence of characters is a palindrome must be something programmers have to tackle every single day, and they must laugh every time they see us noobs using string slicing to logically categorize a palindrome!
First of all — let’s talk about string slicing. Python’s word[::-1] syntax? It’s sleek, sure. But I’ll be honest: it feels like black magic.
I don’t want to memorize slicing syntax like some arcane spellbook.
And in the era of AI… surely there’s a better way. Why write explicit logic at all when we can train a machine to detect palindromes for us?
For one, I suck at string slicing. The syntax is just too much for me to memorize, and in this day and age of AI, that must be the perfect solution: train a machine learning model to recognize whether a word is a palindrome! That has to be the true, sought-after solution to this problem. String slicing deals in absolutes; it compares indices of a string. Why use such hardline logic when we can use statistical probability?
Picking an AI denomination
We need to train a model to recognize palindromes — words where the character sequence is the same forward and backward (like racecar, noon, tacocat), i.e. sequences that mirror around their center.
In order to accomplish this, I've selected a Recurrent Neural Network (RNN), more specifically an LSTM RNN. LSTM stands for Long Short-Term Memory, and it's typically used to understand sequential data. Keras conveniently provides a Bidirectional LSTM wrapper, which is used to identify and work with symmetry. Additionally, standard dense layers treat every input independently, but LSTMs carry "memory" across time steps, which should help identify symmetric structure.
So, in summary, we use an LSTM because:
- Palindromes are sequence-based, so we need a model that "reads" a word like a sentence.
- LSTMs can track and compare characters across the whole sequence.
- Bidirectional LSTMs look both ways — which is literally what palindromes are about.
To start off, I gathered two files, palindromes.txt and not_palindromes.txt. These are line-delimited files with actual English-word examples of both valid palindromes and invalid palindromes. I just picked them up using Google. But that's not really enough data to train on, so we'll have to generate our own data representing both categories.
This is done in the https://github.com/sockheadrps/PalindromeRNNClassifier/blob/main/data_gen.py file.
Just a function to help check validity of input data.
def is_palindrome(s):
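A minimal sketch of that check, assuming the standard reversed-string comparison (the repo's exact implementation may differ):

```python
def is_palindrome(s):
    # A palindrome reads the same forwards and backwards.
    # Ironically, this uses the very slicing I was complaining about.
    return s == s[::-1]
```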
generate_string(length, palindrome=False)
If length is odd, insert a random middle letter: rac + e + car = racecar
If the keyword is False, it randomly generates a full word of the given length. If that word accidentally turns out to be a palindrome (e.g. "noon", "abba"), it loops until it gets a non-palindrome.
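My reconstruction of what that generator might look like (the names mirror data_gen.py, but the body is a sketch, not the repo's exact code):

```python
import random
import string

def generate_string(length, palindrome=False):
    if palindrome:
        # Build the first half, optionally add a random middle letter for odd
        # lengths, then mirror it: rac + e + car -> racecar
        half = ''.join(random.choices(string.ascii_lowercase, k=length // 2))
        middle = random.choice(string.ascii_lowercase) if length % 2 else ''
        return half + middle + half[::-1]
    # Otherwise keep generating random words until one is NOT a palindrome
    while True:
        word = ''.join(random.choices(string.ascii_lowercase, k=length))
        if not is_palindrome(word):
            return word
```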
make_near_palindrome(p)
make_near_palindrome(“noon”) → “noan” or “nook” (the input word with one character flipped). This helps the model learn that not all similar-looking words are palindromes.
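A sketch of the near-palindrome generator, assuming it just keeps flipping a random character until the result is no longer a palindrome (flipping only the exact middle letter of an odd-length word would still leave a palindrome, hence the loop):

```python
def make_near_palindrome(p):
    # Flip one character of a real palindrome so it no longer mirrors:
    # make_near_palindrome("noon") -> "noan", "nook", ...
    while True:
        chars = list(p)
        i = random.randrange(len(chars))
        chars[i] = random.choice(string.ascii_lowercase)
        candidate = ''.join(chars)
        if not is_palindrome(candidate):
            return candidate
```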
generate_dataset()
- Loads real palindromes from a file and filters out any words that are too short or too long
- Injects real palindromes: roughly 25% of the dataset is real-world palindromes
- Fills the remaining 75% with randomly generated strings, randomly deciding if each one should be a palindrome (1) or not (0)
- Adds “hard negatives”: manually chosen non-palindromes that might look symmetrical but aren't, injecting a batch of them
- Mixes the data so the model doesn't see examples in a predictable order (like all palindromes first)
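Pieced together from the description above, generate_dataset might look roughly like this. The dataset size, length limits, and exact mix are placeholders, and data_gen.py in the repo is the source of truth:

```python
def generate_dataset(total=10000, min_len=3, max_len=10):
    def load(path):
        with open(path) as f:
            return [w.strip().lower() for w in f
                    if min_len <= len(w.strip()) <= max_len]

    real = load("palindromes.txt")
    hard_negatives = load("not_palindromes.txt")

    data = []
    # ~25% real-world palindromes
    data += [(w, 1) for w in random.choices(real, k=total // 4)]
    # A batch of near-palindromes as extra negatives (assumption: this is
    # where make_near_palindrome gets used)
    data += [(make_near_palindrome(w), 0) for w in random.choices(real, k=total // 10)]
    # Fill the rest with generated strings, randomly palindrome (1) or not (0)
    while len(data) < total - len(hard_negatives):
        label = random.randint(0, 1)
        word = generate_string(random.randint(min_len, max_len), palindrome=bool(label))
        data.append((word, label))
    # Hard negatives: hand-picked words that look symmetrical but aren't
    data += [(w, 0) for w in hard_negatives]

    random.shuffle(data)  # don't show the model all palindromes first
    return data
```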
preprocess()
Turns [(“racecar”, 1), (“apple”, 0), …] into:
X = [[18, 1, 3, 5, 3, 1, 18, 0, 0, 0], […], …] # char sequences
y = [1, 0, …] # the corresponding labels: 1 if that item is a palindrome, 0 if not
encode(word)
We then pad each sequence so every input has the same length:
“cat” → [3, 1, 20] padded → [3, 1, 20, 0, 0, 0, 0, 0, 0, 0] ← (if maxlen = 10)
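A sketch of encode plus the padding step, assuming a simple a=1 … z=26 mapping (0 reserved for padding) and Keras's pad_sequences helper; the repo may wire this up differently:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAXLEN = 10  # assumed maximum word length, matching the example above

def encode(word):
    # 'a' -> 1, 'b' -> 2, ..., 'z' -> 26; 0 is reserved for padding
    return [ord(c) - ord('a') + 1 for c in word.lower()]

def preprocess(pairs):
    # pairs: [("racecar", 1), ("apple", 0), ...]
    X = pad_sequences([encode(w) for w, _ in pairs],
                      maxlen=MAXLEN, padding='post')
    y = [label for _, label in pairs]
    return X, y
```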
The Model
https://github.com/sockheadrps/PalindromeRNNClassifier/blob/main/model.py
Embedding(input_dim=vocab_size, output_dim=16, mask_zero=True)
Turns each character (e.g., ‘a’, ‘b’, …) into a learnable 16-dimensional vector.
This layer gives the model a way to learn semantic meaning of characters in relation to each other. For example, it might learn that ‘r’ and ‘a’ often appear in mirrored positions in palindromes.
mask_zero=True: Tells downstream layers to ignore padding (0s), so shorter words don’t confuse the model with meaningless zeros.
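You can peek at the mask with a tiny example using the encoded "cat" from earlier; note that _keras_mask is an internal tf.keras attribute, so this is just for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

emb = layers.Embedding(input_dim=27, output_dim=16, mask_zero=True)
out = emb(tf.constant([[3, 1, 20, 0, 0]]))  # "cat" padded to length 5
print(out._keras_mask)  # [[ True  True  True False False]] -> padding positions are ignored
```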
Bidirectional(LSTM(128, return_sequences=True))
An LSTM that reads the input both forwards and backwards and outputs a sequence at every step.
Palindromes are all about symmetry. Bidirectional LSTMs are perfect for that because they: Look at characters from both directions, Catch mirrored patterns regardless of position.
Dropout(0.2)
Randomly “drops out” 20% of the layer’s neurons during training. This prevents overfitting — especially important when working with a relatively small vocabulary (only 26 letters). It helps the model generalize better.
Bidirectional(LSTM(128, return_sequences=True))
Why another? Stacking a second bidirectional LSTM lets the model build higher-level patterns on top of the first layer's output: the first pass picks up local character relationships, the second combines them across the whole word.
LayerNormalization()
Normalizes the activations coming out of the LSTM stack, keeping values in a stable range so the deeper recurrent layers train more smoothly.
Another Dropout(0.2)
Same purpose: Keep regularizing the model.
LSTM(128)
Processes the output from the stacked bidirectional layers but only returns the final output (i.e., a single vector summarizing the sequence).
This is the compression step — it condenses the learned features from the character sequence into a summary representation that can be passed into Dense layers for classification.
Another Dropout(0.2)
Still regularizing. This time it’s before heading into fully connected layers.
Dense(32, activation='relu', kernel_initializer='he_uniform')
A small fully connected layer that maps the LSTM's summary vector into higher-level features for the final decision. he_uniform is a weight initializer that pairs well with ReLU activations.
Dropout(0.2)
Final layer of regularization before the output.
Dense(1, activation='sigmoid')
Why it's perfect: this is binary classification, so sigmoid is standard — it maps the output to a probability between 0 and 1.
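Putting the layers above together, the model definition looks roughly like this (a sketch assembled from the layer list; model.py in the repo is the authoritative version, and the vocab size here is an assumption):

```python
from tensorflow.keras import Sequential, layers

VOCAB_SIZE = 27  # 26 letters + 1 slot reserved for padding (assumption)

model = Sequential([
    layers.Embedding(input_dim=VOCAB_SIZE, output_dim=16, mask_zero=True),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.Dropout(0.2),
    layers.Bidirectional(layers.LSTM(128, return_sequences=True)),
    layers.LayerNormalization(),
    layers.Dropout(0.2),
    layers.LSTM(128),
    layers.Dropout(0.2),
    layers.Dense(32, activation='relu', kernel_initializer='he_uniform'),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid'),
])
```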
ExponentialDecay + Adam
What it does: the learning rate starts higher and decays exponentially as training progresses, while Adam adapts the step size for each parameter. This is helpful in deep models to stabilize late-stage training.
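Wired up in Keras, that looks something like this; the initial rate, decay steps, and decay rate below are placeholders rather than the repo's actual values:

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.optimizers.schedules import ExponentialDecay

lr_schedule = ExponentialDecay(
    initial_learning_rate=1e-3,  # placeholder values
    decay_steps=1000,
    decay_rate=0.9,
)
optimizer = Adam(learning_rate=lr_schedule)
```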
BinaryFocalCrossentropy(gamma=2.5)
What it does: focal loss down-weights examples the model already classifies confidently and focuses training on the hard, easily confused ones (like near-palindromes). The gamma parameter controls how aggressively easy examples are down-weighted.
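Compiling with that loss and the optimizer above would look something like the following (a sketch; check model.py for the exact call):

```python
from tensorflow.keras.losses import BinaryFocalCrossentropy

model.compile(
    optimizer=optimizer,
    loss=BinaryFocalCrossentropy(gamma=2.5),
    metrics=['accuracy'],
)
```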
Summary
So, this model is:
- Rich enough to learn subtle symmetrical structure
- Deep enough to abstract features over time
- Bidirectional to account for mirrored sequences
- Regularized to avoid overfitting
- Adaptively optimized with learning rate decay and focal loss
Then we train!
https://github.com/sockheadrps/PalindromeRNNClassifier/blob/main/run.py
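In spirit, the training script boils down to something like this; the epoch count, batch size, and validation split are placeholders, and run.py is the real thing:

```python
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Build and encode the dataset using the helpers sketched earlier
data = generate_dataset()
X, y = preprocess(data)
X, y = np.array(X), np.array(y)

model.fit(
    X, y,
    epochs=20,             # placeholder
    batch_size=64,         # placeholder
    validation_split=0.2,  # hold out 20% of the data to watch for overfitting
)

# Quick sanity check on a palindrome and a non-palindrome
sample = pad_sequences([encode("racecar"), encode("apple")],
                       maxlen=MAXLEN, padding='post')
for word, prob in zip(["racecar", "apple"], model.predict(sample)):
    print(word, "->", round(float(prob[0]), 3))
```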