Inconsistent split Behavior in Python

Saturday, November 05, 2011.

Here’s a futile but cathartic bug report I filed against Python recently.

In Python, string.split and re.split both take an optional maxsplit argument that limits the number of splits performed. This is unlike Perl’s split builtin, whose LIMIT argument caps the number of pieces instead. Python’s choice makes sense, I guess, and consistency between the two languages is not something I’d necessarily expect.
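To make the difference concrete: in Python, maxsplit=n caps the number of splits, so you get at most n + 1 pieces, while Perl’s LIMIT caps the pieces themselves. Here is a minimal sketch. split_into is a hypothetical helper (not part of Python) that approximates Perl’s positive-LIMIT behavior on top of str.split; it deliberately ignores Perl’s special rules for zero and negative limits and for trailing empty fields.

```python
def split_into(text: str, sep: str, pieces: int) -> list[str]:
    """Hypothetical helper: split `text` on `sep` into at most `pieces` fields.

    Only approximates Perl's split with a positive LIMIT; Perl's handling of
    zero/negative limits and trailing empty fields is not reproduced.
    """
    if pieces < 1:
        raise ValueError("pieces must be a positive integer")
    # str.split's maxsplit counts splits, so N pieces need N - 1 splits.
    return text.split(sep, pieces - 1)


# maxsplit=2 means two splits, which yields three pieces.
print("a:b:c:d".split(":", 2))        # ['a', 'b', 'c:d']
# Asking split_into for two pieces performs a single split.
print(split_into("a:b:c:d", ":", 2))  # ['a', 'b:c:d']
```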

However, consistency within a language…a reasonable expectation, no?

The inconsistency lies in how string.split and re.split handle the edge cases of “do an unlimited number of splits” and “don’t do any splits.” The two agree that unlimited splits is the default. They don’t agree on how to interpret an explicit maxsplit value:

                maxsplit=0          maxsplit=-1
string.split    no splits           unlimited splits
re.split        unlimited splits    no splits
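You can check the table above against your own interpreter with a short script like the one below. The string module’s split function is gone in Python 3, so the sketch calls the str method directly. It passes maxsplit by keyword because recent Python 3 releases deprecate passing it to re.split positionally. The sample string and the compare function are arbitrary choices for illustration.

```python
import re

TEXT = "a,b,c"


def compare(maxsplit: int) -> tuple[list[str], list[str]]:
    """Return (str.split result, re.split result) for the given maxsplit."""
    # str.split: 0 means "no splits"; -1 (the default) means "no limit".
    by_str = TEXT.split(",", maxsplit=maxsplit)
    # re.split: 0 (the default) means "no limit"; -1 performs no splits at all.
    by_re = re.split(",", TEXT, maxsplit=maxsplit)
    return by_str, by_re


for maxsplit in (0, -1):
    by_str, by_re = compare(maxsplit)
    print(f"maxsplit={maxsplit}: str.split -> {by_str}, re.split -> {by_re}")
```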

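I think string.split is doing the sensible thing here: maxsplit=0 means exactly zero splits, and a negative value, which can’t be a real count, serves as the “no limit” sentinel.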
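If you want re.split to follow string.split’s convention, a thin wrapper can translate the argument before calling it. This is a minimal sketch; re_split_like_str is a hypothetical name, not something in the standard library, and it only normalizes maxsplit (it does not copy str.split’s whitespace handling when no separator is given).

```python
import re


def re_split_like_str(pattern, string, maxsplit=-1, flags=0):
    """Hypothetical wrapper: re.split with str.split's maxsplit convention.

    maxsplit < 0  -> unlimited splits (re.split spells this maxsplit=0)
    maxsplit == 0 -> no splits; the whole string comes back as one piece
    maxsplit > 0  -> at most that many splits, same as re.split
    """
    if maxsplit == 0:
        return [string]
    if maxsplit < 0:
        maxsplit = 0  # re.split's spelling of "no limit"
    return re.split(pattern, string, maxsplit=maxsplit, flags=flags)


print(re_split_like_str(r"\s*,\s*", "a , b,c"))              # ['a', 'b', 'c']
print(re_split_like_str(r"\s*,\s*", "a , b,c", maxsplit=0))  # ['a , b,c']
print(re_split_like_str(r"\s*,\s*", "a , b,c", maxsplit=1))  # ['a', 'b,c']
```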

Of course, the “bug” has zero chance of being fixed at this point. I pretty much just filed it to create a search result for others similarly bitten, annoyed, or both.

Posted by Alan on Saturday, November 05, 2011.

