Why pathlib.Path doesn't inherit from str in Python

Over on python-ideas a discussion has broken out about somehow trying to make p'/some/path/to/a/file return an instance of pathlib.Path. This led to a splinter discussion as to why pathlib.Path doesn't inherit from str? I figured instead of burying my response to this question in the thread I'd blog about it to try and explain one approach to API design.

I think the key question in all of this is whether paths are semantically equivalent to a sequence of characters? Obviously the answer is "no" since paths have structure, directly represent something like files and directories, etc. Now paths do have a serialized representation as strings -- at least most of the time, but I'm ignoring Linux and the crazy situation of binary paths -- which is why C APIs take in char * as the representation of paths (and because C tends to make one try to stick with the types that are part of the C standard). So not all strings represent paths, but all paths can be represented by strings. I think we can all agree with that.

OK, so if all paths can be represented as strings, why don't we just make pathlib.Path subclass str? Well, as I said earlier, not all strings are paths. You can't concatenate a string representing a path with some other random string and expect to get back a string that still represents a valid path (and I'm not talking about "valid" as in "the file doesn't exist", I'm talking about "syntactically not possible"). This is what PEP 428 is talking about when it says:

Not behaving like one of the basic builtin types [list str] also minimizes the potential for confusion if a path is combined by accident with genuine builtin types [like str].

Now at this point someone someone will start saying, "but Brett, 'practicality beats purity' and all of those pre-existing, old OS APIs want a string as an argument!" And you're right, in terms of practicality it would be easier for pathlib.Path to inherit from str ... for the short/medium term. One thing people are forgetting is that "explicit is better than implicit" as well and as with most design decisions there's not one clear answer. Implicitly treating a path as a string currently works, but that's just because we have inherited a suboptimal representation for paths from C. Now if you use an explicit representation of paths like pathlib.Path then you gain semantic separation and understanding of what you are working with. You also avoid any issues that come with implicit str compatibility as I pointed out earlier.

And before anyone says, "so what?", think about this: Python 2/3. I'm sure some will say I'm overblowing the comparison, but Python 3 came about because the implicit compatibility of binary and textual data in Python 2 caused major headaches to the point that we made a backwards-incompatible change that has caused widespread ramifications (but which we seem to be coming out on the other side of). By not inheriting from str, pathlib.Path has avoided a similar potential issue from the get-go. Or another way of looking at it is to ask why doesn't dict inherit from str so that it's easier to make it easier to stick in the body of an HTTP response so it can be implicitly treated as JSON? If you going, "eww, no because they are different types" then you at least understand the argument I'm trying to make (if you're going "I like that idea", then I think Perl might be more to your liking than Python and that's fine since the two languages just have different designs).

Now back to that "practicality beats purity" side of this. Obviously there are lots of APIs out there that take a string as an argument for a file path. And I understand people don't like using str(path) or the upcoming/new getattr(path, 'path', path) idiom to work around the limitations forced upon them by other APIs that only take a str because they occasionally forget it. For this point, I have two answers. One is still "explicit is better than implicit" and you should have tests to begin with to catch when you forget to convert your pathlib.Path objects to str as needed. I'm just one of those programmers who's willing to type a bit more for easy-to-read code that's less error-prone within reason (and I obviously view this as reasonable).

But more importantly, why don't you work with the code and/or projects that are forcing you to convert to str to start accepting pathlib objects as well for those same APIs? If projects start to work on making pathlib objects acceptable anywhere a path is accepted then once Python 3.4 is the oldest version of Python that is supported by the project then they can start considering dropping support for str as paths in new releases (obviously this also includes dropping support for Python 2 unless people use pathlib2). And I fully admit the stdlib is not exempt from a place that needs updating, so for importlib I have opened an issue to update it to show this isn't just talk on my end (unfortunately there's some unique bootstrapping problems to import where stuff like sys.path probably can't be updated since that has to exist before you can import pathlib and that sort of thing ripples out, but I'm hoping I can update at least some things and this is a very unique case of potentially needing to stick with str). Hopefully if people start asking for pathlib support from projects they will add it, eventually leading to an alleviation of the desire to have pathlib.Path inherit from str.