Python Internals: A Deep Dive into os.path.join() on Windows


References

Item	Description
Python official documentation	https://docs.python.org/zh-cn/3/
Search engines	Google, Bing
CPython 3.6 interpreter source code	Official download page

Environment

Item	Description
Windows operating system	Windows 10 Pro
Unix-like operating system	Kali Linux 2023-04-18
PyCharm	2023.1 (Professional Edition)
Python	3.10.6

os.path

os.path is a module in the Python standard library for working with file paths, for example joining, splitting, and normalizing path strings.

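Path separators

A path separator is the special character that separates directory levels in a file path. Which character is used is a convention of the operating system, so different operating systems use different separators.

Two separators are common: the forward slash and the backslash.

Forward slash /
The forward slash is the path separator on Unix-like operating systems.
Backslash \
The backslash is the primary path separator on Windows. Windows also accepts the forward slash / as a path separator.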
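
The separators described above are exposed as module attributes. The sketch below reads them from ntpath and posixpath directly rather than from os.path, so that it prints the same values on any platform. That choice is an assumption made for illustration.

```python
import ntpath
import posixpath

# Primary and alternative separators for Windows-style paths.
windows_sep = ntpath.sep        # backslash
windows_altsep = ntpath.altsep  # forward slash, also accepted on Windows

# POSIX paths have a single separator and no alternative.
posix_sep = posixpath.sep
posix_altsep = posixpath.altsep  # None

print(repr(windows_sep), repr(windows_altsep))
print(repr(posix_sep), repr(posix_altsep))
```

On the current platform, the same values are available as os.sep and os.altsep.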

os.path.join()

os.path.join() is a commonly used function in the os.path module that joins several path segments into a single path. It picks the path separator appropriate for the current operating system when joining.

An example

import os


result = os.path.join('path', 'to', 'file')
print(result)

Output on Windows

path\to\file

Output on Linux

path/to/file

Different implementations

os.path.join() is a standard-library function that combines several paths into one, choosing the appropriate separator (forward slash / or backslash \) for the operating system.

The implementation behind os.path.join() depends on the operating system. On Windows, os.path.join() is provided by the standard-library module ntpath.py; on POSIX (Unix-like) systems, it is provided by posixpath.py.
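
A minimal sketch of this dispatch, assuming only the standard-library modules os, ntpath and posixpath: os.path is one of the two modules depending on os.name, and both can also be imported by name to get a specific flavor regardless of the platform.

```python
import os
import ntpath
import posixpath

# os.path is ntpath on Windows (os.name == 'nt') and posixpath elsewhere.
expected_module = ntpath if os.name == 'nt' else posixpath
uses_expected_module = os.path is expected_module
print(uses_expected_module)

# Both flavors can be called explicitly on any platform.
windows_result = ntpath.join('path', 'to', 'file')
posix_result = posixpath.join('path', 'to', 'file')
print(windows_result)
print(posix_result)
```

Calling ntpath explicitly is a convenient way to experiment with Windows path rules from a non-Windows machine.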

os.path.join() on Windows

os.path.join() and ntpath.join()

On Windows, os.path.join() is handled by the ntpath.py module. This means that, besides importing os and calling os.path.join(), we can import ntpath and call its join() function directly to join paths. For example:

Joining paths with os.path.join()

from os.path import join


result = join('path', 'to', 'file')
print(result)

Joining paths with ntpath.join()

from ntpath import join


result = join('path', 'to', 'file')
print(result)

Output

On Windows, both snippets produce the same result:

path\to\file

Notes:

Importing join() directly with from ... import join (where ... is os.path or ntpath) is not recommended. Python strings also have a join() method, which concatenates the elements of an iterable (all of which must be strings) using the string as the separator, and a bare join() in the code is easy to confuse with it.
The better practice is to import the os or ntpath module and call os.path.join() or ntpath.join().
Importing join() with from ... import join (where ... is os.path or ntpath) does not replace the string join() method. The imported join() is a function bound to a name in the current module, whereas the string join() is a method looked up on a string object, so the two never collide. For example:
```
from os.path import join


# Use the join() method of a string object to
# concatenate the elements of an iterable with that string.
arr = ['Hello', 'World']

result = ' '.join(arr)
print(result)

# Use the join() function from os.path to
# join several path segments correctly.
result = join('path', 'to', 'file')
print(result)
```
Output
```
Hello World
path\to\file
```

ntpath.join()

On Windows, os.path.join() is ntpath.join(), so studying the behavior of os.path.join() in depth means reading the source of ntpath.join().

The source of ntpath.join() is as follows:

# Join two (or more) paths.
def join(path, *paths):
    path = os.fspath(path)
    if isinstance(path, bytes):
        sep = b'\\'
        seps = b'\\/'
        colon = b':'
    else:
        sep = '\\'
        seps = '\\/'
        colon = ':'
    try:
        if not paths:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
        result_drive, result_path = splitdrive(path)
        for p in map(os.fspath, paths):
            p_drive, p_path = splitdrive(p)
            if p_path and p_path[0] in seps:
                # Second path is absolute
                if p_drive or not result_drive:
                    result_drive = p_drive
                result_path = p_path
                continue
            elif p_drive and p_drive != result_drive:
                if p_drive.lower() != result_drive.lower():
                    # Different drives => ignore the first path entirely
                    result_drive = p_drive
                    result_path = p_path
                    continue
                # Same drive in different case
                result_drive = p_drive
            # Second path is relative to the first
            if result_path and result_path[-1] not in seps:
                result_path = result_path + sep
            result_path = result_path + p_path
        ## add separator between UNC and non-absolute path
        if (result_path and result_path[0] not in seps and
            result_drive and result_drive[-1:] != colon):
            return result_drive + sep + result_path
        return result_drive + result_path
    except (TypeError, AttributeError, BytesWarning):
        genericpath._check_arg_types('join', path, *paths)
        raise

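Preliminaries

os.fspath()

os.fspath() takes an object and returns its file system path as a str or bytes object. If the argument is already a str or bytes object, it is returned unchanged. Otherwise the argument's __fspath__() method is called and its result is returned; if the argument has no __fspath__() method, or that method returns something other than str or bytes, os.fspath() raises TypeError.

An example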

from os import fspath


class MyPath:
    def __fspath__(self):
        return '/path/to/file'


result = fspath(MyPath())
print(result)

print(fspath('Hello World'))
print(fspath(b'Hello World'))

Output

/path/to/file
Hello World
b'Hello World'

Note:

os.fspath() is available in Python 3.6 and later; check which Python version you are running before using it.
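
The failure cases described above can be sketched as follows. BadPath is a hypothetical class invented for this illustration; pathlib.PurePosixPath is used only as a readily available object that implements __fspath__().

```python
import os
from pathlib import PurePosixPath


class BadPath:
    """Hypothetical class whose __fspath__() returns a non-path value."""

    def __fspath__(self):
        return 42


# pathlib paths implement __fspath__(), so os.fspath() accepts them.
converted = os.fspath(PurePosixPath('path/to/file'))
print(converted)

# Objects without __fspath__(), or with one that returns neither str
# nor bytes, are rejected with TypeError.
rejected = []
for candidate in (42, BadPath()):
    try:
        os.fspath(candidate)
    except TypeError as exc:
        rejected.append(type(candidate).__name__)
        print(f'TypeError: {exc}')
```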

isinstance()

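isinstance() is a built-in function that checks whether an object is an instance of a given class or of one of its subclasses. It returns True if so and False otherwise.

isinstance(object, classinfo)

Where:

object
The object to check. isinstance() determines whether object is an instance of the given type or of a subclass of it.
classinfo
A type object, a tuple of type objects, or (since Python 3.10) a union type such as int | str.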

# Check whether 1 is an instance of int or of a subclass of int
result = isinstance(1, int)
print(result)

# Check whether 1 is an instance of str or of a subclass of str
result = isinstance(1, str)
print(result)

# Check whether 1 is an instance of str or int, or of a subclass of either
result = isinstance(1, (str, int))
print(result)

# Check whether 1 is an instance of str, int or bool,
# or of a subclass of any of them (union syntax, Python 3.10+)
result = isinstance(1, str | int | bool)
print(result)

# Check whether 1 is an instance of str, int, bool, list or tuple,
# or of a subclass of any of them
result = isinstance(1, (str | int, bool | list, tuple | tuple, tuple))
print(result)

Output

True
False
True
True
True

Only a tuple is accepted

classinfo may be a tuple containing one or more type objects, but that does not extend to other iterables such as lists. Passing a list makes Python raise TypeError.

result = isinstance(1, [int, str])
print(result)
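
The snippet above stops with an unhandled exception. A small sketch that catches the error, and shows one way around it (converting the list to a tuple, which is an illustrative choice here), might look like this:

```python
# A list is iterable, but isinstance() does not accept it as classinfo.
try:
    isinstance(1, [int, str])
    list_accepted = True
except TypeError as exc:
    list_accepted = False
    print(f'TypeError: {exc}')

# Converting the list to a tuple makes the call valid.
tuple_result = isinstance(1, tuple([int, str]))
print(tuple_result)
```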

When TypeError is raised

isinstance() raises TypeError when classinfo is not what it expects, but there is an exception to this. Consider the following example:

result = isinstance(1, (int, 1))
print(result)

Output

True

If the order of the second argument is changed from (int, 1) to (1, int), Python raises TypeError.
The reason is that isinstance() checks the entries of the tuple in order and returns True as soon as one matches object. An invalid entry raises TypeError only if the check actually reaches it.
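
The order dependence described above can be observed directly. This is a sketch of the behavior as seen on CPython; the variable names are chosen for illustration.

```python
# int comes first and matches, so the invalid entry 1 is never examined.
match_first = isinstance(1, (int, 1))
print(match_first)

# The invalid entry comes first, so it is examined and rejected.
try:
    isinstance(1, (1, int))
    invalid_first_raised = False
except TypeError:
    invalid_first_raised = True
print(invalid_first_raised)

# An invalid entry also raises when no earlier entry matches.
try:
    isinstance('text', (int, 1))
    no_match_raised = False
except TypeError:
    no_match_raised = True
print(no_match_raised)
```

Because the outcome depends on the order of the entries, it is safer to make sure every entry in classinfo is a valid type rather than to rely on the short circuit.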

Nested tuples

classinfo may be a tuple of type objects, and that tuple may itself contain tuples. For example:

result = isinstance(1, (list, (str, (bool, (tuple | int)))))
print(result)

result = isinstance(1, (list, (str, (bool, (tuple | set)))))
print(result)

Output

True
False

os.path.splitdrive()

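UNC paths

A UNC (Universal Naming Convention) path is the naming convention Windows uses to access shared network resources. It refers to files, folders, printers, and other resources on the local computer or on the network.

Structure of a UNC path

A UNC path consists of four parts:

Backslashes (\)
A UNC path starts with two backslashes \\, which mark the path as a UNC path.
Server identifier
The part immediately after the two backslashes is the name or IP address of the server, identifying the computer that hosts the shared resource.
Share name
After the server identifier and a single backslash comes the name of the share, identifying the shared folder or shared printer.
Resource path
After the share name and a single backslash comes the path of the target resource relative to the shared folder.

An example

\\ServerName\ShareFolder\ResourcePath

Where:

ServerName is the name or IP address of the computer hosting the shared resource; ShareFolder is the name of the shared folder; ResourcePath is the path of the target resource relative to the shared folder.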
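
The four parts listed above can be pulled out of a UNC path programmatically. The sketch below uses pathlib.PureWindowsPath, which the original text does not use; the server, share, and file names are made up for the illustration.

```python
from pathlib import PureWindowsPath

# Hypothetical UNC path; the server, share, and file names are made up.
unc = PureWindowsPath(r'\\ServerName\ShareFolder\Dir\file.txt')

# For a UNC path, drive holds the \\server\share prefix.
server, share = unc.drive.lstrip('\\').split('\\')

# parts[0] is the drive plus root; the remaining parts form the resource path.
resource_parts = unc.parts[1:]

print(server)
print(share)
print(resource_parts)
```

PureWindowsPath applies Windows path rules on any platform, so the sketch does not need to run on Windows.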

os.path.splitdrive()

os.path.splitdrive() separates the drive from the rest of a path. On Windows the drive is usually a drive letter followed by a colon; on other operating systems the drive is always an empty string.
On Windows, os.path.splitdrive() also splits a UNC path in two: the share (\\ServerName\ShareFolder) and the rest of the path.

os.path.splitdrive(path)

os.path.splitdrive() returns a tuple of the form (drive, path).

Where:

drive is the drive letter of a Windows path, or the \\ServerName\ShareFolder part of a UNC path.
path is everything that remains after the drive letter or after the share part of a UNC path.

An example

from os.path import splitdrive


# Try splitdrive on a Unix-style path
drive, path = splitdrive('/path/to/file')
print(f'【Drive】 {drive}')
print(f'【Path】 {path}')

# Try splitdrive on a Windows path
drive, path = splitdrive(r'C:\path\to\file')
print(f'【Drive】 {drive}')
print(f'【Path】 {path}')

# Try splitdrive on a UNC path
drive, path = splitdrive(r'\\ServerName\ShareFolder\Path\To\File')
print(f'【Drive】 {drive}')
print(f'【Path】 {path}')

Output

Output on Windows

【Drive】 
【Path】 /path/to/file
【Drive】 C:
【Path】 \path\to\file
【Drive】 \\ServerName\ShareFolder
【Path】 \Path\To\File

Output on Unix-like systems

【Drive】
【Path】 /path/to/file
【Drive】
【Path】 C:\path\to\file
【Drive】
【Path】 \\ServerName\ShareFolder\Path\To\File
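
The two sets of results above come from two different implementations of splitdrive(). A sketch that reproduces both on a single machine, assuming ntpath and posixpath are imported explicitly instead of going through os.path:

```python
import ntpath
import posixpath

windows_path = r'C:\path\to\file'
unc_path = r'\\ServerName\ShareFolder\Path\To\File'

# Windows rules: a drive letter or a UNC share is split off.
nt_drive_letter = ntpath.splitdrive(windows_path)
nt_unc = ntpath.splitdrive(unc_path)

# POSIX rules: the drive is always empty and the path is left intact.
posix_result = posixpath.splitdrive(windows_path)

print(nt_drive_letter)
print(nt_unc)
print(posix_result)
```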

genericpath._check_arg_types()

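The genericpath module

genericpath is a module in the Python standard library that provides common functions and helpers for working with paths.

The functions defined in genericpath implement the path operations that are common to all platforms. They can be used on any operating system because they do not depend on a particular path separator or on platform-specific file system rules.

genericpath._check_arg_types()

The source of genericpath._check_arg_types() is as follows: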

def _check_arg_types(funcname, *args):
    hasstr = hasbytes = False
    for s in args:
        if isinstance(s, str):
            hasstr = True
        elif isinstance(s, bytes):
            hasbytes = True
        else:
            raise TypeError(f'{funcname}() argument must be str, bytes, or '
                            f'os.PathLike object, not {s.__class__.__name__!r}') from None
    if hasstr and hasbytes:
        raise TypeError("Can't mix strings and bytes in path components") from None

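Inside os.path, this function checks whether the arguments passed to a function are file system paths given as str or bytes. If args contains an element that is neither str nor bytes, or contains both str and bytes elements, the function raises TypeError.

Note:

In Python, functions and methods whose names start with a single underscore are conventionally internal implementation details rather than part of the public API. They are not officially supported, should not be called directly, and may change in future Python versions.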
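
Since _check_arg_types() is private, the sketch below observes its effect through the public ntpath.join() instead of calling it directly. The exact wording of the error messages is an implementation detail and may vary between Python versions.

```python
import ntpath

# Mixing str and bytes segments: the internal failure is translated
# into a clearer TypeError by the argument check.
try:
    ntpath.join('path', b'to')
    mixed_message = None
except TypeError as exc:
    mixed_message = str(exc)
print(mixed_message)

# A segment that is not a path at all is rejected as well.
try:
    ntpath.join('path', 42)
    non_path_raised = False
except TypeError as exc:
    non_path_raised = True
    print(exc)
```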

A walkthrough of the ntpath.join() source

The implementation of ntpath.join(), annotated

def join(path, *paths):
    # Use fspath to convert path into a file system path
    # represented as str or bytes.
    path = os.fspath(path)

    # If path is an instance of bytes or of a subclass of bytes,
    # set sep, seps and colon to bytes values.
    if isinstance(path, bytes):
        sep = b'\\'
        seps = b'\\/'
        colon = b':'
    else:
        # If path is not an instance of bytes or of a subclass of bytes,
        # set sep, seps and colon to str values.
        sep = '\\'
        seps = '\\/'
        colon = ':'
    try:

        # With no further arguments the loop below never runs, so this
        # expression only checks that path can be concatenated with sep.
        # The result is discarded.
        if not paths:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.

        # Split the drive from the rest of the path.
        # result_drive holds the drive of the joined result.
        # result_path holds everything in the joined result except the drive.
        result_drive, result_path = splitdrive(path)

        # Apply os.fspath to every element of the variadic argument paths.
        for p in map(os.fspath, paths):

            # Split the drive from the rest of the path.
            p_drive, p_path = splitdrive(p)

            # If p_path starts with \ or /, result_path is replaced by p_path.
            # This means:
            # print(os.path.join('C:\\', r'\path\to\file')) -> C:\path\to\file
            # print(os.path.join('C:\\', r'\path\to\file', r'\path\to\file')) -> C:\path\to\file
            if p_path and p_path[0] in seps:
                # If p has no drive, keep the drive stored so far;
                # otherwise replace result_drive with the drive of p.
                if p_drive or not result_drive:
                    result_drive = p_drive
                result_path = p_path
                # Skip the rest of this iteration and move on to the next p.
                continue

            # Reached only when p_path does not start with \ or /.
            # If p_drive is non-empty and differs from result_drive,
            # p_path replaces result_path and p_drive replaces result_drive.
            # This means:
            # print(os.path.join(r'C:\path\to\file', r'D:new\path')) -> D:new\path
            elif p_drive and p_drive != result_drive:
                # The drives differ even when compared case-insensitively,
                # so p is on a different drive from the result so far.
                if p_drive.lower() != result_drive.lower():
                    # Replace result_drive and result_path with the new drive and path.
                    result_drive = p_drive
                    result_path = p_path
                    # Skip the rest of this iteration and move on to the next p.
                    continue
                # The drives differ only in case, so only result_drive is updated.
                # This means:
                # print(os.path.join(r'd:\path\to\file', r'D:new\path')) -> D:\path\to\file\new\path
                result_drive = p_drive

            # If result_path is non-empty and does not already end with a
            # character in seps, append \ before adding p_path.
            if result_path and result_path[-1] not in seps:
                result_path = result_path + sep
            result_path = result_path + p_path

        # The result is a UNC path.
        # If result_path is non-empty and does not start with a character in
        # seps, and result_drive is non-empty and does not end with a colon
        # (that is, it is a UNC share rather than a drive letter), a separator
        # is still missing between the two.
        if (result_path and result_path[0] not in seps and
            result_drive and result_drive[-1:] != colon):

            # Join result_drive and result_path with the separator
            # (\ or b'\\') that matches their type.
            return result_drive + sep + result_path

        return result_drive + result_path
    except (TypeError, AttributeError, BytesWarning):
        # Call genericpath._check_arg_types() to work out why the
        # exception occurred and raise a more helpful error message.
        genericpath._check_arg_types('join', path, *paths)

        # If genericpath._check_arg_types() finds nothing wrong and
        # raises nothing itself, re-raise the original exception.
        raise
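
Each branch annotated above can be exercised with a concrete call. The sketch below uses ntpath.join() directly so that it behaves the same on any platform; the paths are made-up examples.

```python
import ntpath

# Absolute second path: replaces the path but keeps the drive.
absolute = ntpath.join('C:\\', r'\path\to\file')

# Different drive: the first path is discarded entirely.
other_drive = ntpath.join(r'C:\path\to\file', r'D:new\path')

# Same drive in a different case: only the drive spelling is updated.
same_drive = ntpath.join(r'd:\path\to\file', r'D:new\path')

# UNC share plus relative path: a separator is inserted after the share.
unc = ntpath.join(r'\\ServerName\ShareFolder', 'file.txt')

# Drive letter plus relative path: no separator is inserted.
drive_relative = ntpath.join('C:', 'file.txt')

for value in (absolute, other_drive, same_drive, unc, drive_relative):
    print(value)
```

The last case is worth noting: C:file.txt is a path relative to the current directory on drive C:, which is why join() does not turn it into C:\file.txt.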

A strange if statement

In the source of ntpath.join(), the following if statement looks redundant.

if not paths:
    path[:0] + sep  #23780: Ensure compatible data type even if p is null.

path[:0] + sep

The statement path[:0] + sep inside the if does not store its result. Slicing returns a new object, and concatenating it with sep builds yet another new object, so path itself is never modified and the value is simply discarded. What, then, is path[:0] + sep for?

The comment next to the statement reads "#23780: Ensure compatible data type even if p is null." In other words, path[:0] + sep is there to guarantee that path is a str or bytes object that can be concatenated with sep. Let's verify this:

string = 'Hello World'
bytes_string = b'Hello World'
arr = [1, 2, 3]

# Even though [:0] selects no elements from the sequence,
# it still returns an empty str, empty bytes, or empty list.
print(string[:0])
print(bytes_string[:0])
print(arr[:0])
print(type(string[:0]))
print(type(bytes_string[:0]))
print(type(arr[:0]))

# The result of arr[:0] + '/' is not stored anywhere.
# However, if the two operands do not support addition
# with each other, Python raises TypeError.
try:
    arr[:0] + '/'
except TypeError:
    print('TypeError')

Output


b''
[]
<class 'str'>
<class 'bytes'>
<class 'list'>
TypeError

The result shows that path[:0] + sep raises TypeError when the two operands cannot be added, and that exception is caught by except (TypeError, AttributeError, BytesWarning) in ntpath.join(). This guarantees that path is a str or bytes object. The puzzling part is that os.fspath(path), which runs first, is already enough to guarantee that. One plausible reading is that the check predates os.fspath(), which was only added in Python 3.6, and was left in place afterwards.

def join(path, *paths):
    path = os.fspath(path)
    
    if isinstance(path, bytes):
        sep = b'\\'
        seps = b'\\/'
        colon = b':'
    else:
        sep = '\\'
        seps = '\\/'
        colon = ':'
        
    try:
        if not paths:
            path[:0] + sep  #23780: Ensure compatible data type even if p is null.
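
The single-argument case that this check guards can be observed from the outside. In the sketch below, which of the two lines (os.fspath() or path[:0] + sep) actually raises is an implementation detail that may differ between Python versions; only the outcome is observed.

```python
import ntpath

# With a single valid argument, join() returns it unchanged.
single_str = ntpath.join('path')
single_bytes = ntpath.join(b'path')
print(single_str, single_bytes)

# A single argument that is not a path is rejected with TypeError
# instead of being returned as-is.
try:
    ntpath.join(42)
    single_invalid_raised = False
except TypeError as exc:
    single_invalid_raised = True
    print(f'TypeError: {exc}')
```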
