
[Code Walkthrough] RRNet: A Hybrid Detector for Object Detection in Drone-captured Images

Author: Re-赟 (posted 2 years ago)


1. train.py

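First, download the code from GitHub (code link).

Find the program's main entry point, train.py. The script is quite simple and mostly imports other classes. The definition of each class is covered in its own section below.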

from configs.rrnet_config import Config
from operators.distributed_wrapper import DistributedWrapper
from operators.rrnet_operator import RRNetOperator


if __name__ == '__main__':
    dis_operator = DistributedWrapper(Config, RRNetOperator)  # see Section 2
    dis_operator.train()
    print('Training is Done!')

2. The DistributedWrapper class

2.1 The __init__ function

Let's start with the class's initializer.

def __init__(self, cfg, operator_class):
    """
    This is a wrapper class for distributed training.
    :param cfg: configuration.
    :param operator_class: We use this class to construct the operator for training and evaluating.
    """
    self.cfg = cfg
    self.operator_class = operator_class

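This is a wrapper class for distributed training: it sets up and launches training in a distributed environment.

The constructor parameters are:
	cfg: the configuration object, holding the settings and hyperparameters used during training.
	operator_class: a class (not an instance) used to construct the operator that performs training and evaluation.

2.2 The train function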

def train(self):
    """
    Start multiprocessing training.
    """
    self.setup_distributed_params()
    mp.spawn(self.dist_training_process, nprocs=self.cfg.Distributed.ngpus_per_node,
             args=(self.cfg.Distributed.ngpus_per_node, self.cfg))

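mp.spawn starts multiple training processes and calls self.dist_training_process in each of them. It passes the process index (0 to nprocs - 1) as the first argument, which is why dist_training_process receives gpu before the values in args.
nprocs is the number of processes to start, here one per GPU on the node.
args is the tuple of extra arguments passed to every process; here it holds self.cfg.Distributed.ngpus_per_node and self.cfg.

2.3 The dist_training_process function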

def dist_training_process(self, gpu, ngpus_per_node, cfg):
   operator = self.init_operator(gpu, ngpus_per_node, cfg)
   operator.training_process()

Now let's look at the init_operator function.

def init_operator(self, gpu, ngpus_per_node, cfg):
    """
    Create distributed model operator.
    :param gpu: gpu id.
    :param ngpus_per_node: to calculate the real rank.
    :param cfg: configuration.
    :return: model operator.
    """
    cfg.Distributed.gpu_id = gpu
    print("=> Use GPU: {}".format(gpu))

    # I. Init distributed process group.
    cfg.Distributed.rank = cfg.Distributed.rank * ngpus_per_node + gpu
    dist.init_process_group(backend='nccl', init_method=cfg.Distributed.dist_url,
                            world_size=cfg.Distributed.world_size, rank=cfg.Distributed.rank)
    torch.cuda.set_device(gpu)
    # II. Init operator.
    return self.operator_class(cfg)

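First, the GPU index gpu of the current process is assigned to cfg.Distributed.gpu_id, which specifies the GPU this process uses.
Then the real (global) rank of the process is computed from the node rank, ngpus_per_node and the GPU index, and assigned to cfg.Distributed.rank. The rank identifies a process in distributed training; every process has a unique rank.
Next, dist.init_process_group initializes the distributed process group with the NCCL backend.
After that, torch.cuda.set_device(gpu) makes gpu the current CUDA device of the process, so that the model and data are placed on the right GPU.
Finally, self.operator_class(cfg) creates and initializes the model operator, which is returned.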
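The rank formula above can be checked with a small plain-Python sketch. The helper name global_rank and the node/GPU counts are made up for illustration; only the arithmetic comes from init_operator.

```python
def global_rank(node_rank: int, ngpus_per_node: int, gpu: int) -> int:
    """Global rank of the process that drives local GPU `gpu` on node `node_rank`."""
    return node_rank * ngpus_per_node + gpu


# Hypothetical cluster: two nodes with four GPUs each give the ranks 0..7.
ranks = [global_rank(node, 4, gpu) for node in range(2) for gpu in range(4)]
print(ranks)
```

Each (node, GPU) pair maps to a different rank, which is what dist.init_process_group needs in order to tell the processes apart.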

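init_operator returns an RRNetOperator instance, and operator.training_process() is then called on it to run training, so we need to look at the definition of RRNetOperator (see Section 3).

3. The RRNetOperator class

3.1 The __init__ function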

def __init__(self, cfg):
   self.cfg = cfg

   model = RRNet(cfg).cuda(cfg.Distributed.gpu_id)
   model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

   self.optimizer = optim.Adam(model.parameters(), lr=cfg.Train.lr)

   self.lr_sch = optim.lr_scheduler.MultiStepLR(self.optimizer, milestones=cfg.Train.lr_milestones, gamma=0.1)
   self.training_loader, self.validation_loader = make_dataloader(cfg, collate_fn='rrnet')

   super(RRNetOperator, self).__init__(cfg=self.cfg, model=model, lr_sch=self.lr_sch)

   # TODO: change it to our class
   self.hm_focal_loss = FocalLossHM()
   self.l1_loss = RegL1Loss()

   self.main_proc_flag = cfg.Distributed.gpu_id == 0

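Initialize the RRNet model and move it to the GPU given by cfg.Distributed.gpu_id (see Section 4).
Convert the model's BatchNorm layers to synchronized BatchNorm (SyncBatchNorm) for use in distributed training.

Initialize the Adam optimizer, which updates the model parameters.
Initialize the learning-rate scheduler (MultiStepLR), which multiplies the learning rate by gamma=0.1 at each milestone in cfg.Train.lr_milestones.

Initialize the training and validation data loaders (see 3.1.1).
Call the constructor of the parent class BaseOperator, passing the configuration, the model and the learning-rate scheduler.

Initialize the focal loss used for the heatmap.
Initialize the regression loss (RegL1Loss) used for the regression targets.

Set self.main_proc_flag to True if the current process is the main process (the one with GPU id 0), and to False otherwise.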
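To make the scheduler's effect concrete, here is a minimal plain-Python sketch of the MultiStepLR rule: the learning rate is the base rate times gamma raised to the number of milestones already reached. The milestone values and base rate below are invented for the example; the real ones come from cfg.Train.lr_milestones and cfg.Train.lr, which are not shown in this post.

```python
from bisect import bisect_right


def multistep_lr(base_lr, milestones, step, gamma=0.1):
    """Learning rate after `step` scheduler steps under a MultiStepLR-style schedule."""
    reached = bisect_right(sorted(milestones), step)  # milestones <= step
    return base_lr * gamma ** reached


# Hypothetical schedule: decay at steps 3 and 6.
lrs = [multistep_lr(1.0, [3, 6], s) for s in range(8)]
print(lrs)
```

With these made-up milestones the rate stays at the base value for steps 0 to 2, drops by a factor of 10 at step 3, and by another factor of 10 at step 6.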

3.1.1 The make_dataloader function

datasets = {
    'drones_det': DronesDET
}

def make_dataloader(cfg, collate_fn=None):
    if cfg.dataset not in datasets:
        raise NotImplementedError

    train_dataset = datasets[cfg.dataset](root_dir=cfg.data_root, transforms=cfg.Train.transforms, split='train',
                                          with_road_map=cfg.Train.with_road)  # see Section 5
    val_dataset = datasets[cfg.dataset](root_dir=cfg.data_root, transforms=cfg.Val.transforms, split='val')

    if collate_fn == 'ctnet':
        collate_fn = train_dataset.collate_fn_ctnet
    elif collate_fn == 'rrnet':
        collate_fn = train_dataset.collate_fn_ctnet
    else:
        collate_fn = train_dataset.collate_fn

    train_loader = _Dataloader(DataLoader(train_dataset,
                                          batch_size=cfg.Train.batch_size, num_workers=cfg.Train.num_workers,
                                          sampler=cfg.Train.sampler(train_dataset) if cfg.Train.sampler else None,
                                          pin_memory=True, collate_fn=collate_fn,
                                          shuffle=True if cfg.Train.sampler is None else False))
    val_loader = DataLoader(val_dataset,
                                        batch_size=cfg.Val.batch_size, num_workers=cfg.Val.num_workers,
                                        sampler=cfg.Val.sampler(val_dataset) if cfg.Val.sampler else None,
                                        pin_memory=True, collate_fn=train_dataset.collate_fn,
                                        shuffle=True if cfg.Val.sampler is None else False)

    return train_loader, val_loader

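Check that the dataset named by cfg.dataset is registered in the datasets dictionary, and raise NotImplementedError if it is not.
Create the training and validation datasets, train_dataset and val_dataset, from the configuration.

Choose the collate function according to the value of collate_fn:
	if collate_fn is 'ctnet' or 'rrnet', use the dataset's collate_fn_ctnet method (both names map to the same method);
	otherwise use the dataset's default collate_fn method.

Create the training loader train_loader and the validation loader val_loader. The training DataLoader is wrapped in _Dataloader, whose get_batch() method is used in the training loop; the validation loader always uses the default collate_fn.
Finally, return train_loader and val_loader.

3.2 The training_process function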

def training_process(self):
    if self.main_proc_flag:
        logger = Logger(self.cfg)

    self.model.train()

    total_loss = 0
    total_hm_loss = 0
    total_wh_loss = 0
    total_off_loss = 0
    total_s2_reg_loss = 0

    for step in range(self.cfg.Train.iter_num):
        self.lr_sch.step()
        self.optimizer.zero_grad()
        
        try:
            imgs, annos, gt_hms, gt_whs, gt_inds, gt_offsets, gt_reg_masks, names = self.training_loader.get_batch()
            targets = gt_hms, gt_whs, gt_inds, gt_offsets, gt_reg_masks, annos
        except RuntimeError as e:
            if 'out of memory' in str(e):
                print("WARNING: ran out of memory with exception at step {}.".format(step))
            continue

        outs = self.model(imgs)
        targets = gt_hms, gt_whs, gt_inds, gt_offsets, gt_reg_masks, annos
        hm_loss, wh_loss, offset_loss, s2_reg_loss = self.criterion(outs, targets)

        if step < 2000:
            s2_factor = 0
        else:
            s2_factor = 1
        loss = hm_loss + (0.1 * wh_loss) + offset_loss + s2_reg_loss*s2_factor
        loss.backward()
        self.optimizer.step()

        total_loss += float(loss)
        total_hm_loss += float(hm_loss)
        total_wh_loss += float(wh_loss)
        total_off_loss += float(offset_loss)
        total_s2_reg_loss += float(s2_reg_loss)

        if self.main_proc_flag:
            if step % self.cfg.Train.print_interval == self.cfg.Train.print_interval - 1:
                # Loss
                for param_group in self.optimizer.param_groups:
                    lr = param_group['lr']
                log_data = {'scalar': {
                    'train/total_loss': total_loss / self.cfg.Train.print_interval,
                    'train/hm_loss': total_hm_loss / self.cfg.Train.print_interval,
                    'train/wh_loss': total_wh_loss / self.cfg.Train.print_interval,
                    'train/off_loss': total_off_loss / self.cfg.Train.print_interval,
                    'train/s2_reg_loss': total_s2_reg_loss / self.cfg.Train.print_interval,
                    'train/lr': lr
                }}

                # Generate bboxs
                s1_pred_bbox, s2_pred_bbox = self.generate_bbox(outs, batch_idx=0)

                # Visualization
                img = (denormalize(imgs[0].cpu()).permute(1, 2, 0).cpu().numpy() * 255).astype(np.uint8)
                # Do nms
                s2_pred_bbox = self._ext_nms(s2_pred_bbox)
                #
                s1_pred_on_img = visualize(img.copy(), s1_pred_bbox, xywh=True, with_score=True)
                s2_pred_on_img = visualize(img.copy(), s2_pred_bbox, xywh=True, with_score=True)
                gt_img = visualize(img.copy(), annos[0, :, :6], xywh=False)

                s1_pred_on_img = torch.from_numpy(s1_pred_on_img).permute(2, 0, 1).unsqueeze(0).float() / 255.
                s2_pred_on_img = torch.from_numpy(s2_pred_on_img).permute(2, 0, 1).unsqueeze(0).float() / 255.
                gt_on_img = torch.from_numpy(gt_img).permute(2, 0, 1).unsqueeze(0).float() / 255.
                log_data['imgs'] = {'Train': [s1_pred_on_img, s2_pred_on_img, gt_on_img]}
                logger.log(log_data, step)

                total_loss = 0
                total_hm_loss = 0
                total_wh_loss = 0
                total_off_loss = 0
                total_s2_reg_loss = 0

            if step % self.cfg.Train.checkpoint_interval == self.cfg.Train.checkpoint_interval - 1 or \
                    step == self.cfg.Train.iter_num - 1:
                self.save_ckp(self.model.module, step, logger.log_dir)

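If the current process is the main process, create a logger to record the training progress and metrics.
Put the model in training mode.
Initialize the running totals for the overall loss and its components: total_loss, total_hm_loss, total_wh_loss, total_off_loss and total_s2_reg_loss.
Loop over the training steps (iter_num is the total number of steps):
	self.lr_sch.step(): advance the learning-rate scheduler.
	self.optimizer.zero_grad(): zero the gradients of all model parameters before back-propagation.

Try to load one batch of training data:
	self.training_loader.get_batch(): fetch a batch containing the images, annotations, ground-truth heatmaps, widths/heights, indices, offsets, regression masks and image names.
	If a RuntimeError is raised while loading, it is caught and the rest of the current step is skipped; a warning is printed when the message contains "out of memory".

Run a forward pass to get the predictions outs for the input images imgs (see 4.2).
Pack gt_hms, gt_whs, gt_inds, gt_offsets, gt_reg_masks and annos into targets.
self.criterion(outs, targets): compute the losses, namely the heatmap loss (hm_loss), the width/height loss (wh_loss), the offset loss (offset_loss) and the stage-2 regression loss (s2_reg_loss) (see 3.2.1).

For the first 2000 steps s2_factor is 0; afterwards it is 1. It is the weight applied to the stage-2 regression loss.
Combine the components into the total loss used for back-propagation: loss = hm_loss + 0.1 * wh_loss + offset_loss + s2_factor * s2_reg_loss.
	loss.backward(): compute the gradients of the loss with respect to the model parameters.
	self.optimizer.step(): update the model parameters with those gradients.
Add the current step's total loss and components to the running totals.

If the current process is the main process and the step is the last one of a print interval (print_interval), do the following:
	Read the learning rate from the optimizer's parameter groups into lr (the value of the last group is kept).
	Build the dictionary log_data with the values to log: the averages of the total loss and of each component over the interval, and the learning rate.
	Generate the predicted boxes s1_pred_bbox and s2_pred_bbox for the first image of the batch.
	Denormalize the image and convert it from a tensor to a NumPy array for visualization.
	Apply non-maximum suppression (NMS) to s2_pred_bbox to remove overlapping boxes.
	Draw the predicted boxes on the image with visualize, storing the results in s1_pred_on_img and s2_pred_on_img.
	Draw the ground-truth annotations on the image, storing the result in gt_img.
	Convert the images back to PyTorch tensors scaled to [0, 1].
	Store the images in log_data['imgs'] and write everything to the log.
	Reset the running totals to zero so that the averages for the next interval start fresh.

	If the step is the last one of a checkpoint interval, or the last step of training, then (still on the main process only):
		call self.save_ckp to save a model checkpoint.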
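The loss weighting and the interval test used in the loop can be sketched in plain Python. The function names are made up for illustration; the 2000-step warm-up, the 0.1 weight on the WH loss and the modulo condition are taken from training_process.

```python
def combined_loss(step, hm_loss, wh_loss, offset_loss, s2_reg_loss, warmup_steps=2000):
    """Total loss as in training_process: stage 2 is switched off during warm-up."""
    s2_factor = 0 if step < warmup_steps else 1
    return hm_loss + 0.1 * wh_loss + offset_loss + s2_reg_loss * s2_factor


def is_last_step_of_interval(step, interval):
    """True on steps interval-1, 2*interval-1, ... (steps are counted from 0)."""
    return step % interval == interval - 1


early = combined_loss(0, hm_loss=1.0, wh_loss=2.0, offset_loss=0.5, s2_reg_loss=3.0)
late = combined_loss(2000, hm_loss=1.0, wh_loss=2.0, offset_loss=0.5, s2_reg_loss=3.0)
print(early, late)
```

With the same component values, the only difference between the two calls is whether the stage-2 regression loss is included.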

3.2.1 The criterion function

 def criterion(self, outs, targets):
     s1_hms, s1_whs, s1_offsets, s2_reg, bxyxy, scores, _ = outs
     gt_hms, gt_whs, gt_inds, gt_offsets, gt_reg_masks, gt_annos = targets
     bs = s1_hms[0].size(0)
     hm_loss = 0
     wh_loss = 0
     off_loss = 0

     # I. Stage 1
     for s in range(self.cfg.Model.num_stacks):
         s1_hm = s1_hms[s]
         s1_wh = s1_whs[s]
         s1_offset = s1_offsets[s]
         s1_hm = torch.clamp(torch.sigmoid(s1_hm), min=1e-4, max=1-1e-4)
         # Heatmap Loss
         hm_loss += self.hm_focal_loss(s1_hm, gt_hms) / self.cfg.Model.num_stacks
         # WH Loss
         wh_loss += self.l1_loss(s1_wh, gt_reg_masks, gt_inds, gt_whs) / self.cfg.Model.num_stacks
         # OffSet Loss
         off_loss += self.l1_loss(s1_offset, gt_reg_masks, gt_inds, gt_offsets) / self.cfg.Model.num_stacks

     # II. Stage2 Loss
     s2_reg_loss = 0
     # Calculate IOU between prediction and bbox
     # 1. Transform bbox.
     gt_annos[:, :, 2:4] += gt_annos[:, :, 0:2]
     for b_idx in range(bs):
         batch_flag = bxyxy[:, 0] == b_idx
         bbox = bxyxy[batch_flag][:, 1:]
         gt_anno = gt_annos[b_idx]
         iou = torchvision.ops.box_iou(bbox*self.cfg.Train.scale_factor, gt_anno[:, :4])
         max_iou, max_idx = torch.max(iou, dim=1)
         pos_idx = max_iou > 0.5
         # 2. Regression Loss
         if pos_idx.sum() == 0:
             pos_idx = torch.zeros_like(max_iou, device=max_iou.device).byte()
             pos_idx[0] = 1
             pos_factor = 0
         else:
             pos_factor = 1
         gt_reg = self.generate_bbox_target(bbox[pos_idx, :]*self.cfg.Train.scale_factor, gt_anno[max_idx[pos_idx], :4])
         s2_reg_loss += F.smooth_l1_loss(s2_reg[batch_flag][pos_idx], gt_reg) * pos_factor / bs
     return hm_loss, wh_loss, off_loss, s2_reg_loss

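Unpack outs into the predictions of the two stages.
Unpack targets into the ground-truth labels.
Get the batch size.
Initialize the heatmap, WH and offset losses to 0.

Loop over the outputs of each stack (num_stacks) of the network:
	Get the heatmap, WH and offset predictions of the current stack.
	Apply a sigmoid to the heatmap and clamp it to [1e-4, 1 - 1e-4], which avoids overflow and NaN when the log is taken.
	Compute the heatmap loss with the focal loss and add it to hm_loss, averaged over the stacks.
	Compute the WH loss with self.l1_loss (RegL1Loss) and add it to wh_loss, averaged over the stacks.
	Compute the offset loss with self.l1_loss (RegL1Loss) and add it to off_loss, averaged over the stacks.

Initialize the stage-2 regression loss to 0.
Convert the ground-truth boxes from (x_min, y_min, w, h) to (x_min, y_min, x_max, y_max). Note that this modifies gt_annos in place.
Loop over the samples in the batch:
	Build a mask from the first column of bxyxy (the batch index) that selects the boxes of the current sample.
	Get the predicted boxes of the current sample.
	Get the ground-truth boxes of the current sample.
	Compute the IoU between the predicted boxes (scaled by cfg.Train.scale_factor) and the ground-truth boxes.
	For each predicted box, find its highest IoU and the index of the corresponding ground-truth box.
	Mark the predicted boxes whose IoU is greater than 0.5 as positives (matched boxes).

	If there is no matched box, the first predicted box is selected as a placeholder so that the tensors are not empty,
	and pos_factor is set to 0 so that it contributes nothing to the loss; otherwise pos_factor is 1.
	Generate the regression targets from the matched predicted boxes and their ground-truth boxes.
	Compute the regression loss with the smooth L1 loss and add it to s2_reg_loss, divided by the batch size.

Return the stage-1 heatmap loss hm_loss, WH loss wh_loss and offset loss off_loss,
together with the stage-2 regression loss s2_reg_loss.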
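The stage-2 matching step can be illustrated without PyTorch. The sketch below re-implements IoU and the "best IoU above 0.5" rule in plain Python; it stands in for torchvision.ops.box_iou and torch.max in the original, and the boxes are made-up numbers.

```python
def xywh_to_xyxy(box):
    """(x_min, y_min, w, h) -> (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def box_iou(a, b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_positives(pred_boxes, gt_boxes, thresh=0.5):
    """Pairs (pred index, gt index) where the prediction's best IoU exceeds thresh."""
    matches = []
    for i, pred in enumerate(pred_boxes):
        ious = [box_iou(pred, gt) for gt in gt_boxes]
        best = max(range(len(ious)), key=ious.__getitem__)
        if ious[best] > thresh:
            matches.append((i, best))
    return matches


gts = [xywh_to_xyxy(b) for b in [(0, 0, 10, 10), (20, 20, 10, 10)]]
preds = [(1, 1, 11, 11), (19, 19, 31, 31), (50, 50, 60, 60)]
matches = match_positives(preds, gts)
print(matches)
```

The third prediction overlaps nothing, so it is not a positive; in criterion only the positives contribute to the smooth L1 regression loss.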

4. The RRNet class (the network model)

4.1 The __init__ function

def __init__(self, cfg):
    super(RRNet, self).__init__()
    self.num_stacks = cfg.Model.num_stacks
    self.num_classes = cfg.num_classes
    self.nms_type = cfg.Model.nms_type_for_stage1
    self.nms_per_class = cfg.Model.nms_per_class_for_stage1

    self.backbone = get_backbone(cfg.Model.backbone, num_stacks=self.num_stacks)  # see 4.1.1
    self.hm = CenterNetDetector(planes=self.num_classes, num_stacks=self.num_stacks, hm=True)  # see 4.1.2
    self.wh = CenterNetWHDetector(planes=1, num_stacks=self.num_stacks)
    self.offset_reg = CenterNetDetector(planes=2, num_stacks=self.num_stacks)
    self.head_detector = FasterRCNNDetector()  # see 4.1.3

4.1.1 The get_backbone function

According to the configuration file, the model's backbone is hourglass.

def hourglass_net(num_stacks=2):
    """
    Make Hourglass Net.
    :param num_stacks: number of stacked blocks.
    :return: model
    """
    model = HourglassNet(num_stacks=num_stacks)
    model.load_state_dict(torch.load('./hourglass.pth'), strict=False)
    return model

4.1.2 The CenterNetDetector class

class CenterNetDetector(nn.Module):
    def __init__(self, planes, hm=True, num_stacks=2):
        super(CenterNetDetector, self).__init__()
        self.hm = hm
        self.num_stacks = num_stacks
        self.detect_layer = nn.ModuleList([nn.Sequential(
            BasicCov(3, 256, 256, with_bn=False),
            # BasicCov(3, 40 * (2 ** _), 256, with_bn=False),
            nn.Conv2d(256, planes, (1, 1))
        ) for _ in range(self.num_stacks)
        ])
        if self.hm:
            for heat in self.detect_layer:
                heat[-1].bias.data.fill_(-2.19)

    def forward(self, input, index):
        output = self.detect_layer[index](input)
        return output

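The __init__ method sets up the following:
	self.hm: a boolean indicating whether this head predicts the heatmap. In the code shown, it only controls the bias initialization described below.
	self.num_stacks: the number of stacks, which determines how many prediction heads are created (one per stack).
	self.detect_layer: an nn.ModuleList in which each element is an nn.Sequential of a few layers.
	For each stack, the nn.Sequential contains:
		a BasicCov layer, a custom convolution block. Its definition is not shown here; the arguments (3, 256, 256) most likely mean a 3x3 kernel with 256 input and 256 output channels, which is consistent with the 256-channel convolution that follows.
		an nn.Conv2d layer with a 1x1 kernel that maps the 256-channel feature map to planes output channels (num_classes for the heatmap head, 2 for the offset head).
	If self.hm is True, the bias of the final nn.Conv2d in every stack is initialized to -2.19.

The forward method takes input and the stack index index, passes input through the detect_layer of that stack, and returns the result output, which is the prediction of the detection head for that stack.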
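The value -2.19 is easier to read once it is passed through a sigmoid. The sketch below shows that it corresponds to an initial heatmap probability of about 0.1. That this is the reason for the choice is an assumption on my part (it is the usual motivation for such a bias in focal-loss heads); the helper names are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def prior_bias(prior):
    """Bias b such that sigmoid(b) equals the desired prior probability."""
    return -math.log((1.0 - prior) / prior)


initial_prob = sigmoid(-2.19)   # probability predicted everywhere at initialization
bias_for_01 = prior_bias(0.1)   # bias that gives exactly 0.1
print(initial_prob, bias_for_01)
```

Starting from a low probability everywhere keeps the focal loss from being dominated by the many background locations in the first iterations.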

4.1.3 The FasterRCNNDetector class

class FasterRCNNDetector(nn.Module):
    def __init__(self):
        super(FasterRCNNDetector, self).__init__()

        self.top_layer = Bottleneck(inplanes=256, planes=64)
        self.regressor = nn.Conv2d(256, 4, kernel_size=1)

    def forward(self, feat):
        feat = self.top_layer(feat)
        feat = F.adaptive_avg_pool2d(feat, 1)
        reg = self.regressor(feat)
        reg = reg.view(reg.size(0), reg.size(1))
        return reg

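The __init__ method creates two members:
	self.top_layer: the top feature layer of the Faster R-CNN-style head, implemented as a Bottleneck block.
					Bottleneck is a custom block whose definition is not shown here: inplanes=256 is the number of input channels and planes=64 is the width of the bottleneck. Since the regressor that follows expects 256 channels, the block presumably outputs 256 channels (64 times an expansion of 4, as in ResNet).
	self.regressor: the regression layer, which predicts the box regression values.
					nn.Conv2d(256, 4, kernel_size=1) is a convolution with 256 input channels and 4 output channels, i.e. 4 values per box.

In the forward method, the input feat is the RoIAlign feature of each candidate box (see 4.2).
	First, feat is passed through self.top_layer.
	Adaptive average pooling then reduces it to 1x1, giving a fixed-size feature vector per box.
	The feature vector is passed to self.regressor, which produces the box regression values.
	The result is reshaped to (N, 4), where N is the number of candidate boxes.
	The regression output reg is returned.

4.2 The forward function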

def forward(self, x, k=1500):
  # I. Forward Backbone
  pre_feat = self.backbone(x)
  # II. Forward Stage 1 to generate heatmap, wh and offset.
  hms, whs, offsets = self.forward_stage1(pre_feat)  # see 4.2.1
  # III. Generate the true xywh for Stage 1.
  bboxs = self.transform_bbox(hms[-1], whs[-1], offsets[-1], k)  # (bs, k, 6), see 4.2.2

  # IV. Stage 2.
  bxyxys = []
  scores = []
  clses = []
  for b_idx in range(bboxs.size(0)):
      # Do nms
      bbox = bboxs[b_idx]
      bbox = self.nms(bbox)
      xyxy = bbox[:, :4]
      scores.append(bbox[:, 4])
      clses.append(bbox[:, 5])
      batch_idx = torch.ones((xyxy.size(0), 1), device=xyxy.device) * b_idx
      bxyxy = torch.cat((batch_idx, xyxy), dim=1)
      bxyxys.append(bxyxy)
  bxyxys = torch.cat(bxyxys, dim=0)
  scores = torch.cat(scores, dim=0)
  clses = torch.cat(clses, dim=0)
  #  Generate the ROIAlign features.
  roi_feat = torchvision.ops.roi_align(torch.relu(pre_feat[-1]), bxyxys, (3, 3))
  # Forward Stage 2 to predict and wh offset.
  stage2_reg = self.forward_stage2(roi_feat)  # see 4.2.3
  return hms, whs, offsets, stage2_reg, bxyxys, scores, clses

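First, self.backbone(x) runs the backbone on the input x and produces pre_feat, one feature map per stack.
Then self.forward_stage1(pre_feat) runs stage 1 and produces the predicted heatmaps, widths/heights and offsets, stored in hms, whs and offsets.

Next, self.transform_bbox(hms[-1], whs[-1], offsets[-1], k) decodes the output of the last stack
into actual box coordinates, stored in bboxs with shape (bs, k, 6).

Then, for each image in the batch, non-maximum suppression (NMS) removes redundant boxes.
The surviving boxes are stored in bxyxys, where each row holds the batch index followed by the box coordinates (xyxy); the scores and classes are collected separately in scores and clses.

torchvision.ops.roi_align takes pre_feat[-1] (after a ReLU) and bxyxys and produces a 3x3 RoIAlign feature roi_feat for every box.

Finally, roi_feat is passed to stage 2 through self.forward_stage2(roi_feat), which predicts 4 regression values per box.
The predictions are returned as a tuple: hms, whs, offsets, stage2_reg, bxyxys, scores and clses.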
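The row layout that roi_align expects, with the batch index in front of the four coordinates, can be sketched with plain lists. The helper name and the box values are made up; in the original the same thing is done with torch.ones and torch.cat.

```python
def build_roi_rows(boxes_per_image):
    """Flatten per-image box lists into rows of [batch_index, x1, y1, x2, y2]."""
    rows = []
    for b_idx, boxes in enumerate(boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            rows.append([float(b_idx), x1, y1, x2, y2])
    return rows


# Hypothetical batch: one box left after NMS in image 0, two in image 1.
rois = build_roi_rows([
    [(0.0, 0.0, 4.0, 4.0)],
    [(1.0, 1.0, 3.0, 5.0), (2.0, 2.0, 6.0, 6.0)],
])
print(rois)
```

Because the images keep different numbers of boxes after NMS, stacking them into one flat list with a batch-index column is simpler than padding every image to the same length.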

4.2.1 The forward_stage1 function

 def forward_stage1(self, feats):
     hms = []
     whs = []
     offsets = []
     for i in range(self.num_stacks):
         feat = feats[i]
         feat = torch.relu(feat)
         hm = self.hm(feat, i)
         wh = self.wh(feat, i)
         offset = self.offset_reg(feat, i)
         hms.append(hm)
         whs.append(wh)
         offsets.append(offset)
     return hms, whs, offsets

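Create three empty lists: hms, whs and offsets.
Loop over the feature map of each stack in feats and do the following:
	Apply a ReLU to the feature map with torch.relu(feat).
	Pass the activated feature map and the stack index i to self.hm to get the heatmap prediction hm.
	Pass the activated feature map and the stack index i to self.wh to get the width/height prediction wh.
	Pass the activated feature map and the stack index i to self.offset_reg to get the offset prediction offset.
	Append the heatmap, width/height and offset predictions of the stack to hms, whs and offsets.
Finally, return the three lists hms, whs and offsets, which hold the predictions of every stack.

4.2.2 The transform_bbox function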

 def transform_bbox(self, hm, wh, offset, k=250):
      batchsize, cls_num, h, w = hm.size()
      hm = torch.sigmoid(hm)

      scores, inds, clses, ys, xs = self._topk(hm, k)

      offset = self._transpose_and_gather_feat(offset, inds)
      offset = offset.view(batchsize, k, 2)
      xs = xs.view(batchsize, k, 1) + offset[:, :, 0:1]
      ys = ys.view(batchsize, k, 1) + offset[:, :, 1:2]
      wh = self._transpose_and_gather_feat(wh, inds).clamp(min=0)

      wh = wh.view(batchsize, k, 2)
      clses = clses.view(batchsize, k, 1).float()
      scores = scores.view(batchsize, k, 1)

      pred_x = (xs - wh[..., 0:1] / 2)
      pred_y = (ys - wh[..., 1:2] / 2)
      pred_w = wh[..., 0:1]
      pred_h = wh[..., 1:2]
      pred = torch.cat([pred_x, pred_y, pred_w + pred_x, pred_h + pred_y, scores, clses], dim=2)
      return pred

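Apply a sigmoid to the heatmap hm to turn it into probabilities: each value is the probability that the location is the center of an object of that class.
Call _topk to pick the k highest-scoring locations in the heatmap and get their scores, indices, classes and coordinates. This step selects the candidate predictions.

Gather the offset at each selected location and add it to the location's coordinates, giving the refined center of each object.

Gather the width/height wh at each selected location and clamp it to be non-negative.
Combine the centers and the widths/heights into the final boxes in (x_min, y_min, x_max, y_max) form, and concatenate them with the scores and classes.
Return pred, which has shape (batch size, k, 6).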
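For a single peak, the decoding above reduces to a few lines of arithmetic. The sketch assumes that _topk returns a row-major flat index into the heatmap (its code is not shown in this post), and the function name and numbers are made up.

```python
def decode_peak(ind, width, offset, wh):
    """Turn one heatmap peak into an (x1, y1, x2, y2) box in feature-map units.

    ind:    flat index of the peak, assumed row-major (y * width + x)
    offset: predicted sub-pixel offset (dx, dy)
    wh:     predicted (w, h), clamped to be non-negative
    """
    x = ind % width + offset[0]
    y = ind // width + offset[1]
    w, h = max(wh[0], 0.0), max(wh[1], 0.0)
    x1, y1 = x - w / 2, y - h / 2
    return (x1, y1, x1 + w, y1 + h)


# Peak at column 3, row 2 of an 8-pixel-wide heatmap.
box = decode_peak(ind=2 * 8 + 3, width=8, offset=(0.5, 0.25), wh=(4.0, 2.0))
print(box)
```

The refined center is (3.5, 2.25), and the box extends half the width and half the height to each side of it.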

4.2.3 The forward_stage2 function

def forward_stage2(self, feats,):
    stage2_reg = self.head_detector(feats)
    return stage2_reg

5. The DronesDET class (the dataset)

5.1 The __init__ function

def __init__(self, root_dir, transforms=None, split='train', with_road_map=False):
     '''
     :param root_dir: root of annotations and image dirs
     :param transform: Optional transform to be applied
             on a sample.
     '''
     # get the csv
     self.images_dir = os.path.join(root_dir, split, 'images')
     self.annotations_dir = os.path.join(root_dir, split, 'annotations')
     self.roadmap_dir = os.path.join(root_dir, split, 'roadmap')
     mdf = os.listdir(self.images_dir)
     restr = r'\w+?(?=(.jpg))'
     for index, mm in enumerate(mdf):
         mdf[index] = re.match(restr, mm).group()
     self.mdf = mdf
     self.transforms = transforms
     self.with_road_map = with_road_map

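Build the path to the 'images' directory from root_dir and split.
Build the path to the 'annotations' directory from root_dir and split.
Build the path to the 'roadmap' directory from root_dir and split.
List all files in the 'images' directory and assign them to mdf.
Define a regular expression that matches the word characters (letters, digits and underscores) at the start of a file name, up to the '.jpg' extension.

Loop over every element of mdf:
	apply the pattern restr to the current file name mm with re.match, extract the part before the '.jpg' extension, and write it back to the same index of mdf.
After the loop, assign the modified list, which now holds the file names without '.jpg', to self.mdf.

Assign the transforms argument to self.transforms.
Assign the with_road_map argument to self.with_road_map. The default in the signature is False; during training the value comes from cfg.Train.with_road, which is True here.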
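The behaviour of the file-name pattern is worth trying on its own. The sketch below uses the pattern from the constructor on two made-up file names and compares it with os.path.splitext, which is offered only as a possible alternative and is not what the repository uses.

```python
import os
import re

RESTR = r'\w+?(?=(.jpg))'  # pattern copied from the constructor


def stem_by_regex(filename):
    """File name without extension, extracted the way the constructor does it."""
    return re.match(RESTR, filename).group()


def stem_by_splitext(filename):
    """Alternative that does not depend on the characters in the name."""
    return os.path.splitext(filename)[0]


typical = stem_by_regex('0000001_02999_d_0000005.jpg')
tricky = stem_by_regex('imgxjpg.jpg')      # the unescaped dot also matches the 'x'
robust = stem_by_splitext('imgxjpg.jpg')
print(typical, tricky, robust)
```

For names made only of word characters followed by '.jpg' the pattern works as intended. Because the dot in the lookahead is not escaped, a name that contains 'jpg' earlier on is cut short, and a name with a character outside \w (a hyphen, for example) makes re.match return None.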

5.1.1 The self.transforms pipeline

Let's look at how self.transforms is defined.

Config.Train.transforms = Compose([
    MultiScale(scale=(1, 1.15, 1.25, 1.35, 1.5)),
    ToTensor(),
    MaskIgnore(Config.Train.mean),
    FillDuck(),
    HorizontalFlip(),
    RandomCrop(Config.Train.crop_size),
    Normalize(Config.Train.mean, Config.Train.std),
    ToHeatmap(scale_factor=Config.Train.scale_factor)
])

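MultiScale is a multi-scale transform. It rescales the image according to the given scale factors to increase the diversity of the training data.
ToTensor converts the image and annotation data to tensors.
MaskIgnore is a transform that masks ignored regions. It uses the given mean (Config.Train.mean) to mark the ignored regions.

FillDuck is a transform that fills in "ducks" (the data augmentation described in the paper; see 5.1.1.1).

HorizontalFlip flips the image horizontally with some probability, which also increases the diversity of the data.
RandomCrop(Config.Train.crop_size) randomly crops the image to the given size.
Normalize(Config.Train.mean, Config.Train.std) normalizes the image: it standardizes the pixel values with the mean Config.Train.mean and the standard deviation Config.Train.std.
ToHeatmap(scale_factor=Config.Train.scale_factor) converts the data into heatmap targets. Heatmaps are commonly used in some detection and pose-estimation tasks to mark the position of objects or keypoints.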
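The repository's Compose class is not shown in this post. Assuming it behaves like the familiar torchvision-style container, it simply applies the transforms one after another, each receiving the output of the previous one. A minimal sketch with stand-in transforms:

```python
class Compose:
    """Apply a list of callables in order (assumed behaviour of the repo's Compose)."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, data):
        for transform in self.transforms:
            data = transform(data)
        return data


# Stand-in transforms that only record the order in which they run.
pipeline = Compose([
    lambda d: d + ['scale'],
    lambda d: d + ['flip'],
    lambda d: d + ['crop'],
])
result = pipeline([])
print(result)
```

The order matters in the real pipeline too: for example, FillDuck runs after ToTensor because fill_duck indexes the image as a tensor.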

5.1.1.1 The FillDuck class

class FillDuck(object):
    def __init__(self, cls_list=(1, 2, 3, 7, 8, 10), factor=0.00005):
        self.cls_list = torch.tensor(cls_list).unsqueeze(0)
        self.factor = factor

    def __call__(self, data):
        return F.fill_duck(data, self.cls_list, self.factor)

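cls_list is the list of object classes to be pasted; by default it contains classes 1, 2, 3, 7, 8 and 10 (the classes mentioned in the paper).
factor is the fill factor, which controls how much is pasted; the default is 0.00005.

Next, let's look at the definition of fill_duck.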

def fill_duck(data, cls_list, factor):
    try:
        img, annos, roadmap = data

        # I. Get valid area.
        valid_idx = roadmap.view(-1)
        idx = torch.nonzero(valid_idx).view(-1)
        if idx.size(0) == 0:
            return img, annos
        xs = idx % roadmap.size(1)
        ys = idx // roadmap.size(1)
        coor = torch.stack((xs, ys), dim=1)

        annos_cls = annos[:, 5]
		
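Unpack data into the image, the annotations and the roadmap: img, annos and roadmap.
Flatten roadmap into a one-dimensional tensor valid_idx, whose elements are the values of the pixels of the roadmap.
Use torch.nonzero to find the indices of the non-zero elements of valid_idx, i.e. the indices of the valid area, and flatten them with view(-1).
If the valid area contains no pixels,
	return the original image and annotations without any further processing.
Compute the x coordinate of every valid pixel (flat index modulo the roadmap width).
Compute the y coordinate of every valid pixel (flat index integer-divided by the roadmap width).
Stack the x and y coordinates into a tensor coor in which each row is the (x, y) of one valid pixel.
Extract the class of every object from column 5 of annos.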
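The conversion from flat indices back to (x, y) coordinates can be checked on a tiny example. This is a plain-Python sketch with nested lists in place of tensors; the function name and the 2x3 roadmap are made up.

```python
def valid_coords(roadmap):
    """Return (x, y) for every non-zero cell of a 2D list, in row-major order."""
    width = len(roadmap[0])
    flat = [value for row in roadmap for value in row]       # like roadmap.view(-1)
    idx = [i for i, value in enumerate(flat) if value != 0]  # like torch.nonzero
    return [(i % width, i // width) for i in idx]


roadmap = [
    [0, 1, 0],
    [0, 0, 1],
]
coords = valid_coords(roadmap)
print(coords)
```

The two non-zero cells sit at flat indices 1 and 5, which map back to column 1 of row 0 and column 2 of row 1.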
		
        # II Calculate scale factor for depth.
        people_flag = annos_cls == 1
        people_bbox = annos[people_flag, :4]
        if people_bbox.size(0) != 0:
            people_diag = people_bbox[:, 2:4].pow(2).sum(dim=1).sqrt()
            topk = min(3, people_diag.size(0))
            max_diag, max_idx = torch.topk(people_diag, k=topk)
            min_diag, min_idx = torch.topk(people_diag, k=1, largest=False)
            y_diff = people_bbox[max_idx, 1] - people_bbox[min_idx, 1]
            scale_factor = ((max_diag - min_diag) / (y_diff.abs() + 1e-5)).mean()
        else:
            scale_factor = 1
            
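Create a boolean mask that selects the objects of class 1.
Use people_flag to select the first 4 columns of those objects, i.e. their boxes. As the following lines show, the boxes are in (x, y, w, h) form: columns 2:4 are used directly as the width and height.
If there is at least one object of class 1:
	Compute the diagonal length of each class-1 box
	from its width and height with the Pythagorean theorem, sqrt(w^2 + h^2).
	Set topk, the number of largest boxes to use, to at most 3.
	Find the topk class-1 objects with the largest diagonals and return their diagonal lengths and indices.
	Find the class-1 object with the smallest diagonal and return its diagonal length and index.
	Compute the difference between the top y coordinates of the largest boxes and of the smallest box.
	Compute the scale factor of the class-1 objects:
		the difference between each largest diagonal and the smallest diagonal is divided by the absolute y difference (plus 1e-5 to avoid division by zero), and the mean is taken as the final scale factor.
If there is no object of class 1 (people_bbox.size(0) == 0), the scale factor is set to 1.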
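The scale-factor computation can be followed more easily on plain numbers. The sketch below mirrors the tensor code with lists; the function name and the two boxes are made up, and boxes are (x, y, w, h) as in the annotations.

```python
import math


def depth_scale_factor(people_boxes, topk=3, eps=1e-5):
    """Mean change of box diagonal per pixel of vertical distance, as in fill_duck."""
    if not people_boxes:
        return 1.0
    diags = [math.hypot(w, h) for _, _, w, h in people_boxes]
    order = sorted(range(len(diags)), key=diags.__getitem__, reverse=True)
    largest = order[:min(topk, len(order))]   # indices of the biggest boxes
    smallest = order[-1]                      # index of the smallest box
    ratios = [
        (diags[i] - diags[smallest])
        / (abs(people_boxes[i][1] - people_boxes[smallest][1]) + eps)
        for i in largest
    ]
    return sum(ratios) / len(ratios)


# A large person low in the image (y=100) and a small one higher up (y=50).
scale = depth_scale_factor([(0, 100, 30, 40), (0, 50, 6, 8)])
print(scale)
```

Note that with fewer than four boxes the smallest box is also among the "largest" ones and contributes a ratio of 0, which pulls the mean down; this follows from the original code and is reproduced here as is.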

        # III. For relation class.

        people_flag = annos_cls == 2
        people_select_annos = annos[people_flag, :]

        relation_flag = torch.zeros_like(annos_cls).byte()

        if people_select_annos.size(0) != 0:
            iou = bbox_iou(people_select_annos[:, :4], annos[:, :4], x1y1x2y2=False)
            if iou.size(1) > 2:
                max_v, max_i = torch.topk(iou, dim=1, k=2)
                flag = max_v[:, 1] > 0
                max_i = max_i[flag, :]
                people_idx = max_i[:, 0]
                vechile_idx = max_i[:, 1]

                relation_flag[people_idx] = 1
                relation_flag[vechile_idx] = 1

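Create a boolean mask that selects the objects of class 2.
Use people_flag to select all columns of the class-2 objects.
Create relation_flag, a zero tensor with the same shape as annos_cls, converted to uint8 so that it can be used as a mask.
If there is at least one object of class 2:
	Compute the IoU between the class-2 objects and all objects.
	If the IoU matrix has more than 2 columns:
		Find the largest and second-largest values of each row of the IoU matrix, with their indices.
		Create a mask selecting the rows whose second-largest IoU is greater than 0, i.e. the class-2 objects that overlap some other object.
		Keep only the index pairs of those rows.
		Take the first index of each pair as people_idx (the largest IoU, normally the object's overlap with itself) and the second as vechile_idx (the object it overlaps most).
		Set relation_flag to 1 at the indices of the people and of the objects related to them.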

        # IV. Calculate aug N.
        cls = cls_list.repeat(annos.size(0), 1)
        normal_flag = (cls == annos_cls.unsqueeze(1).repeat(1, cls.size(1)).long()).sum(dim=1) > 0
        normal_flag = normal_flag * (1 - relation_flag)

        total_n = max(int(factor * valid_idx.sum()), 5)
        relation_n = relation_flag.float().sum() / 2
        normal_n = normal_flag.float().sum()
        if relation_n + normal_n == 0:
            return img, annos
        r_n = int(relation_n / (relation_n + normal_n) * total_n)
        n_n = total_n - r_n

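Repeat the class list cls_list annos.size(0) times to get a tensor cls of shape (annos.size(0), len(cls_list)).
Compare every object's class with cls to build normal_flag, which marks the objects whose class is in cls_list (the normal objects).
Clear normal_flag for the objects already marked in relation_flag, so that an object is not counted twice.
Compute total_n, the total number of objects to paste: factor times the sum of the roadmap, with a minimum of 5.
Compute the number of relation pairs (relation_flag marks two objects per pair, hence the division by 2).
Compute the number of normal objects.
If the number of relation pairs plus the number of normal objects is 0, there is nothing to sample, so return the original image and annotations.
Compute r_n, the number of relation pairs to paste, in proportion to their share.
Compute n_n, the number of normal objects to paste, as the remainder.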
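The way the paste budget is split can be written as a small function. The name and the example numbers are made up, and the sketch assumes a binary roadmap, so that its sum is the number of valid pixels.

```python
def split_sample_counts(valid_pixels, relation_n, normal_n, factor=0.00005, min_total=5):
    """Return (r_n, n_n) as in fill_duck, or None when there is nothing to sample."""
    total_n = max(int(factor * valid_pixels), min_total)
    if relation_n + normal_n == 0:
        return None
    r_n = int(relation_n / (relation_n + normal_n) * total_n)
    n_n = total_n - r_n
    return r_n, n_n


# Hypothetical image: 410,000 valid pixels, 2 relation pairs, 6 normal objects.
large = split_sample_counts(valid_pixels=410_000, relation_n=2, normal_n=6)
small = split_sample_counts(valid_pixels=1_000, relation_n=0, normal_n=3)
print(large, small)
```

For the large road area the budget is 20 pastes, a quarter of which go to relation pairs; for the small one the minimum of 5 applies.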

        # V. Fill image
        paste_idx = torch.randint(low=0, high=coor.size(0), size=(total_n,))
        paste_coors = coor[paste_idx]

        new_annos = []
        # 1. Sample normal object.
        if n_n != 0:
            normal_annos = annos[normal_flag, :]
            sample_idx = torch.randint(low=0, high=normal_annos.size(0), size=(n_n,))
            sample_annos = normal_annos[sample_idx]
            for i, anno in enumerate(sample_annos):
                paste_coor = paste_coors[i].float()

                # Apply depth scale.
                anno_ct_y = anno[1] + anno[3] / 2
                diff = (anno_ct_y - paste_coor[1]).abs() * scale_factor
                anno_diag = (anno[2].pow(2) + anno[3].pow(2)).sqrt()
                if anno_ct_y > paste_coor[1]:
                    # Do reduce.
                    factor = 1 - diff / anno_diag
                else:
                    factor = 1 + diff / anno_diag
                cropped_obj = img[:, int(anno[1]):int(anno[1]+anno[3]), int(anno[0]):int(anno[0]+anno[2])]
                factor = factor.clamp(min=0.5, max=2)
                cropped_obj = F.interpolate(
                    cropped_obj.unsqueeze(0),
                    scale_factor=float(factor),
                    mode='bilinear',
                    align_corners=True
                )[0]
                obj_h, obj_w = cropped_obj.size()[-2:]
                paste_coor[0] -= obj_w / 2
                paste_coor[1] -= obj_h / 2
                paste_coor[0] = paste_coor[0].clamp(min=1, max=img.size(2)-obj_w - 1)
                paste_coor[1] = paste_coor[1].clamp(min=1, max=img.size(1)-obj_h - 1)
                img[:, int(paste_coor[1]):int(paste_coor[1]+obj_h),
                int(paste_coor[0]):int(paste_coor[0]+obj_w)] = cropped_obj
                new_annos.append(torch.tensor([[int(paste_coor[0]), int(paste_coor[1]), int(obj_w), int(obj_h), anno[4], anno[5], anno[6], anno[7]]]))

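Generate random indices paste_idx to sample total_n coordinates from coor.
Use paste_idx to pick the coordinates from coor, giving paste_coors, the random paste locations.
Create an empty list new_annos for the annotations of the new objects.
If normal objects need to be sampled (n_n != 0):
	Use normal_flag to select the annotations of the normal objects.
	Generate random indices sample_idx to sample n_n objects from them (with replacement).
	Use sample_idx to pick the corresponding annotations.
	Loop over the sampled annotations:
		Get the paste location of the current object and convert it to float.
		Compute the y coordinate of the object's center.
		Compute the absolute difference between the center y and the paste y and multiply it by scale_factor; this determines how much the object is resized.
		Compute the diagonal length of the object's box.
		If the center y is greater than the paste y, the paste location is higher up in the image than the object (y grows downward), so the resize factor is 1 minus the ratio of the difference to the diagonal.
		Otherwise the paste location is at the same height or lower in the image, so the resize factor is 1 plus that ratio.
		Crop the object's patch from the original image img.
		Clamp the resize factor to [0.5, 2] to avoid extreme scaling.
		Resize the patch with bilinear interpolation.
		Get the height and width of the resized patch.
		Subtract half the patch width and height from the x and y of paste_coor, so that the patch is centered on the paste location.
		Clamp the x and y of the paste location so that the patch stays inside the image.
		Paste the resized patch into img at the paste location.
		Append the annotation of the pasted object to new_annos.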
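The resize rule applied to each pasted patch can be isolated as a pure function. The name and the numbers are made up; the logic (shrink when pasting higher in the image, enlarge when pasting lower, clamp to [0.5, 2]) follows the loop above.

```python
def depth_resize_factor(obj_center_y, paste_y, obj_diag, scale_factor, lo=0.5, hi=2.0):
    """Resize factor for a patch moved from obj_center_y to paste_y (image rows)."""
    diff = abs(obj_center_y - paste_y) * scale_factor
    if obj_center_y > paste_y:
        factor = 1 - diff / obj_diag   # pasted higher in the image: shrink
    else:
        factor = 1 + diff / obj_diag   # pasted at the same row or lower: enlarge
    return min(max(factor, lo), hi)


shrunk = depth_resize_factor(obj_center_y=100, paste_y=50, obj_diag=50, scale_factor=0.5)
grown = depth_resize_factor(obj_center_y=50, paste_y=100, obj_diag=50, scale_factor=0.5)
capped = depth_resize_factor(obj_center_y=50, paste_y=500, obj_diag=50, scale_factor=1.0)
print(shrunk, grown, capped)
```

The clamp keeps a pasted object between half and twice its original size, no matter how far it is moved vertically.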

        # 2. Sample Relation Object.
        if r_n != 0:
            people_annos = annos[people_idx, :]
            vechile_annos = annos[vechile_idx, :]

            sample_idx = torch.randint(low=0, high=people_annos.size(0), size=(r_n,))
            sample_people_annos = people_annos[sample_idx]
            sample_vechile_annos = vechile_annos[sample_idx]
            sample_people_annos[:, 2:4] += sample_people_annos[:, 0:2]
            sample_vechile_annos[:, 2:4] += sample_vechile_annos[:, 0:2]

            for i in range(r_n):
                paste_coor = paste_coors[i + n_n].float()

                people_anno = sample_people_annos[i]
                vechile_anno = sample_vechile_annos[i]

                min_x = int(min(people_anno[0], vechile_anno[0]))
                min_y = int(min(people_anno[1], vechile_anno[1]))
                max_x = int(max(people_anno[2], vechile_anno[2]))
                max_y = int(max(people_anno[3], vechile_anno[3]))

                # Apply depth scale.
                anno_ct_y = (min_y + max_y) / 2
                diff = (anno_ct_y - paste_coor[1]).abs() * scale_factor
                anno_diag = math.sqrt((max_x-min_x)**2 + (max_y-min_y)**2)
                if anno_ct_y > paste_coor[1]:
                    # Do reduce.
                    factor = 1 - diff / anno_diag
                else:
                    factor = 1 + diff / anno_diag
                cropped_obj = img[:, min_y:max_y, min_x:max_x]
                factor = factor.clamp(min=0.5, max=2)
                cropped_obj = F.interpolate(
                    cropped_obj.unsqueeze(0),
                    scale_factor=float(factor),
                    mode='bilinear',
                    align_corners=True
                )[0]

                obj_h, obj_w = cropped_obj.size()[-2:]
                paste_coor[0] -= obj_w / 2
                paste_coor[1] -= obj_h / 2
                paste_coor[0] = paste_coor[0].clamp(min=1, max=img.size(2)-obj_w - 1)
                paste_coor[1] = paste_coor[1].clamp(min=1, max=img.size(1)-obj_h - 1)
                img[:, int(paste_coor[1]):int(paste_coor[1]+obj_h),
                int(paste_coor[0]):int(paste_coor[0]+obj_w)] = cropped_obj
                x_bias = min_x - paste_coor[0]
                y_bias = min_y - paste_coor[1]
                new_people = people_anno
                new_people[2:4] -= new_people[0:2]
                new_people[2:4] *= factor
                new_people[0] -= x_bias
                new_people[1] -= y_bias

                new_vechile = vechile_anno
                new_vechile[2:4] -= new_vechile[0:2]
                new_vechile[2:4] *= factor
                new_vechile[0] -= x_bias
                new_vechile[1] -= y_bias

                new_annos.append(new_people.unsqueeze(0).floor())
                new_annos.append(new_vechile.unsqueeze(0).floor())
        new_annos = torch.cat(new_annos)
        annos = torch.cat((annos, new_annos))

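Check whether any relation objects need to be sampled (r_n != 0):
	Use the indices people_idx and vechile_idx to select the person annotations and the matching vehicle annotations from the original annotations.
	Generate a random index sample_idx, used to sample r_n person-vehicle pairs.
	Index both the person and the vehicle annotations with the same sample_idx so that each sampled person keeps its paired vehicle.
	Convert the bounding boxes from (x, y, w, h) to (x_min, y_min, x_max, y_max).
	Loop over the sampled pairs:
		Take the paste coordinate for the current pair (offset by n_n, since the first n_n coordinates were used for normal objects) and convert it to float.
		Fetch the person annotation and the vehicle annotation of the current pair.
		Compute the top-left x and y of the box enclosing both objects.
		Compute the bottom-right x and y of that enclosing box.
		Compute the center y of the enclosing box, take its absolute difference from the paste y, and multiply by scale_factor.
		Compute the diagonal length of the enclosing box, used below to derive the resize factor.
		Choose the resize factor from the relation between the center y and the paste y:
			If the center y is greater than the paste y, the paste location is higher up in the image (image y grows downward), so the factor is 1 minus the ratio of the difference to the diagonal.
			Otherwise the paste location is at or below the pair in the image, so the factor is 1 plus that ratio.
		Crop the patch containing both objects out of the original image img.
		Clamp the factor to the range 0.5 to 2 to avoid an overly large or small rescale.
		Resize the patch by that factor with bilinear interpolation.
		Read the height and width of the resized patch.
		Subtract half the patch width and height from the x and y of paste_coor. paste_coor then holds the top-left corner of the patch, so the patch is centered on the sampled point.
		Clamp x and y so the patch stays inside the image and the slice does not go out of bounds.
		Write the resized patch into img at that location, which completes the image side of the augmentation.

		Compute x_bias and y_bias, the offsets from the paste location to the top-left corner of the enclosing box.
		Bind new_people and new_vechile to the sampled annotations. No copy is made, so they are updated in place.
		Convert the bottom-right corner back to width and height.
		Multiply the width and height by the resize factor.
		Subtract x_bias and y_bias from the top-left corner to move the box to the paste location.
		Append the updated person and vehicle annotations, floored, to new_annos.
Use torch.cat() to concatenate all new annotations into one tensor of shape (N, 8), where N is the number of added objects.
Concatenate the original annotations with the new ones to form the final annotations.
Return the augmented image img and the augmented annotations annos. If anything in the process raises an exception, the original data[0] and data[1] are returned unchanged.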
		
        return img, annos
    except:
        return data[0], data[1]
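
The bookkeeping for a person-vehicle pair can be sketched in plain Python as follows. This is an illustration of the arithmetic only, under the assumption that boxes are given as (x_min, y_min, x_max, y_max). The helper names union_box and move_box are hypothetical and do not appear in the repository.

```python
import math


def union_box(a_xyxy, b_xyxy):
    """Smallest box (x_min, y_min, x_max, y_max) enclosing both boxes."""
    return (
        int(min(a_xyxy[0], b_xyxy[0])),
        int(min(a_xyxy[1], b_xyxy[1])),
        int(max(a_xyxy[2], b_xyxy[2])),
        int(max(a_xyxy[3], b_xyxy[3])),
    )


def move_box(box_xyxy, union_min_xy, paste_top_left, factor):
    """Return [x, y, w, h] of one pair member after the patch is pasted.

    The box is shifted by the offset between the union box corner and the
    paste corner, and its width and height are scaled by factor.
    """
    x_bias = union_min_xy[0] - paste_top_left[0]
    y_bias = union_min_xy[1] - paste_top_left[1]
    w = (box_xyxy[2] - box_xyxy[0]) * factor
    h = (box_xyxy[3] - box_xyxy[1]) * factor
    return [
        math.floor(box_xyxy[0] - x_bias),
        math.floor(box_xyxy[1] - y_bias),
        math.floor(w),
        math.floor(h),
    ]


person = (110, 200, 130, 240)
vehicle = (100, 220, 160, 250)
u = union_box(person, vehicle)
print(u)
print(move_box(person, u[:2], paste_top_left=(300, 50), factor=0.5))
print(move_box(vehicle, u[:2], paste_top_left=(300, 50), factor=0.5))
```

As in the listing, each object's offset from the corner of the enclosing box is carried over unscaled. Only the width and height are multiplied by the factor.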

5.2 getitem function

def __getitem__(self, item):
    name = self.mdf[item]
    img_name = os.path.join(self.images_dir, '{}.jpg'.format(name))
    txt_name = os.path.join(self.annotations_dir, '{}.txt'.format(name))
    # read image
    image = Image.open(img_name).convert("RGB")

    # read annotation
    annotation = pd.read_csv(txt_name, header=None)
    annotation = np.array(annotation)[:, :8]
    annotation = annotation[annotation[:, 5] != 11]

    # read road segmentation
    roadmap = None
    if self.with_road_map:
        roadmap_name = os.path.join(self.roadmap_dir, '{}.jpg'.format(name))
        roadmap = cv2.imread(roadmap_name)

    sample = (image, annotation, roadmap)

    if self.transforms:
        sample = self.transforms(sample)
    return sample + (name,)

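Use the index item to look up the file name in the list self.mdf (the name does not include the '.jpg' extension).
Build the full path of the image file by adding the .jpg suffix.
Build the full path of the annotation file by adding the .txt suffix.

Open the image file with PIL's Image.open() and convert it to RGB.
Read the annotation file as CSV with pandas (header=None, since the file has no header row).
Convert the annotations to a NumPy array and keep only the first 8 columns.
Drop the rows whose value at column index 5 (the sixth column) equals 11. This is probably meant to exclude one specific category.
Create the variable roadmap and initialize it to None.
Check whether self.with_road_map is True:
	If it is, build the full path of the roadmap image file.
	Read the roadmap image with OpenCV's cv2.imread().

Pack the image, the annotations and the roadmap (None if there is no roadmap) into a tuple and assign it to sample.
Check whether self.transforms is set (not None):
	If it is, the dataset has transforms defined (data augmentation and so on), and they are applied to sample.

Return the sample tuple with the file name (without '.jpg') appended as its last element, so the sample data and its file name are returned together.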
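
The annotation handling can be tried without pandas or NumPy. The sketch below is a standard-library equivalent of the three annotation lines, written only for illustration: the function name load_annotation_rows, the parameter drop_value and the sample rows are all made up, and integer-valued fields are an assumption.

```python
import csv
import io


def load_annotation_rows(text, drop_value=11):
    """Parse headerless CSV text, keep 8 columns, drop rows by column 5."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [int(v) for v in record[:8]]  # same idea as [:, :8]
        if values[5] != drop_value:  # same idea as annotation[:, 5] != 11
            rows.append(values)
    return rows


sample_text = (
    "10,20,30,40,1,4,0,0\n"
    "50,60,70,80,1,11,0,0\n"
    "5,6,7,8,0,1,0,1,\n"  # a trailing comma adds a ninth, empty field
)
annotation = load_annotation_rows(sample_text)
print(annotation)

# The final return statement appends the file name to the sample tuple.
sample = ("image", annotation, None)
print(len(sample + ("0001",)))
```

Slicing to the first 8 fields before converting also makes the parser tolerant of a trailing comma at the end of a line, as in the third sample row.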


