Introduction
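cfDNA (cell-free DNA, also called circulating free DNA) refers to DNA fragments found in the bloodstream. Because these fragments are not contained in any cell, they are described as "cell-free" or "free". cfDNA comes from many sources: it is released when normal cells and diseased cells (such as tumor cells) die and break down. cfDNA fragments are typically about 160-180 bp long, which matches the length of DNA protected by a nucleosome.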
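cfDNA research matters for non-invasive diagnosis, disease monitoring, early detection, and understanding physiological and pathological states. In oncology in particular, analyzing circulating tumor DNA (ctDNA), the fraction of cfDNA that originates from tumor cells, yields genetic information about the tumor and can guide cancer diagnosis, treatment selection, and monitoring of treatment response.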
cfDNAPro
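Main features:
- Data characterization: computes the overall, median, and modal fragment size distributions, the peaks and valleys in the fragment size profile, and the oscillation periodicity.
- Data visualization: provides a range of plotting functions, from whole-cohort down to single-sample fragment views, metric plots, and mode and summary plots.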
demo
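1. Fragment length visualization
- Top panel: the x-axis shows fragment length, from 30 bp to 500 bp. The y-axis shows the proportion of reads with a given length. The line is not a smoothed curve; it consists of straight segments connecting the individual data points.
- Bottom panel: the cumulative version of the same data. First count the reads with length less than or equal to 30 bp (say N) and normalize that count to a proportion. Repeat for every fragment length (30 bp, 31 bp, …, 500 bp) and draw the result as a line plot. As in the non-cumulative plot, the line connects data points and is not a smoothed curve.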
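The two calculations described above can be sketched in base R. This is only an illustration on made-up fragment lengths, not cfDNAPro's own implementation; the column names `insert_size`, `prop`, and `cdf` are assumptions chosen for readability.

```r
# Toy fragment lengths in bp (hypothetical values, not real data).
frag_len <- c(30, 31, 31, 150, 166, 166, 167, 167, 167, 168, 330, 500)

# Count reads at every length from 30 to 500 bp, keeping lengths with zero reads.
sizes  <- 30:500
counts <- tabulate(match(frag_len, sizes), nbins = length(sizes))

# Top panel: proportion of reads at each fragment length.
prop <- counts / sum(counts)

# Bottom panel: cumulative proportion of reads with length <= each size.
cdf <- cumsum(prop)

size_profile <- data.frame(insert_size = sizes, prop = prop, cdf = cdf)

# Show only the lengths that actually occur in the toy data.
print(size_profile[size_profile$prop > 0, ])
```

Plotting `prop` or `cdf` against `insert_size` with `geom_line()` would give the same kind of point-to-point line described for the two panels.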
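library(cfDNAPro)
library(scales)
library(ggpubr)
library(ggplot2)
library(dplyr)
# Path to the example data shipped with cfDNAPro (one sub-folder per cohort).
# "groups_picard" is the folder name used in the package vignette; point this at your own data directory instead.
data_path <- examplePath("groups_picard")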
# Define a list for the groups/cohorts.
grp_list <- list("cohort_1" = "cohort_1",
                 "cohort_2" = "cohort_2",
                 "cohort_3" = "cohort_3",
                 "cohort_4" = "cohort_4")
# Generating the plots and store them in a list.
result <- sapply(grp_list, function(x) {
  callSize(path = data_path) %>%
    dplyr::filter(group == as.character(x)) %>%
    plotSingleGroup()
}, simplify = FALSE)
#> setting default outfmt to df.
#> setting default input_type to picard.
#> setting default outfmt to df.
#> setting default input_type to picard.
#> setting default outfmt to df.
#> setting default input_type to picard.
#> setting default outfmt to df.
#> setting default input_type to picard.
# Multiplexing the plots in one figure
suppressWarnings(
  multiplex <-
    ggarrange(result$cohort_1$prop_plot +
                theme(axis.title.x = element_blank()),
              result$cohort_4$prop_plot +
                theme(axis.title = element_blank()),
              result$cohort_1$cdf_plot,
              result$cohort_4$cdf_plot +
                theme(axis.title.y = element_blank()),
              labels = c("Cohort 1 (n=5)", "Cohort 4 (n=4)"),
              label.x = 0.2,
              ncol = 2,
              nrow = 2))
multiplex
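2. Comparing fragment length distributions
- callMetrics: computes the median fragment size distribution for each group.
- Top panel: the median fragment size distribution of each cohort, expressed as proportions. The y-axis shows the proportion of reads and the x-axis shows the fragment size. The lines are not smoothed curves; they connect the individual data points.
- Bottom panel: the median cumulative distribution function (CDF). The y-axis shows the cumulative proportion and the x-axis again shows the fragment size. The curve rises step by step, showing how reads accumulate as fragment size increases.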
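To make the idea of a "median distribution" concrete, here is a minimal base R sketch on invented numbers. It assumes the median is taken across the samples of one cohort at each fragment size, and that the cumulative version is computed per sample before taking the median; whether callMetrics follows exactly this order is an assumption, not something stated above.

```r
# Hypothetical per-sample proportions: rows = fragment sizes, columns = samples.
sizes <- 165:169
prop_by_sample <- cbind(
  sample_a = c(0.10, 0.25, 0.30, 0.25, 0.10),
  sample_b = c(0.15, 0.20, 0.35, 0.20, 0.10),
  sample_c = c(0.05, 0.30, 0.40, 0.15, 0.10)
)

# Median proportion at each fragment size across the samples of one cohort.
median_prop <- apply(prop_by_sample, 1, median)

# Median cumulative proportion: build each sample's CDF, then take the median per size.
cdf_by_sample <- apply(prop_by_sample, 2, cumsum)
median_cdf <- apply(cdf_by_sample, 1, median)

median_profile <- data.frame(insert_size = sizes,
                             median_prop = median_prop,
                             median_cdf = median_cdf)
print(median_profile)
```

Using the median rather than the mean keeps a single unusual sample from dominating the cohort-level curve.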
# Set an order for those groups (i.e. the levels of factors).
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate plots.
compare_grps<-callMetrics(data_path) %>% plotMetrics(order=order)
#> setting default input_type to picard.
# Modify plots.
p1 <- compare_grps$median_prop_plot +
  ylim(c(0, 0.028)) +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_text(size = 12, face = "bold")) +
  theme(legend.position = c(0.7, 0.5),
        legend.text = element_text(size = 11),
        legend.title = element_blank())
p2 <- compare_grps$median_cdf_plot +
  scale_y_continuous(labels = scales::number_format(accuracy = 0.001)) +
  theme(axis.title = element_text(size = 12, face = "bold")) +
  theme(legend.position = c(0.7, 0.5),
        legend.text = element_text(size = 11),
        legend.title = element_blank())
# Finalize plots.
suppressWarnings(
  median_grps <- ggpubr::ggarrange(p1,
                                   p2,
                                   label.x = 0.3,
                                   ncol = 1,
                                   nrow = 2))
median_grps
3. Visualizing the modal fragment size
- Bar chart: the modal fragment size is the DNA fragment length that occurs most often in a sample.
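As a quick illustration of that definition, the mode can be read off a table of fragment-size counts with `which.max()`. The counts below are invented, and the column names are assumptions; this is not how callMode is necessarily implemented.

```r
# Hypothetical fragment-size counts for one sample.
size_counts <- data.frame(
  insert_size = c(81, 111, 166, 167, 168),
  count       = c(40, 55, 900, 950, 870)
)

# The modal fragment size is the length with the highest read count.
mode_size <- size_counts$insert_size[which.max(size_counts$count)]
print(mode_size)
```

Note that `which.max()` returns the first maximum, so ties between two lengths would resolve to the shorter one in this sketch.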
# Set an order for your groups, it will affect the group order along x axis!
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate mode bin chart.
mode_bin <- callMode(data_path) %>% plotMode(order=order,hline = c(167,111,81))
#> setting default mincount as 0.
#> setting default input_type to picard.
# Show the plot.
suppressWarnings(print(mode_bin))
- Stacked bar chart: shows how the different fragment lengths are distributed within each group.
# Set an order for your groups, it will affect the group order along x axis.
order <- c("cohort_1", "cohort_2", "cohort_3", "cohort_4")
# Generate mode stacked bar chart. You could specify how to stratify the modes
# using 'mode_partition' arguments. If other modes exist other than you
# specified, an 'other' group will be added to the plot.
mode_stacked <-
  callMode(data_path) %>%
  plotModeSummary(order = order,
                  mode_partition = list(c(166, 167)))
#> setting default input_type to picard.
# Modify the plot using ggplot syntax.
mode_stacked <- mode_stacked + theme(legend.position = "top")
# Show the plot.
suppressWarnings(print(mode_stacked))
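4. Comparing fragmentation oscillation patterns
- Inter-peak distance: the 10 bp periodic oscillation pattern is compared across cohorts by measuring the distances between adjacent peaks.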
# Set an order for your groups, it will affect the group order.
order <- c("cohort_1", "cohort_2", "cohort_4", "cohort_3")
# Plot and modify inter-peak distances.
inter_peak_dist <- callPeakDistance(path = data_path, limit = c(50, 135)) %>%
  plotPeakDistance(order = order) +
  labs(y = "Fraction") +
  theme(axis.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = c(0.91, 0.5),
        legend.text = element_text(size = 11))
#> setting the mincount to 0.
#> setting the xlim to c(7,13).
#> setting default outfmt to df.
#> Setting default mincount to 0.
#> setting default input_type to picard.
# Show the plot.
suppressWarnings(print(inter_peak_dist))
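- Inter-valley distance: compared with the inter-peak distance plot above, the inter-valley plot focuses on where the read proportion dips rather than where it rises. The two plots describe different features of the fragment size profile: one uses the peaks (local maxima of the frequency), the other uses the valleys (local minima of the frequency).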
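The difference between peaks and valleys can be shown on a synthetic signal. The sketch below builds a made-up profile with a 10 bp oscillation, marks a point as a peak (or valley) when it is strictly higher (or lower) than both neighbours, and takes the gaps between consecutive peaks and valleys. This neighbour comparison is a simplification chosen for illustration; it is not claimed to be the detection rule used by callPeakDistance or callValleyDistance.

```r
# Synthetic proportions with a 10 bp oscillation (not real data).
sizes <- 50:135
prop  <- 0.003 + 0.001 * cos(2 * pi * (sizes - 52) / 10)

# Interior points only, so that every point has a left and a right neighbour.
i <- 2:(length(prop) - 1)
is_peak   <- prop[i] > prop[i - 1] & prop[i] > prop[i + 1]
is_valley <- prop[i] < prop[i - 1] & prop[i] < prop[i + 1]

peak_sizes   <- sizes[i][is_peak]
valley_sizes <- sizes[i][is_valley]

# Gaps (in bp) between consecutive peaks and between consecutive valleys.
peak_gaps   <- diff(peak_sizes)
valley_gaps <- diff(valley_sizes)

print(table(peak_gaps))
print(table(valley_gaps))
```

On real data the gaps scatter around 10 bp instead of all being identical, and that spread is what the inter-peak and inter-valley distance plots compare between cohorts.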
# Set an order for your groups, it will affect the group order.
order <- c("cohort_1", "cohort_2", "cohort_4", "cohort_3")
# Plot and modify inter-valley distances.
inter_valley_dist <- callValleyDistance(path = data_path,
                                        limit = c(50, 135)) %>%
  plotValleyDistance(order = order) +
  labs(y = "Fraction") +
  theme(axis.title = element_text(size = 12, face = "bold"),
        legend.title = element_blank(),
        legend.position = c(0.91, 0.5),
        legend.text = element_text(size = 11))
#> setting the mincount to 0.
#> setting the xlim to c(7,13).
#> setting default outfmt to df.
#> setting the mincount to 0.
#> setting default input_type to picard.
# Show the plot.
suppressWarnings(print(inter_valley_dist))
5. Customizing plots with ggplot2
library(ggplot2)
library(cfDNAPro)
# Set the path to the example sample.
exam_path <- examplePath("step6")
# Calculate peaks and valleys.
peaks <- callPeakDistance(path = exam_path)
#> setting default limit to c(35,135).
#> setting default outfmt to df.
#> Setting default mincount to 0.
#> setting default input_type to picard.
valleys <- callValleyDistance(path = exam_path)
#> setting default limit to c(35,135).
#> setting default outfmt to df.
#> setting the mincount to 0.
#> setting default input_type to picard.
# A line plot showing the fragmentation pattern of the example sample.
exam_plot_all <- callSize(path=exam_path) %>% plotSingleGroup(vline = NULL)
#> setting default outfmt to df.
#> setting default input_type to picard.
# Label peaks and valleys with dashed and solid lines.
# prop_plot is spelled out in full rather than relying on partial matching of `$prop`.
exam_plot_prop <- exam_plot_all$prop_plot +
  coord_cartesian(xlim = c(90, 135), ylim = c(0, 0.0065)) +
  geom_vline(xintercept = peaks$insert_size, colour = "red", linetype = "dashed") +
  geom_vline(xintercept = valleys$insert_size, colour = "blue")
# Show the plot.
suppressWarnings(print(exam_plot_prop))
# Label peaks and valleys with dots.
exam_plot_prop_dot <- exam_plot_all$prop_plot +
  coord_cartesian(xlim = c(90, 135), ylim = c(0, 0.0065)) +
  geom_point(data = peaks,
             mapping = aes(x = insert_size, y = prop),
             color = "blue", alpha = 0.5, size = 3) +
  geom_point(data = valleys,
             mapping = aes(x = insert_size, y = prop),
             color = "red", alpha = 0.5, size = 3)
# Show the plot.
suppressWarnings(print(exam_plot_prop_dot))
If you want to work with cfDNA, data characterization is the first step of the analysis.