Last updated: 2021-02-19

Checks: 7 0

Knit directory: Human_Development_ATACseq_bulk/

This reproducible R Markdown analysis was created with workflowr (version 1.6.2). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20210216) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 294d830. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  *.noXYMT.bed.tidy.bed
    Untracked:  *xls.bg.bed
    Untracked:  *xls.dn.bed
    Untracked:  *xls.up.bed
    Untracked:  Development_noXY.jn.rnk
    Untracked:  FetalvsYoung_noXY.jn.rnk
    Untracked:  Homo_sapiens.GRCh38.96.fulllength.saf
    Untracked:  YoungvsAdult_noXY.jn.rnk
    Untracked:  analysis/*.dn.bed.homeranno.txt
    Untracked:  analysis/*.up.bed.homeranno.txt
    Untracked:  analysis/00.WorkFlowR_setting.R
    Untracked:  code/EnDrich.R
    Untracked:  code/EnDrichProc_Development_noXY.R
    Untracked:  code/EnDrichProc_FetalvsYoung_noXY.R
    Untracked:  code/EnDrichProc_YoungvsAdult_noXY.R
    Untracked:  header.sam
    Untracked:  humanATAC*bed.saf
    Untracked:  humanATAC*bed.saf.pe.q30.mx
    Untracked:  humanATAC*bed.saf.pe.q30.mx.all
    Untracked:  humanATAC*bed.saf.pe.q30.mx.all.fix
    Untracked:  humanATAC*bed.saf.pe.q30.mx.chr
    Untracked:  humanATAC*bed.saf.pe.q30.mx.fix
    Untracked:  humanATAC*bed.saf.pe.q30.mx.hum.fix
    Untracked:  output/20190801_ATAC_samplesheet.txt
    Untracked:  output/ATACseq_samplesheet.txt
    Untracked:  output/atac_hum_tss_pe_mapk30_q30.mx.all_unfiltered.csv
    Untracked:  output/atac_hum_tss_pe_mapk30_q30.mx.chr
    Untracked:  output/atac_hum_tss_pe_mapk30_q30.mx.hum.fix_filt.csv
    Untracked:  output/humanATAC_peaks_cov2_rmBL.bed.saf.pe.q30.mx.MvsF.fix_filt.csv
    Untracked:  output/humanATAC_peaks_cov2_rmBL.bed.saf.pe.q30.mx.all.fix_filt.csv
    Untracked:  output/humanATAC_peaks_cov2_rmBL.bed.saf.pe.q30.mx.all_unfiltered.csv
    Untracked:  output/humanATAC_peaks_cov2_rmBL.bed.saf.pe.q30.mx.hum.fix_filt.csv
    Untracked:  output/logCPM_humanATAC_peaks_cov2_rmBL.bed.saf.pe.q30.mx.all.fix_filt.csv

Unstaged changes:
    Modified:   analysis/about.Rmd
    Modified:   analysis/index.Rmd
    Modified:   analysis/license.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/03.Peak_Annotation_and_Homer_Analysis.Rmd) and HTML (docs/03.Peak_Annotation_and_Homer_Analysis.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 294d830 evangelynsim 2021-02-19 wflow_publish(c(“analysis/01.Generate_reference_genome.Rmd”,

Introduction

Following merging samples from the same group in the Primary Analysis section, perform peak annotationa and transcription factor motif analysis using Homer.

Used libraries and functions

  • Homer
  • bedtools/2.27.1
library(ggplot2)
library(moonBook)
library(webr)
library(waffle)
library(extrafont)
Registering fonts with R
library(grid)
library(gridExtra)
library(cowplot)

********************************************************
Note: As of version 1.0.0, cowplot does not change the
  default ggplot2 theme anymore. To recover the previous
  behavior, execute:
  theme_set(theme_cowplot())
********************************************************
library(ggpubr)

Attaching package: 'ggpubr'
The following object is masked from 'package:cowplot':

    get_legend

1. Tidy up data

Remove peaks from Chr X, Y, Mt

files = list.files(path = "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT", pattern = ".bed", full.names = T)
mx = lapply(files, read.delim, header=F)

for(i in 1:length(mx)){
 mx[[i]] = mx[[i]][!grepl("Y",mx[[i]]$V1),]
 mx[[i]] = mx[[i]][!grepl("X",mx[[i]]$V1),]
 mx[[i]] = mx[[i]][!grepl("MT",mx[[i]]$V1),]
  #write.table(mx[[i]], file = paste0(files[[i]], ".noXYMT.bed"), col.names = F, row.names = F, sep = "\t")
}

Tidy peaks


#!/bin/bash

for BED in *.noXYMT.bed ; do

  cat $BED | sed 's/\"//' |  sed 's/\"//' | sed 's/\"//' | sed 's/\"//' > $BED.tidy.bed

done
cat: *.noXYMT.bed: No such file or directory

2. Homer Peak Annotation

Annotate peaks

#!/bin/bash
set -x

REF=/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/refgenome/Homo_sapiens.GRCh38.96.gtf

#PATH=$PATH:/group/card2/Evangelyn_Sim/NGS/app/homer/.//bin/

for BED in *.tidy.bed ; do

  OUT=$BED.homeranno.txt
  mkdir go/$BED
  annotatePeaks.pl $BED hg38 -gtf $REF -go go/$BED -annStats $BED.stats.txt > $OUT

done

Make plots

files = list.files(path = "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT", pattern = ".stats.txt", full.names = T)
mx = lapply(files, read.delim, header=T)

files
[1] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/adult.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt"  
[2] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/adultF.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt" 
[3] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/adultM.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt" 
[4] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/fetal.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt"  
[5] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/ipsccm.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt" 
[6] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/ipsccmF.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt"
[7] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/ipsccmM.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt"
[8] "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT/young.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.stats.txt"  
for(i in 1:length(mx)){
  mxFDR = mx[[i]][c(1:5),]
  #write.table(mxFDR, 
  #            file = paste0(gsub("./|.txt","",files[[i]]),".tidy.txt"),
  #            col.names = T, row.names = F, sep = "\t")
}



#PieDonut
files1 = list.files(path = "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homerpkstats_noXYMT", pattern = ".stats.tidy.txt", full.names = T)
mx1 = lapply(files1, read.delim, header=T)

for(j in 1:length(mx1)){
  mx1[[j]]$totalpeaks = sum(mx1[[j]]$Number.of.peaks)
  mx1[[j]]$percentage = round(mx1[[j]]$Number.of.peaks/sum(mx1[[j]]$Number.of.peaks) *100, digits = 2)
  #write.table(mx1[[j]], 
  #            file = paste0(gsub("./|.txt","",files[[j]]),".tidy.txt"),
  #            col.names = T, row.names = F, sep = "\t")
  
  mx[[j]]=PieDonut(mx1[[j]],aes(Annotation,count=Number.of.peaks),r0=0.5,start=3*pi/2,labelpositionThreshold=0.1)
}
multi = arrangeGrob(mx[[5]],mx[[4]],mx[[8]],mx[[1]],
                    ncol = 2, nrow = 2)
plot = as_ggplot(multi)

plot

3. Homer Transcription Factor Binding Motif Enrichment

#!/bin/bash

set -x

CWD=/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homertf_noXYMT
echo $CWD
REF=/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/refgenome/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa

cd $CWD

#Call Homer enriched motifs with default background, then with ATAC peak BG
for FG in *.tidy.bed ; do
  cd $CWD
  BED=$CWD/$FG

  #find enriched motifs
  findMotifsGenome.pl $FG $REF $FG.df.out -p 10 -keepFiles

  cd $FG.df.out
  rm -rf instances
 mkdir instances
  cd instances

  for i in ../homerResults/motif*.motif ; do
    BASE=`basename $i`
    mkdir $BASE
    findMotifsGenome.pl $BED $REF $BASE -find $i | sort -k6gr > $BASE/$BASE &
  done

  cd $CWD
done
wait


for MOTIF in `find . | grep instances | grep motif$` ; do
  OUT=$MOTIF.bed
  awk '{print $1,$2,length($3)}' $MOTIF \
  | grep -v PositionID | cut -d '_' -f2 \
 | tr ':' '\t' | sed 's/-/\t/' \
 | awk '{printf "%s\t%.0f\t%.0f\n", $1,(($2+$3)/2)+$4-10,(($2+$3)/2)+$4+10}' > $OUT
done

Make plots

PATH = "/group/card2/Evangelyn_Sim/Transcriptome_chromatin_human/Sequencing_ATAC_RNA/20180726_hATACseq_MF/seqaln/rename/rmdup/merge/combine/mapq30/group/macs/homertf_noXYMT/"

ipsccm = read.delim(paste0(PATH,"ipsccm.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
fetal = read.delim(paste0(PATH,"fetal.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
young = read.delim(paste0(PATH,"young.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
adult = read.delim(paste0(PATH,"adult.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
ipsccmF = read.delim(paste0(PATH,"ipsccmF.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
ipsccmM = read.delim(paste0(PATH,"ipsccmM.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
adultF = read.delim(paste0(PATH,"adultF.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)
adultM = read.delim(paste0(PATH,"adultM.mapq30.mg.bam_macs_peaks.bed.rmBL.bed.noXYMT.bed.tidy.bed.df.out/knownResults.txt"), header = T)

ipsccm$`-Log.P.value` = -(ipsccm$Log.P.value)
fetal$`-Log.P.value` = -(fetal$Log.P.value)
young$`-Log.P.value` = -(young$Log.P.value)
adult$`-Log.P.value` = -(adult$Log.P.value)
ipsccmF$`-Log.P.value` = -(ipsccmF$Log.P.value)
ipsccmM$`-Log.P.value` = -(ipsccmM$Log.P.value)
adultF$`-Log.P.value` = -(adultF$Log.P.value)
adultM$`-Log.P.value` = -(adultM$Log.P.value)

ipsccm = ipsccm[,c(1,10)]
fetal = fetal[,c(1,10)]
young = young[,c(1,10)]
adult = adult[,c(1,10)]
ipsccmF = ipsccmF[,c(1,10)]
ipsccmM = ipsccmM[,c(1,10)]
adultF = adultF[,c(1,10)]
adultM = adultM[,c(1,10)]

ipsccm$Group = "ipsccm"
fetal$Group =  "fetal"
young$Group =  "young"
adult$Group =  "adult"
ipsccmF$Group =  "ipsccmF"
ipsccmM$Group =  "ipsccmM"
adultF$Group =  "adultF"
adultM$Group =  "adultM"

ipsccmsl = ipsccm[c(1:50),]
fetalsl = fetal[c(1:50),]
youngsl = young[c(1:50),]
adultsl = adult[c(1:50),]
ipsccmFsl = ipsccmF[c(1:50),]
ipsccmMsl = ipsccmM[c(1:50),]
adultFsl = adultF[c(1:50),]
adultMsl = adultM[c(1:50),]

motifssl = rbind(ipsccmsl, fetalsl, youngsl, adultsl, ipsccmFsl, ipsccmMsl, adultFsl, adultMsl)
motifssl = motifssl[!duplicated(motifssl$Motif.Name),]
motifssl = motifssl[order(motifssl$Motif.Name),]
#write.table(motifssl, file = "top50motif_from_8gps.txt", col.names = T, row.names = F, sep = "\t")

mt2 = read.delim(paste0(PATH,"top50motif_from_8gps_selected_final.txt"), header = T)
mt2
                                                         Motif.Name X.NAME.
1                  AP-1(bZIP)/ThioMac-PU.1-ChIP-Seq(GSE21512)/Homer 453.000
2                 AR-halfsite(NR)/LNCaP-AR-ChIP-Seq(GSE27824)/Homer 103.600
3                         ARE(NR)/LNCAP-AR-ChIP-Seq(GSE27824)/Homer  91.810
4                      Atf3(bZIP)/GBM-ATF3-ChIP-Seq(GSE33912)/Homer 530.700
5                   Bach1(bZIP)/K562-Bach1-ChIP-Seq(GSE31477)/Homer  81.940
6                 Bach2(bZIP)/OCILy7-Bach2-ChIP-Seq(GSE44420)/Homer 306.300
7                     BATF(bZIP)/Th17-BATF-ChIP-Seq(GSE39756)/Homer 502.200
8                     BORIS(Zf)/K562-CTCFL-ChIP-Seq(GSE32465)/Homer 142.300
9                  CTCF(Zf)/CD4+-CTCF-ChIP-Seq(Barski_et_al.)/Homer 386.600
10                     E2F1(E2F)/Hela-E2F1-ChIP-Seq(GSE22478)/Homer  28.530
11                      E2F3(E2F)/MEF-E2F3-ChIP-Seq(GSE71376)/Homer  52.060
12                     E2F4(E2F)/K562-E2F4-ChIP-Seq(GSE31477)/Homer  77.640
13                     E2F6(E2F)/Hela-E2F6-ChIP-Seq(GSE31477)/Homer  54.770
14                     E2F7(E2F)/Hela-E2F7-ChIP-Seq(GSE32673)/Homer  38.250
15                     Elk1(ETS)/Hela-Elk1-ChIP-Seq(GSE31477)/Homer 117.900
16                     Elk4(ETS)/Hela-Elk4-ChIP-Seq(GSE31477)/Homer 133.000
17                 Fosl2(bZIP)/3T3L1-Fosl2-ChIP-Seq(GSE56872)/Homer 524.200
18             Foxa2(Forkhead)/Liver-Foxa2-ChIP-Seq(GSE25694)/Homer  65.800
19              Foxh1(Forkhead)/hESC-FOXH1-ChIP-Seq(GSE29422)/Homer  15.750
20                   Fra1(bZIP)/BT549-Fra1-ChIP-Seq(GSE46166)/Homer 598.400
21                Fra2(bZIP)/Striatum-Fra2-ChIP-Seq(GSE43429)/Homer 575.900
22                    Gata1(Zf)/K562-GATA1-ChIP-Seq(GSE18829)/Homer 103.500
23                    Gata2(Zf)/K562-GATA2-ChIP-Seq(GSE18829)/Homer  88.290
24                   GATA3(Zf)/iTreg-Gata3-ChIP-Seq(GSE20898)/Homer 125.400
25                   Gata4(Zf)/Heart-Gata4-ChIP-Seq(GSE35151)/Homer 115.500
26                   Gata6(Zf)/HUG1N-GATA6-ChIP-Seq(GSE51936)/Homer 135.200
27                     GRE(NR),IR3/A549-GR-ChIP-Seq(GSE32465)/Homer 102.500
28             GRE(NR),IR3/RAW264.7-GRE-ChIP-Seq(Unpublished)/Homer  84.770
29   Hoxa9(Homeobox)/ChickenMSG-Hoxa9.Flag-ChIP-Seq(GSE86088)/Homer 102.800
30 Hoxd10(Homeobox)/ChickenMSG-Hoxd10.Flag-ChIP-Seq(GSE86088)/Homer  78.260
31                 Jun-AP1(bZIP)/K562-cJun-ChIP-Seq(GSE31477)/Homer 489.300
32          JunB(bZIP)/DendriticCells-Junb-ChIP-Seq(GSE36099)/Homer 546.800
33              KLF10(Zf)/HEK293-KLF10.GFP-ChIP-Seq(GSE58341)/Homer  95.830
34              KLF14(Zf)/HEK293-KLF14.GFP-ChIP-Seq(GSE58341)/Homer 247.300
35                       KLF3(Zf)/MEF-Klf3-ChIP-Seq(GSE44748)/Homer 450.600
36                       Klf4(Zf)/mES-Klf4-ChIP-Seq(GSE11431)/Homer 100.600
37                      KLF5(Zf)/LoVo-KLF5-ChIP-Seq(GSE49402)/Homer 285.000
38                      KLF6(Zf)/PDAC-KLF6-ChIP-Seq(GSE64557)/Homer 281.000
39                       Klf9(Zf)/GBM-Klf9-ChIP-Seq(GSE62211)/Homer 296.600
40                Mef2d(MADS)/Retina-Mef2d-ChIP-Seq(GSE61391)/Homer 164.800
41                 MyoD(bHLH)/Myotube-MyoD-ChIP-Seq(GSE21614)/Homer 136.000
42                   MyoG(bHLH)/C2C12-MyoG-ChIP-Seq(GSE36024)/Homer  90.970
43          NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq(Unpublished)/Homer 245.300
44                   NF1(CTF)/LNCAP-NF1-ChIP-Seq(Unpublished)/Homer 168.800
45          OCT:OCT(POU,Homeobox)/NPC-OCT6-ChIP-Seq(GSE43916)/Homer  95.090
46   Oct4:Sox17(POU,Homeobox,HMG)/F9-Sox17-ChIP-Seq(GSE44553)/Homer   7.823
47                          PR(NR)/T47D-PR-ChIP-Seq(GSE31130)/Homer  72.210
48                    Smad3(MAD)/NPC-Smad3-ChIP-Seq(GSE36673)/Homer 117.600
49                Tbx20(T-box)/Heart-Tbx20-ChIP-Seq(GSE29636)/Homer  89.720
50                   TEAD1(TEAD)/HepG2-TEAD1-ChIP-Seq(Encode)/Homer 126.500
51                    TEAD3(TEA)/HepG2-TEAD3-ChIP-Seq(Encode)/Homer  48.250
52             TEAD4(TEA)/Tropoblast-Tead4-ChIP-Seq(GSE37350)/Homer  78.750
53             Tbx5(T-box)/HL1-Tbx5.biotin-ChIP-Seq(GSE21529)/Homer   0.000
54                     Erra(NR)/HepG2-Erra-ChIP-Seq(GSE31477)/Homer   0.000
55                    THRb(NR)/Liver-NR1A2-ChIP-Seq(GSE52613)/Homer   0.000
56                     THRa(NR)/C17.2-THRa-ChIP-Seq(GSE38347)/Homer   0.000
57            Mef2a(MADS)/HL1-Mef2a.biotin-ChIP-Seq(GSE21529)/Homer   0.000
58              FOXM1(Forkhead)/MCF7-FOXM1-ChIP-Seq(GSE72977)/Homer   0.000
59             Mef2b(MADS)/HEK293-Mef2b.V5-ChIP-Seq(GSE67450)/Homer   0.000
60               Mef2c(MADS)/GM12878-Mef2c-ChIP-Seq(GSE32465)/Homer   0.000
61      Nkx2.5(Homeobox)/HL1-Nkx2.5.biotin-ChIP-Seq(GSE21529)/Homer   0.000
62             Oct4(POU,Homeobox)/mES-Oct4-ChIP-Seq(GSE11431)/Homer   0.000
63               Nanog(Homeobox)/mES-Nanog-ChIP-Seq(GSE11724)/Homer   0.000
64                      Sox2(HMG)/mES-Sox2-ChIP-Seq(GSE11431)/Homer   0.000
65                     Esrrb(NR)/mES-Esrrb-ChIP-Seq(GSE11431)/Homer   0.000
66                 PGR(NR)/EndoStromal-PGR-ChIP-Seq(GSE69539)/Homer   0.000
67            TEAD(TEA)/Fibroblast-PU.1-ChIP-Seq(Unpublished)/Homer   0.000
68              Hand2(bHLH)/Mesoderm-Hand2-ChIP-Seq(GSE61475)/Homer   0.000
     Group
1    young
2    young
3   adultF
4    young
5    young
6    young
7    young
8    fetal
9    fetal
10  ipsccm
11  ipsccm
12  ipsccm
13  ipsccm
14  ipsccm
15  ipsccm
16  ipsccm
17   young
18   fetal
19 ipsccmF
20   young
21   young
22   fetal
23   fetal
24   young
25   fetal
26   fetal
27   adult
28   adult
29   fetal
30   fetal
31   young
32   young
33  ipsccm
34  ipsccm
35  ipsccm
36  ipsccm
37  ipsccm
38  ipsccm
39  ipsccm
40   fetal
41   fetal
42   fetal
43   fetal
44   fetal
45 ipsccmF
46 ipsccmF
47   young
48   fetal
49   fetal
50   fetal
51  ipsccm
52   fetal
53   added
54   added
55   added
56   added
57   added
58   added
59   added
60   added
61   added
62   added
63   added
64   added
65   added
66   added
67   added
68   added
  ipsccm_mt2_scale = ipsccm[ipsccm$Motif.Name %in% as.factor(mt2$Motif.Name),]
  fetal_mt2_scale = fetal[fetal$Motif.Name %in% as.factor(mt2$Motif.Name),]
  young_mt2_scale = young[young$Motif.Name %in% as.factor(mt2$Motif.Name),]
  adult_mt2_scale = adult[adult$Motif.Name %in% as.factor(mt2$Motif.Name),]
  
  ipsccm_mt2_scale$scaleP = scale(ipsccm_mt2_scale$`-Log.P.value`)
  fetal_mt2_scale$scaleP = scale(fetal_mt2_scale$`-Log.P.value`)
  young_mt2_scale$scaleP = scale(young_mt2_scale$`-Log.P.value`)
  adult_mt2_scale$scaleP = scale(adult_mt2_scale$`-Log.P.value`)
  
  mmg4_mt2_scale = rbind(ipsccm_mt2_scale, fetal_mt2_scale, young_mt2_scale, adult_mt2_scale)
  mmg4_mt2_scale$Motif.Name = gsub("GRE[(]NR[)][,]IR3[/]RAW","GRE2(NR),IR3/RAW",mmg4_mt2_scale$Motif.Name)
  mmg4_mt2_scale$Motifs = gsub("[(].*", "", mmg4_mt2_scale$Motif.Name)
  mmg4_mt2_scale$Motifs = gsub("/.*", "", mmg4_mt2_scale$Motifs)
  mmg4_mt2_scale$Motifs = toupper(mmg4_mt2_scale$Motifs)
  mmg4_mt2_scale$Group = factor(mmg4_mt2_scale$Group, levels = c("ipsccm", "fetal","young","adult"))
  mmg4_mt2_scale$Motifs = factor(mmg4_mt2_scale$Motifs, 
                           levels = c("JUNB","JUN-AP1","FRA2","FRA1","FOSL2","BATF","BACH2","BACH1","ATF3","AP-1", 
                                      "SOX2","OCT4:SOX17","OCT:OCT","OCT4","NANOG",
                                      "KLF14","KLF10","KLF9","KLF6","KLF5","KLF4","KLF3","CTCF","BORIS",
                                      "ELK4","ELK1",
                                      "NF1-HALFSITE","NF1","HOXD10","HOXA9","FOXH1","FOXA2",
                                      "ESRRB","ERRA","THRB","THRA","PR","PGR","GRE2","GRE","ARE","AR-HALFSITE",
                                      "TEAD4", "TEAD3","TEAD1","TEAD","HAND2","FOXM1","E2F7","E2F6","E2F4","E2F3","E2F1",
                                      "GATA6","GATA4","GATA3","GATA2","GATA1",
                                      "TBX20","TBX5","SMAD3","NKX2.5","MYOG","MYOD","MEF2D","MEF2C","MEF2B","MEF2A"))
  ggplot(mmg4_mt2_scale,aes_(x= ~Group , y= ~Motifs, size=~scaleP))+
    geom_point(aes(color = `-Log.P.value`)) +
    theme_classic()+
    theme(axis.text.x = element_text(angle = 90, hjust = 1))+
    scale_color_gradient2(midpoint = mean(mmg4_mt2_scale$`-Log.P.value`)+250, low = "yellow", mid = "orange",
                          high = "red", space = "Lab")


sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /hpc/software/installed/R/3.6.1/lib64/R/lib/libRblas.so
LAPACK: /hpc/software/installed/R/3.6.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] ggpubr_0.4.0    cowplot_1.0.0   gridExtra_2.3   extrafont_0.17 
[5] waffle_0.7.0    webr_0.1.5      moonBook_0.2.3  ggplot2_3.3.2  
[9] workflowr_1.6.2

loaded via a namespace (and not attached):
  [1] nlme_3.1-150       fs_1.5.0           RColorBrewer_1.1-2
  [4] insight_0.9.0      rprojroot_1.3-2    tools_3.6.1       
  [7] backports_1.1.10   R6_2.5.0           DT_0.14           
 [10] sjlabelled_1.1.6   colorspace_1.4-1   withr_2.3.0       
 [13] tidyselect_1.1.0   mnormt_1.5-6       extrafontdb_1.0   
 [16] curl_4.3           compiler_3.6.1     git2r_0.27.1      
 [19] flextable_0.5.10   xml2_1.3.2         officer_0.3.12    
 [22] labeling_0.4.2     scales_1.1.1       lmtest_0.9-38     
 [25] psych_1.9.12.31    readr_1.4.0        systemfonts_0.2.3 
 [28] stringr_1.4.0      digest_0.6.27      foreign_0.8-71    
 [31] editData_0.1.2     rmarkdown_2.5      rio_0.5.16        
 [34] base64enc_0.1-3    pkgconfig_2.0.3    htmltools_0.5.0   
 [37] fastmap_1.0.1      rvg_0.2.5          htmlwidgets_1.5.2 
 [40] rlang_0.4.7        readxl_1.3.1       rstudioapi_0.11   
 [43] shiny_1.5.0        farver_2.0.3       generics_0.1.0    
 [46] zoo_1.8-8          jsonlite_1.7.0     dplyr_1.0.2       
 [49] zip_2.1.1          car_3.0-10         magrittr_1.5      
 [52] Rcpp_1.0.5         munsell_0.5.0      abind_1.4-5       
 [55] gdtools_0.2.2      lifecycle_0.2.0    stringi_1.5.3     
 [58] whisker_0.4        yaml_2.2.1         carData_3.0-4     
 [61] MASS_7.3-51.6      parallel_3.6.1     promises_1.1.1    
 [64] sjmisc_2.8.5       forcats_0.5.0      crayon_1.3.4      
 [67] miniUI_0.1.1.1     lattice_0.20-41    haven_2.3.1       
 [70] hms_0.5.3          knitr_1.30         pillar_1.4.6      
 [73] uuid_0.1-4         ggsignif_0.6.0     glue_1.4.2        
 [76] evaluate_0.14      data.table_1.13.2  vcd_1.4-8         
 [79] vctrs_0.3.2        tweenr_1.0.1       httpuv_1.5.4      
 [82] Rttf2pt1_1.3.8     cellranger_1.1.0   gtable_0.3.0      
 [85] purrr_0.3.4        polyclip_1.10-0    tidyr_1.1.2       
 [88] xfun_0.18          ggforce_0.3.2      openxlsx_4.2.3    
 [91] mime_0.9           xtable_1.8-4       broom_0.7.0       
 [94] rstatix_0.6.0      later_1.1.0.1      tibble_3.0.3      
 [97] shinyWidgets_0.5.4 rrtable_0.2.1      ellipsis_0.3.1    
[100] ztable_0.2.0       devEMF_3.8