How to...



  • How to get the path of the folder containing a bash script, from within the script:

    # Get the directory where the script resides
    SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
    echo "$SCRIPT_DIR"
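    A slightly more robust variant, assuming the script runs under bash: `${BASH_SOURCE[0]}` names the file being executed *or sourced*, whereas `$0` names the calling shell when the script is sourced with `.` or `source`. This is a sketch; it does not resolve symlinks to the script.

```shell
#!/usr/bin/env bash
# Directory containing this script, independent of the caller's working directory.
# BASH_SOURCE[0] is the path of the file being executed or sourced; $0 would be the
# calling shell's name when the file is sourced, so dirname "$0" breaks in that case.
# Note: if the script is reached through a symlink, this gives the symlink's directory.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
echo "$SCRIPT_DIR"
```

    Running it as `bash /some/dir/script.sh` or sourcing it with `. /some/dir/script.sh` from any working directory should print `/some/dir` in both cases.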
    


  • How to convert a minimap2 alignment to gff/gtf:
    https://github.com/lh3/minimap2/issues/455
    https://github.com/lh3/minimap2/files/9591008/bam2gff_fixGffread.zip

    minimap2 -t 10 -ax splice:hq -uf ref.fa cdna.fa |/Bio/bin/samtools-1.14 view -b > minimap2.tr.bam
    perl bam2gff.pl -b minimap2.tr.bam -o minimap2.tr.gff -s /Bio/bin/samtools-1.14
    gffread minimap2.tr.gff -T -o minimap2.tr.gtf
    perl fixGffread.pl -i minimap2.tr.gtf -o minimap2.tr.fix.gtf
    

    Alternative method:

    #Align sequences and convert to BAM
    minimap2 -ax splice --cs target.fa query.fa | samtools sort -O BAM - > alignments.bam
    #Convert to BED12 using BEDtools
    bedtools bamtobed -bed12 -i alignments.bam > alignments.bed 
    #Convert to genePred using UCSC tools
    bedToGenePred alignments.bed alignments.genepred
    #Convert to GTF2 using UCSC tools. The first argument "file" tells genePredToGtf to read a genePred file instead of a database table; genePredToGtf has additional options that might be useful in specific use cases.
    genePredToGtf "file" alignments.genepred alignments.gtf
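    Whichever method is used, a quick sanity check is to count the features of each type in the resulting GTF and compare the two outputs. The feature types emitted by gffread/fixGffread.pl and by genePredToGtf may differ, so compare like with like (e.g. exon counts). `count_gtf_features` is a hypothetical helper, sketched in bash/awk:

```shell
#!/usr/bin/env bash
# Count features per type (GTF/GFF column 3), skipping "#" comment lines
# and anything with fewer than the 9 mandatory GTF columns.
# Usage: count_gtf_features alignments.gtf
count_gtf_features() {
  awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" |
    LC_ALL=C sort   # C locale keeps the ordering stable across systems
}
```

    For example, `count_gtf_features minimap2.tr.fix.gtf` and `count_gtf_features alignments.gtf` print one `type<TAB>count` line per feature type.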
    


  • How to do a dotplot with minimap2:

    minimap2 -DP ref.fa query.fa|miniasm/minidot - > dot.eps
    


  • How to interpret genome dot plots (#dotplots #genomic).

    [Figure: interpretation of dot plots]
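    In a query-versus-reference dot plot, '+' strand alignments run along the main-diagonal direction and '-' strand alignments run along the anti-diagonal. A '-' block between '+' blocks is the classic signature of an inversion, while a whole sequence aligning on '-' usually just means the query is reverse-complemented relative to the reference. To put numbers on what the plot shows, the PAF from `minimap2 -DP ref.fa query.fa > aln.paf` (the file name is an assumption) can be summarised per target and strand. A minimal bash/awk sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Summarise a minimap2 PAF file: total aligned query bases per target sequence and strand.
# PAF columns used: 3/4 = query start/end, 5 = relative strand ('+' or '-'), 6 = target name.
# Lines with fewer than the 12 mandatory PAF columns are ignored.
# Usage: paf_strand_summary aln.paf
paf_strand_summary() {
  awk -F'\t' 'NF >= 12 { bp[$6 "\t" $5] += $4 - $3 } END { for (k in bp) print k "\t" bp[k] }' "$1" |
    LC_ALL=C sort   # output: target<TAB>strand<TAB>aligned query bases
}
```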



  • How to clone a public GitHub repository with VS Code and push it to a private GitHub repository.

    • Make sure Git is installed
    • Open VS Code and use the source control icon on the far left to clone a git repository to a local folder

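    Open a terminal in VS Code (View > Terminal). The private repository (ink-blot/sarek in this example) must already exist on GitHub and be empty (no README or licence), otherwise the push is rejected. Then point origin at it and push: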

    PS C:\Users\github> cd sarek
    PS C:\Users\github\sarek> git remote remove origin
    PS C:\Users\github\sarek> git remote add origin https://github.com/ink-blot/sarek.git
    PS C:\Users\github\sarek> git branch
    * master
    PS C:\Users\github\sarek> git push -u origin master
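    The same remote swap can be written as a small bash function, for example for Git Bash or Linux. This is a sketch: `repoint_origin` is a hypothetical helper, and it pushes whichever branch is checked out instead of hard-coding master.

```shell
#!/usr/bin/env bash
# Re-point a cloned repository at a new (e.g. private, empty) remote and push the
# current branch, setting it as the upstream tracking branch (like "git push -u").
# Usage: repoint_origin <repo_dir> <new_remote_url>
repoint_origin() {
  local repo_dir="$1" new_url="$2" branch
  branch="$(git -C "$repo_dir" rev-parse --abbrev-ref HEAD)"   # e.g. "master"
  git -C "$repo_dir" remote remove origin
  git -C "$repo_dir" remote add origin "$new_url"
  git -C "$repo_dir" push -u origin "$branch"
}
```

    Example: `repoint_origin ~/github/sarek https://github.com/ink-blot/sarek.git`.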
    

    If the push does not start promptly, wait for an authorisation window to appear; this can take a few minutes.


    The token method is preferred:

    • Go to your GitHub token settings (Settings > Developer settings > Personal access tokens > Tokens (classic)).
    • Click Generate new token (classic).
    • Select the scopes you need (e.g., repo for private repositories).
    • Generate the token and copy it (you won’t be able to see it again later).
    • In the GitHub sign-in window, switch to the Token tab.
    • Paste the generated token into the input field and confirm.

    Optional: to fetch updates from the original nf-core/sarek repository in the future, add it as an upstream remote:

    
    PS C:\Users\github\sarek> git remote add upstream https://github.com/nf-core/sarek.git
    PS C:\Users\github\sarek> git fetch upstream
    PS C:\Users\github\sarek> git merge upstream/master
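    Because upstream points at the public nf-core repository, it can help to make that remote fetch-only so a mistyped `git push upstream` cannot go there. A bash sketch with a hypothetical helper name; `DISABLED` is an arbitrary placeholder push URL, not a Git keyword, so pushes to upstream simply fail (unless a local path with that name happens to exist).

```shell
#!/usr/bin/env bash
# Add the original project as an "upstream" remote that can be fetched from but not pushed to.
# Usage: add_upstream_fetch_only <repo_dir> <upstream_url>
add_upstream_fetch_only() {
  local repo_dir="$1" upstream_url="$2"
  git -C "$repo_dir" remote add upstream "$upstream_url"
  # Separate push URL: fetches still use upstream_url, pushes target a non-existent remote.
  git -C "$repo_dir" remote set-url --push upstream DISABLED
}
```

    After `add_upstream_fetch_only ~/github/sarek https://github.com/nf-core/sarek.git`, `git remote -v` lists the real URL for fetch and `DISABLED` for push.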
    
    


  • How to get rid of "WARNING : No mitochondrion chromosome found" in SnpEff:

    Prefix the name of the mitochondrial contig with MT so that SnpEff recognises it.
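    The renamed contig has to match everywhere SnpEff sees it: the reference FASTA, the annotation and the CHROM column of the VCF being annotated (and a custom SnpEff database would need rebuilding from the renamed files). A minimal bash/awk sketch for the FASTA and GTF/GFF, assuming a hypothetical helper name and a hypothetical contig name:

```shell
#!/usr/bin/env bash
# Rename one contig to "MT<name>" in a FASTA and in a GTF/GFF so both stay consistent.
# Usage: prefix_contig_mt <contig_name> <in.fa> <out.fa> <in.gtf> <out.gtf>
prefix_contig_mt() {
  local contig="$1"
  # FASTA: the name is the header text after ">" up to the first space or tab.
  awk -v c="$contig" '/^>/ { split(substr($0, 2), a, "[ \t]"); if (a[1] == c) $0 = ">MT" substr($0, 2) } { print }' \
    "$2" > "$3"
  # GTF/GFF: the first tab-separated column is the sequence name; comment lines pass through.
  awk -v c="$contig" 'BEGIN { FS = OFS = "\t" } !/^#/ && $1 == c { $1 = "MT" $1 } { print }' \
    "$4" > "$5"
}
```

    For example, `prefix_contig_mt mito ref.fa ref.MT.fa genes.gtf genes.MT.gtf` turns the header `>mito` into `>MTmito` and rewrites column 1 of that contig's annotation lines to match.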



  • How to collect all files from one channel and associate/combine them with elements of another channel in Nextflow:

    Example Input channels:

    bam_for_collect_ch2:
    [ file('73-50_L002.bam') ]
    [ file('73-50_L001.bam') ]

    interval_vcfs_3:
    [ file('73-50_L001_raw_variants_1.vcf.gz') ]
    [ file('73-50_L001_raw_variants_2.vcf.gz') ]
    [ file('73-50_L002_raw_variants_1.vcf.gz') ]
    [ file('73-50_L002_raw_variants_2.vcf.gz') ]

    Process code:

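    input:
    set val(pair_id), val(all_vcf) from bam_for_collect_ch2.map { bam -> bam.baseName }
        .combine(interval_vcfs_3.collect().toList())   // pair each sample ID with the full list of VCFs
        .map { id, vcfs -> tuple(id, vcfs.findAll { it.name.startsWith("${id}_") }*.name) }   // keep only that sample's VCFs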
    

    Example output:

    [ '73-50_L002', ['73-50_L002_raw_variants_1.vcf.gz', '73-50_L002_raw_variants_2.vcf.gz'] ]
    [ '73-50_L001', ['73-50_L001_raw_variants_1.vcf.gz', '73-50_L001_raw_variants_2.vcf.gz'] ]
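    Before running the pipeline, it can be worth checking on disk which VCFs would be matched to each BAM by the "<BAM base name>_" prefix. A bash sketch of that check, assuming the BAMs and VCFs sit in one directory (a hypothetical layout) and using a hypothetical helper name:

```shell
#!/usr/bin/env bash
# For each BAM in a directory, list the VCFs whose names start with "<BAM base name>_".
# Usage: list_vcfs_per_bam <dir>
list_vcfs_per_bam() {
  local dir="$1" bam id vcf
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue              # no BAMs: the glob stays unexpanded
    id="$(basename "$bam" .bam)"           # 73-50_L002.bam -> 73-50_L002
    printf '%s:' "$id"
    for vcf in "$dir/${id}"_*.vcf.gz; do
      [ -e "$vcf" ] && printf ' %s' "$(basename "$vcf")"
    done
    printf '\n'
  done
}
```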



  • How to collect all files related by prefix in Nextflow:

    Example input channel:

    interval_bams_ch:
    [ file('73-50_L001_raw_variants_1.bam') ]
    [ file('73-50_L001_raw_variants_2.bam') ]
    [ file('73-50_L002_raw_variants_1.bam') ]
    [ file('73-50_L002_raw_variants_2.bam') ]

    Channel code:

    bam_name_parts_ch = interval_bams_ch.map { file ->
        def name = file.baseName.replaceFirst(/_raw_variants_.*/, '')
        tuple(name, file)
    }.groupTuple()
    

    Example output:

    [ '73-50_L001', [file('73-50_L001_raw_variants_1.bam'), file('73-50_L001_raw_variants_2.bam')] ]
    [ '73-50_L002', [file('73-50_L002_raw_variants_1.bam'), file('73-50_L002_raw_variants_2.bam')] ]
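    The `replaceFirst(/_raw_variants_.*/, '')` pattern can be previewed on real file names outside Nextflow. This awk sketch (hypothetical helper name) applies the same regex and groups names the way `groupTuple()` does above; its tab-separated output format is its own, not Nextflow's.

```shell
#!/usr/bin/env bash
# Read file names on stdin, strip everything from "_raw_variants_" onwards to get the
# group key, and print "<key><TAB><name1> <name2> ..." once per key (input order kept).
group_by_prefix() {
  awk '{
         key = $0
         sub(/_raw_variants_.*/, "", key)
         if (key in g) g[key] = g[key] " " $0; else g[key] = $0
       }
       END { for (key in g) print key "\t" g[key] }' | LC_ALL=C sort
}
```

    Usage: `ls *_raw_variants_*.bam | group_by_prefix`.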



  • How to conditionally choose from two channels in Nextflow:

    grouped_interval_vcf_ch=(params.splitIntervalOverlapLength && params.splitIntervalOverlapLength.toInteger() > 0
    	? trimmed_vcf_ch
    	: interval_vcfs_3
    )
    


  • How to re-use a file channel an indeterminate number of times in Nextflow DSL1 (e.g. combining it with a channel that has an unknown number of elements):

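    process collectGVCF {
      publishDir "${params.combined_1_vcf}", mode: 'copy'
    
      // input: and script: blocks omitted
      output:
        set val(pair_id), val(round), file("${pair_id}_raw_variants_${round}.vcf.gz") into collected_vcf
    }
    
    // Later, after splitting: re-read the published VCFs from disk, because a DSL1 channel can only be consumed once.
    // Caveat: Channel.fromPath lists matching files when the channel is created, so the VCFs must already exist (e.g. published by an earlier run).
    sample_files
      .map { it -> it.getBaseName() }
      .combine(Channel.fromPath("${params.combined_1_vcf}/*_raw_variants_1.vcf.gz"))
      .set { reuse_pairs }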
    

 

Powered by ShareZomics