The following code block is simply a dump from my bash history. My instructors, Vinodh and Aishwarya, were excellent tutors who guided us step by step through the general workflow of MutSig analyses. This specific script was written with the goal of counting tumor recurrences (n = 47) in a TCGA sample data set (n = 97k). You might find some errors in the processing steps; those will be due either to my beginner's knowledge of Linux syntax or to restricted access to the data. Overall, the analysis is achieved by the following steps:
1) Compile all relevant clinical, histological, and mapping data into one table/txt file for processing. The steps and code for this were done in R/RStudio and will be written up in a subsequent post.
2) With the final table in the proper working directory, execute the "head" command piped to the "grep" command to count the number of diseased cases in the disease status column:
3) Determine which column contains our tumor recurrence status, then execute the 'cut' command on the proper column (in this case, 39):
4) We then execute the 'awk' command and redirect its output to a .txt file for further processing. This will ensure that all missing values are dropped:
5) Run counts on the .txt file and .maf file in place, then check the columns of the .maf file:
6) Next, we execute another 'cut' command on column 16 for the unique samples, then again on columns 5 and 6 for chromosomal locations:
7) Provided that your machine has access to Python, you can "tell" Python to match the two files and spit out a data set for further analysis in one line. WARNING: do not make the same mistake I did. As you will notice below, I wrote the output to another .maf file instead of a .txt file, and this causes an error in further processing:
8) 'Cut' the 16th column again for tumor recurrence:
9) Now move the output file to a new file name and/or your own personal directory for ease of editing. Then open the bash script that will run on the cluster, change the email address to your own, and point the source path at your new file name. Also, note the space in the original bash script and retain it; simply paste the name of your own file (here "recurrent_samples.maf") over whatever is written at the end of the source file path:
10) Check your job and output for any errors!
11) Full script (including messy commands) below:
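Before reading the raw history, it may help to see the core of steps 3 through 7 on a toy example. This is only a minimal sketch under assumptions, not the commands from the history: the two small tables, their column layout, the file names, and the YES/NO/NA coding are all made up for illustration, and the matching step here uses 'awk' as a stand-in for the Python one-liner.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Everything below is hypothetical toy data, not the TCGA files.
tmpdir=$(mktemp -d)
clinical="$tmpdir/clinical.txt"
maf="$tmpdir/toy.maf"

# Toy clinical table: sample ID, disease status, recurrence status (tab-separated).
printf 'sample_id\tdisease_status\trecurrence\n' >  "$clinical"
printf 'S1\tDiseased\tYES\n'                     >> "$clinical"
printf 'S2\tDiseased\t\n'                        >> "$clinical"
printf 'S3\tDisease Free\tNO\n'                  >> "$clinical"
printf 'S4\tDiseased\tYES\n'                     >> "$clinical"
printf 'S5\tDisease Free\tNA\n'                  >> "$clinical"

# Toy mutation table: gene, sample ID (a real .maf file has many more columns).
printf 'gene\tsample_id\n' >  "$maf"
printf 'TP53\tS1\n'        >> "$maf"
printf 'KRAS\tS3\n'        >> "$maf"
printf 'EGFR\tS4\n'        >> "$maf"
printf 'TP53\tS4\n'        >> "$maf"

# Find the column number of the recurrence field from the header line.
rec_col=$(head -n 1 "$clinical" | tr '\t' '\n' | grep -n -x 'recurrence' | cut -d: -f1)

# Cut that column, then use awk to skip the header and drop blank or NA values.
cut -f "$rec_col" "$clinical" \
  | awk 'NR > 1 && $0 != "" && $0 != "NA"' > "$tmpdir/recurrence.txt"

# Count the usable values and the recurrences among them.
n_kept=$(wc -l < "$tmpdir/recurrence.txt" | tr -d ' ')
n_recurrent=$(grep -c -x 'YES' "$tmpdir/recurrence.txt")

# Collect the IDs of recurrent samples, then keep the mutation-table header
# plus every row whose sample ID is in that list. The output is written as
# .txt rather than .maf, in line with the warning in step 7.
awk -F'\t' -v col="$rec_col" 'NR > 1 && $col == "YES" { print $1 }' \
  "$clinical" > "$tmpdir/recurrent_ids.txt"
awk -F'\t' 'NR == FNR { keep[$1]; next } FNR == 1 || ($2 in keep)' \
  "$tmpdir/recurrent_ids.txt" "$maf" > "$tmpdir/matched_toy.txt"

# Matched rows, not counting the header line.
n_matched=$(( $(wc -l < "$tmpdir/matched_toy.txt") - 1 ))

echo "usable recurrence values: $n_kept"
echo "recurrent samples: $n_recurrent"
echo "matched mutation rows: $n_matched"
```

Looking the column up by its header name, rather than hard-coding a number such as 39 or 16, keeps the sketch working if the table layout changes. On the real files the column names will differ, so check the header first.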