SINE issues [SOLVED]

Post by Clément Goubert on Tue Jan 05, 2016 11:25 am

Here is a summary of an e-mail exchange about SINE annotation issues when running dnaPipeTE:

We were trying to use dnaPipeTE to annotate a genome, but we have encountered some issues with the annotation.
I am trying to use the pipeline to annotate an elephant genome (sampled from the raw reads). The pipeline runs successfully and gives me all the graphs and output files; however, when I look at the pie chart, SINEs are not shown.
I looked for SINEs in both the trinity.fasta and my raw reads, and they were found in both.
Could there be a reason why SINEs are not showing up?

If you can think of anything that can be causing this, please let me know.
Thank you very much for your help, it would be greatly appreciated.

Do you have any expectations about the number and the relative age of the SINEs you are looking for? This could help, because either they are not numerous enough to be properly assembled with the low-coverage sampling done by dnaPipeTE, or they are so old that the different copies are not similar enough to be assembled together and thus discovered by dnaPipeTE.
For example, when I tried the pipeline on the human genome, I found only half of the expected TEs because they are very old. That is usual in mammals, so it could also be the case for the elephant. However, I still find some SINEs in the human genome, so it could be something else.

Do you use a custom library or the normal RepeatMasker library? If it is a personal library, check that the SINEs are labelled the way RepeatMasker expects in order to annotate them.

When you said that you looked for the SINEs in Trinity.fasta and in the raw reads, how did you proceed exactly? By raw reads, do you mean all your reads or only those sampled and used by dnaPipeTE?

ok so...
1. I am using a personal library, I can send it to you if you'd like to see it.
2. About the age of the SINEs, we expect them to be a little bit of both (old and not so old).
3. I blasted the trinity.fasta (created by dnaPipeTE) against my library... and I GOT hits that looked like SINEs (I can send you that as well).
4. I did the same thing with the library vs. the raw reads (blasted my library against my reads to see if SINEs were indeed there, and they were).

so the SINEs are in the trinity.fasta file that the pipeline creates but they are not shown in the pie chart =/

Ok! So dnaPipeTE finds and assembles them but does not identify them properly.
Send me your library, I’ll check whether it looks good.
In parallel, check the file « one hit per trinity contig… .out » in the annotation folder. It gives, for each contig, the best RepeatMasker hit. For the contigs that matched your SINE library when you did it manually, you should see what they hit in dnaPipeTE. Maybe some repeats of your library that are not SINEs have a better hit on them… If you don’t find your SINE contigs in this file, it means that they are not annotated, which would point toward an error in the library nomenclature.

Tell me!

We have looked at the RepeatMasker annotation file, and RepeatMasker is finding a large number of SINEs. Also, in that same annotation directory there is a SINE_annoted.fasta file where we can see ~700 sequences that were identified as SINEs.

We also thought it might be a header issue in the library and experimented a little to find out, but we have only just started. If you can provide some insight, that would be great.

I have uploaded what I think are the relevant files to a dropbox folder and the link is below.

I found the problem: your SINEs are annotated

in your fasta library, and they are

in mine.

dnaPipeTE uses the name just after the # to quantify the repeat classes in the pie chart (the list of handled repeat classes is in the file « pieColors » of the dnaPipeTE folder).
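To check a custom library for this kind of header problem, something like the following could help. It is only a sketch and not part of dnaPipeTE: it assumes headers of the form `>name#class/superfamily` (inferred from the `#SINE/something` wording in this thread), and the function name and the example headers are made up for illustration.

```python
import re
from collections import Counter

# Assumed header shape: ">name#class/superfamily" (the "/superfamily" part is optional here).
HEADER_RE = re.compile(r"^>(?P<name>[^#\s]+)#(?P<cls>[^/\s]+)(?:/(?P<fam>\S+))?")

def summarize_headers(fasta_lines):
    """Count the repeat class found right after '#', and collect headers lacking one."""
    classes = Counter()
    unlabeled = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        match = HEADER_RE.match(line.strip())
        if match:
            classes[match.group("cls")] += 1
        else:
            unlabeled.append(line.strip())
    return classes, unlabeled

# Hypothetical headers, for illustration only.
example = [
    ">AfroSINE_1#SINE/tRNA",
    "ACGT",
    ">L1_example#LINE/L1",
    "ACGT",
    ">SINE_without_label",
    "ACGT",
]
classes, unlabeled = summarize_headers(example)
print(dict(classes))
print(unlabeled)
```

On a real library you would pass the open file instead of the example list, e.g. `summarize_headers(open("my_library.fasta"))` (file name hypothetical). Any header that ends up in `unlabeled` has no class after a # and so would presumably not be counted in the pie chart.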

Also, for the TE age graph (landscapes), the header needs to be in this format to be considered. However, I have never tested this analysis with a custom library, and I don’t know if it will work properly, since dnaPipeTE tries to match the #SINE/something to the list in the file « colors_landscape » (in the dnaPipeTE folder). If you don’t find your SINEs in the landscape graph, try adding them manually to the « colors_landscape » file and provide a color for each (format: #B966F5, for example; you can use the same color for different TEs).
Each line of this file has the format:
repeat_type/superfamily repeat_type color

e.g.:
SINE/Mermaid SINE #B966F5
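
If there are many superfamilies to add, the lines could be generated rather than typed by hand. This is a rough sketch under assumptions: it takes the fields to be separated by single spaces as in the example above, the helper names are invented, and `SINE/tRNA` is just a placeholder superfamily.

```python
import re

HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")

def landscape_entry(repeat_type, superfamily, color):
    """Build one 'repeat_type/superfamily repeat_type color' line."""
    if not HEX_COLOR.match(color):
        raise ValueError(f"not a #RRGGBB color: {color}")
    return f"{repeat_type}/{superfamily} {repeat_type} {color}"

def missing_entries(existing_lines, wanted):
    """Return lines for the (repeat_type, superfamily, color) tuples not yet listed."""
    known = {line.split()[0] for line in existing_lines if line.strip()}
    return [
        landscape_entry(rtype, fam, color)
        for rtype, fam, color in wanted
        if f"{rtype}/{fam}" not in known
    ]

# Lines already in the file (here only the example from this post).
existing = ["SINE/Mermaid SINE #B966F5"]
# Hypothetical entries we would like to have.
wanted = [("SINE", "Mermaid", "#B966F5"), ("SINE", "tRNA", "#B966F5")]
new_lines = missing_entries(existing, wanted)
print(new_lines)
```

The resulting lines can then be appended to « colors_landscape » by hand, after making a backup copy of the file.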

In addition, if you want to add extra TE classes to the pie chart (ones that are absent from the « pieColors » file), it is still possible but tricky: some lines have to be added to the code. If you are interested, I can have a look at it.
Clément Goubert
