Dear All,

I have an RNAseq dataset I am using to study alternative splicing.

From the RNAseq data I used a package to calculate Exon-to-Exon "Link Fractions". This measure is a proportion calculated by taking the number of times that ExonA links to ExonB, divided by the total number of links ExonA makes with any other Exon.

For example, if ExonA links to ExonB 5 times according to the RNA reads in my dataset, and links to ExonC 15 times, then the "Link Fraction" for the Link ExonA-ExonB would be 0.25 and for the Link ExonA-ExonC would be 0.75.
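To make the definition concrete, here is a minimal base R sketch of the calculation for that example. The counts and the names `link_counts` and `link_fractions` are purely illustrative; they are not output from the package I used.

```r
# Hypothetical link counts for ExonA, taken from the worked example above
link_counts <- c(ExonB = 5, ExonC = 15)

# Link Fraction = links to one exon / all links made by ExonA
link_fractions <- link_counts / sum(link_counts)

print(link_fractions)
```

By construction every Link Fraction lies between 0 and 1, and the fractions for a given exon sum to 1.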

I want to Gaussian-normalise these Link Fractions before taking them to the next stage of my analysis, and I am using the "rntransform" function of GenABEL to do this.

However, when I run this function, some of my values become negative. The normalisation otherwise appears to have worked, and the first ten values before and after show how the data points correspond to one another.

Original_Data[1:10,6]

# [1] 0.954334000 0.022617100 0.000000000 0.020236400 0.000000000 0.000902863

# [7] 0.000978521 0.000930751 0.474414000 0.484839000

Normalised_Data[1:10]

# [1] 1.7954745 0.1553292 -1.1448015 0.1333943 -1.1448015 -0.4631316

# [7] -0.4499622 -0.4581991 1.0348768 1.0663013
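As far as I understand it, a rank-based Gaussian normalisation replaces each value with the standard normal quantile of its rank, so the output is centred near zero rather than bounded like a fraction. Below is a minimal base R sketch of that idea. `rank_inverse_normal` is a hypothetical helper of my own, and the `(rank - 0.5) / n` offset is an assumption that may not match what GenABEL does internally. It is applied only to the ten values printed above, so it will not reproduce my normalised values, which came from the full column.

```r
# The ten Link Fractions printed above
x <- c(0.954334000, 0.022617100, 0.000000000, 0.020236400, 0.000000000,
       0.000902863, 0.000978521, 0.000930751, 0.474414000, 0.484839000)

# Hypothetical helper: rank-based inverse normal transform.
# Assumption: the (rank - 0.5) / n offset; GenABEL may differ in detail.
rank_inverse_normal <- function(values) {
  n <- sum(!is.na(values))
  # Tied values (e.g. the two zeros) share an average rank
  p <- (rank(values, na.last = "keep", ties.method = "average") - 0.5) / n
  # Map the rank-based proportions onto standard normal quantiles
  qnorm(p)
}

z <- rank_inverse_normal(x)
print(round(z, 4))
```

In this sketch the ordering of the values is preserved and tied inputs receive identical outputs, but anything ranked below the middle of the data necessarily comes out negative.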

However, my issue is that a Link Fraction can never be negative. The lowest possible value is zero links between a given exon and another exon; you cannot have a negative number of links.

My question therefore has three parts. First, are there any options that can be supplied to "rntransform", or a way of writing the formula, that would stipulate that the distribution must be bounded at zero (or at a value of my choosing)? Second, is there a way to specify what the median of the new normal distribution should be? Third, could anyone recommend a different normalisation package that does have this capability? Ideally I would like the median to remain the same and the data simply to be normalised around it, but I appreciate that this might not be possible or mathematically valid, and might not be how the algorithm works.

Say the lowest value after normalisation was -1.14 (as you can see from the data, an original Link Fraction of 0.0 was converted to -1.14). A possible fudge would be to add 1.14 to every value, translating the whole distribution along the x-axis. I suspect there may be a reason why this would not be considered statistically valid, though?
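To illustrate what I mean by the shift, here is a minimal base R sketch that uses only the ten normalised values shown earlier as a stand-in for the full column. The variable names are my own, and this is just the fudge described above, not something provided by GenABEL.

```r
# First ten normalised values printed earlier in this post
z <- c(1.7954745, 0.1553292, -1.1448015, 0.1333943, -1.1448015,
       -0.4631316, -0.4499622, -0.4581991, 1.0348768, 1.0663013)

# Translate along the x-axis so that the smallest value becomes zero
shifted <- z - min(z)

# The spread is unchanged by the shift; only the location moves
sd_unchanged <- isTRUE(all.equal(sd(z), sd(shifted)))
mean_shift <- mean(shifted) - mean(z)

print(shifted)
print(sd_unchanged)
print(mean_shift)
```

The shift leaves the shape and standard deviation untouched and moves the mean by exactly the amount added, so the values are no longer centred on zero.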

Many thanks!

## Normalisation via the rntransform function
