High Throughput Biology
  Data Processing Tools  
Detailed Descriptions

BioIDE contains many convenient data processing tools that are easy to understand and simple to use. Because a user can chain these tools together in any combination, a workflow can accomplish far more than any single tool does on its own. Below are a few examples of BioIDE's data processing tools.


The MissingDataFilter removes markers or individuals that have too much missing genotype data. The user chooses whether to filter on markers or on individuals, and sets the percentage threshold of missing data above which a marker or individual is removed.
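The filtering rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not BioIDE's implementation: the function names, the dictionary layout (individual ID mapped to a list of calls, with `None` marking a missing call), and the use of a fraction rather than a percentage for the threshold are all assumptions made for the example.

```python
def missing_rate(calls):
    """Fraction of genotype calls that are missing (None)."""
    return sum(c is None for c in calls) / len(calls)


def filter_missing(genotypes, threshold, by="markers"):
    """Drop markers or individuals whose missing rate exceeds `threshold`.

    genotypes: dict mapping individual ID -> list of calls, one per marker
               (hypothetical layout; None marks a missing call).
    threshold: fraction between 0 and 1.
    by:        "markers" or "individuals".
    """
    if not genotypes:
        return {}
    ids = list(genotypes)
    if by == "individuals":
        # Keep an individual only if its own missing rate is acceptable.
        return {i: genotypes[i] for i in ids
                if missing_rate(genotypes[i]) <= threshold}
    if by != "markers":
        raise ValueError("by must be 'markers' or 'individuals'")
    n_markers = len(genotypes[ids[0]])
    # Keep a marker only if its missing rate across individuals is acceptable.
    keep = [m for m in range(n_markers)
            if missing_rate([genotypes[i][m] for i in ids]) <= threshold]
    return {i: [genotypes[i][m] for m in keep] for i in ids}
```

With three individuals and three markers, a threshold of 0.5 applied to markers would keep only the markers that are missing in at most half of the individuals.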


The QualityScoreFilter applies only to data that contain quality information, such as data generated on the Affymetrix and Illumina SNP platforms. It lets a user specify a threshold on genotype calling quality and filters out those genotypes whose quality score is above the threshold.
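As a rough illustration of this kind of filter, the sketch below treats each call as a (genotype, score) pair and blanks out calls whose score is above the threshold, following the direction stated in the text. The function name and data layout are hypothetical, and for a platform whose score is higher when the call is better, the comparison would need to be reversed.

```python
def filter_by_quality(calls, threshold):
    """Replace low-quality genotype calls with None.

    calls:     list of (genotype, score) pairs (hypothetical layout).
    threshold: calls with a score above this value are filtered out,
               matching the description in the text.
    """
    return [genotype if score <= threshold else None
            for genotype, score in calls]
```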


Sometimes additional information needs to be added to a dataset. For example, data from the Affymetrix platform usually contains individual IDs but no other phenotype data for those individuals. To conduct an association analysis, disease status is needed. For family-based analysis, family information is needed. In the past, this meant writing scripts to append phenotype information to genotype data. In BioIDE, one can use the LoadPhenotype component to load any additional phenotype information into a dataset. It takes only two parameters: the directory and the name of the file containing the additional information. Combined with the other data processing components, it lets a user assemble a wide range of data processing tasks without writing such scripts.
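The kind of script that LoadPhenotype replaces might look like the sketch below. It is an assumption-laden example rather than a description of the component: the tab-delimited file format, the `id` column name, the dictionary layout of the dataset, and the function name are all invented for illustration. Only the two parameters, directory and file name, come from the text.

```python
import csv
import os


def load_phenotype(dataset, directory, filename, id_column="id"):
    """Attach phenotype columns from a file to each individual in `dataset`.

    dataset:   dict mapping individual ID -> dict of existing fields
               (hypothetical layout).
    directory, filename: location of a tab-delimited file with a header
               row; the format and the `id_column` name are assumptions.
    """
    path = os.path.join(directory, filename)
    with open(path, newline="") as handle:
        rows = {row[id_column]: row
                for row in csv.DictReader(handle, delimiter="\t")}
    merged = {}
    for individual, record in dataset.items():
        # Individuals absent from the phenotype file are kept unchanged.
        extra = {key: value
                 for key, value in rows.get(individual, {}).items()
                 if key != id_column}
        merged[individual] = {**record, **extra}
    return merged
```

The join is keyed on the individual ID, so a disease status or family column in the file ends up next to the genotype data for the same individual.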


For certain haplotype-based analyses, it can be useful to convert each LD region into a pseudo marker. After the conversion, the new markers can be used in association tests. LDBlockToMarkerTransformer does just that. Given the LD structure for a dataset, this component resolves phases for each individual in each LD region and assigns one marker per LD region. Each observed haplotype for an LD region is treated as an allele.
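The last step, treating each observed haplotype in a region as an allele, can be sketched as follows. The example assumes the phases have already been resolved, which is the hard part and is not shown here. The function name, the string encoding of haplotypes, and the half-open index ranges for LD regions are hypothetical choices for the illustration.

```python
def blocks_to_markers(haplotypes, blocks):
    """Recode each LD region as one multi-allelic pseudo marker.

    haplotypes: dict mapping individual ID -> (hap1, hap2), each a string
                with one allele character per marker (already phased;
                hypothetical layout).
    blocks:     list of (start, end) half-open marker index ranges,
                one per LD region.
    Returns a dict mapping individual ID -> list of (allele1, allele2)
    integer codes, one pair per LD region.
    """
    result = {individual: [] for individual in haplotypes}
    for start, end in blocks:
        codes = {}  # haplotype segment -> allele code within this region
        for individual, pair in haplotypes.items():
            alleles = []
            for hap in pair:
                segment = hap[start:end]
                # The first time a segment is seen it gets the next code.
                alleles.append(codes.setdefault(segment, len(codes) + 1))
            result[individual].append(tuple(alleles))
    return result
```

Allele codes are assigned in order of first appearance and are only meaningful within a single region, so code 1 in one region is unrelated to code 1 in another.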


As the name suggests, the RandomizeCaseControl component randomly selects a specified number of cases and controls from a merged case/control population. Different randomization strategies can be applied. The resulting dataset can then be fed into a workflow for a permutation test.
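One simple strategy, drawing cases and controls independently and without replacement, is sketched below. It is only one possible reading of the component: the function name, the `"case"`/`"control"` labels, and the optional seed are assumptions for the example, and the other randomization strategies mentioned in the text are not covered.

```python
import random


def randomize_case_control(status, n_cases, n_controls, seed=None):
    """Randomly pick cases and controls from a merged population.

    status: dict mapping individual ID -> "case" or "control"
            (hypothetical labels).
    seed:   optional seed so that a selection can be reproduced.
    Returns (selected_cases, selected_controls) as lists of IDs.
    """
    rng = random.Random(seed)
    # Sort the IDs so the result depends only on the seed, not dict order.
    cases = sorted(i for i, s in status.items() if s == "case")
    controls = sorted(i for i, s in status.items() if s == "control")
    # random.sample draws without replacement and raises ValueError
    # if more individuals are requested than are available.
    return rng.sample(cases, n_cases), rng.sample(controls, n_controls)
```

Calling the function repeatedly with different seeds would produce the series of resampled datasets that a permutation workflow consumes.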
