
Database System Concepts
7th Edition
ISBN: 9780078022159
Author: Abraham Silberschatz Professor, Henry F. Korth, S. Sudarshan
Publisher: McGraw-Hill Education
Chapter 1: Introduction
Section: Chapter Questions
Problem 1PE
Question

Using R, do the following:

• Create a data frame called Annotation with columns of gene names ("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"), Ensembl gene names ("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"), pathway information ("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis") and gene lengths (100, 3000, 200, 1000, 1200).
• Create a data frame called Sample1 with Ensembl gene names ("Ens001", "Ens003", "Ens006", "Ens010") and expression values (1000, 3000, 10000, 5000).
• Create a data frame called Sample2 with Ensembl gene names ("Ens001", "Ens003", "Ens006", "Ens007", "Ens010") and expression values (1500, 1500, 17000, 500, 10000).
• Create a data frame containing only the genes common to all three data frames, with all information from Annotation and the expression values from Sample1 and Sample2.
##   ensembl geneNames    pathway geneLengths expression.x expression.y
## 1  Ens001    Gene_1 Glycolysis         100         1000         1500
## 2  Ens003    Gene_2       TGFb        3000         3000         1500
## 3  Ens006    Gene_3 Glycolysis         200        10000        17000
## 4  Ens010    Gene_5 Glycolysis        1200         5000        10000
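The first four steps can be sketched in base R as follows. This is a minimal sketch, assuming R 4.x defaults: `merge()` performs an inner join unless `all = TRUE` is given, so only Ensembl IDs present in all three data frames survive, and duplicate non-key column names receive the default `.x`/`.y` suffixes seen in the expected output. The object name `merged` is a hypothetical choice, not part of the question.

```r
# Annotation: gene names, Ensembl IDs, pathway membership and gene lengths
Annotation <- data.frame(
  geneNames   = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  ensembl     = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  pathway     = c("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis"),
  geneLengths = c(100, 3000, 200, 1000, 1200)
)

# Expression tables; Sample1 has no entry for Ens007
Sample1 <- data.frame(
  ensembl    = c("Ens001", "Ens003", "Ens006", "Ens010"),
  expression = c(1000, 3000, 10000, 5000)
)
Sample2 <- data.frame(
  ensembl    = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  expression = c(1500, 1500, 17000, 500, 10000)
)

# merge() is an inner join by default (all = FALSE), so Ens007 is dropped.
# In the second join both inputs carry an "expression" column, so merge()
# renames them expression.x (from Sample1) and expression.y (from Sample2).
merged <- merge(merge(Annotation, Sample1, by = "ensembl"),
                Sample2, by = "ensembl")
print(merged)
```

Because `ensembl` is the join key, `merge()` places it first and sorts the rows by it, which is why `ensembl` precedes `geneNames` in the expected output.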
• Add two extra columns containing the length normalised expression (expression divided by gene length) for Sample 1 and Sample 2.
##   ensembl geneNames    pathway geneLengths expression.x expression.y
## 1  Ens001    Gene_1 Glycolysis         100         1000         1500
## 2  Ens003    Gene_2       TGFb        3000         3000         1500
## 3  Ens006    Gene_3 Glycolysis         200        10000        17000
## 4  Ens010    Gene_5 Glycolysis        1200         5000        10000
##   Sample1_lne Sample2_lne
## 1   10.000000   15.000000
## 2    1.000000    0.500000
## 3   50.000000   85.000000
## 4    4.166667    8.333333
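Length normalisation here means dividing each expression value by its gene length, which reproduces the expected values (for example 1000 / 100 = 10 for Ens001 in Sample 1). A minimal sketch, assuming the merged table from the previous bullet is rebuilt inline so the block runs on its own; the column names `Sample1_lne` and `Sample2_lne` are read from the damaged output and may not match the original exactly.

```r
# Rebuild the inputs so this block is self-contained
Annotation <- data.frame(
  geneNames   = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  ensembl     = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  pathway     = c("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis"),
  geneLengths = c(100, 3000, 200, 1000, 1200)
)
Sample1 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens010"),
                      expression = c(1000, 3000, 10000, 5000))
Sample2 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
                      expression = c(1500, 1500, 17000, 500, 10000))
merged <- merge(merge(Annotation, Sample1, by = "ensembl"), Sample2, by = "ensembl")

# Length normalised expression = raw expression / gene length, computed
# element-wise down each column (vectorised, no loop needed)
merged$Sample1_lne <- merged$expression.x / merged$geneLengths
merged$Sample2_lne <- merged$expression.y / merged$geneLengths
print(merged)
```

With the default console width, `print()` wraps the two new columns onto a second block, as in the expected output.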
• Identify the total length of the genes in the Glycolysis pathway.
## [1] 1500
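The question does not say which table to sum over; this sketch uses `Annotation`, and since Gene_4 (the only gene absent from the merged table) belongs to TGFb, the merged table gives the same 100 + 200 + 1200 = 1500. The `tapply()` call is an optional extension, not part of the question, that totals every pathway at once.

```r
Annotation <- data.frame(
  geneNames   = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  ensembl     = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  pathway     = c("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis"),
  geneLengths = c(100, 3000, 200, 1000, 1200)
)

# Logical indexing keeps only the Glycolysis rows before summing
glycolysis_length <- sum(Annotation$geneLengths[Annotation$pathway == "Glycolysis"])
print(glycolysis_length)  # [1] 1500

# Optional: total gene length per pathway (Glycolysis 1500, TGFb 4000)
pathway_lengths <- tapply(Annotation$geneLengths, Annotation$pathway, sum)
print(pathway_lengths)
```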
• Identify the mean length normalised expression across Sample 1 and Sample 2 for the Ens006 gene.
## [1] 67.5
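For a single gene, the mean across the two samples is the average of its two length normalised values: (10000 / 200 + 17000 / 200) / 2 = (50 + 85) / 2 = 67.5. The sketch below rebuilds the table with the normalised columns (again using the assumed names `Sample1_lne` and `Sample2_lne`) and selects the Ens006 row by its Ensembl ID.

```r
Annotation <- data.frame(
  geneNames   = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  ensembl     = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  pathway     = c("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis"),
  geneLengths = c(100, 3000, 200, 1000, 1200)
)
Sample1 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens010"),
                      expression = c(1000, 3000, 10000, 5000))
Sample2 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
                      expression = c(1500, 1500, 17000, 500, 10000))
merged <- merge(merge(Annotation, Sample1, by = "ensembl"), Sample2, by = "ensembl")
merged$Sample1_lne <- merged$expression.x / merged$geneLengths
merged$Sample2_lne <- merged$expression.y / merged$geneLengths

# Pick the Ens006 row, keep only the two normalised columns,
# flatten the one-row data frame to a vector and average it
ens006 <- merged[merged$ensembl == "Ens006", c("Sample1_lne", "Sample2_lne")]
ens006_mean <- mean(unlist(ens006))
print(ens006_mean)  # [1] 67.5
```

`mean(unlist(...))` is used rather than `rowMeans()` so the result prints as a plain `[1] 67.5` without the row name attached.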
• For all genes, identify the log2 fold change in length normalised expression from Sample 1 to Sample 2.
##     Gene_1     Gene_2     Gene_3     Gene_5
##  0.5849625 -1.0000000  0.7655347  1.0000000
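The log2 fold change from Sample 1 to Sample 2 is log2(Sample 2 value / Sample 1 value), so positive numbers mean higher expression in Sample 2. Because each gene has the same length in both samples, this equals log2(expression.y / expression.x); for example Gene_2 gives log2(0.5 / 1) = -1. A minimal sketch under the same assumptions as above (rebuilt table, hypothetical `_lne` column names), labelling the result with `geneNames`:

```r
Annotation <- data.frame(
  geneNames   = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  ensembl     = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
  pathway     = c("Glycolysis", "TGFb", "Glycolysis", "TGFb", "Glycolysis"),
  geneLengths = c(100, 3000, 200, 1000, 1200)
)
Sample1 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens010"),
                      expression = c(1000, 3000, 10000, 5000))
Sample2 <- data.frame(ensembl    = c("Ens001", "Ens003", "Ens006", "Ens007", "Ens010"),
                      expression = c(1500, 1500, 17000, 500, 10000))
merged <- merge(merge(Annotation, Sample1, by = "ensembl"), Sample2, by = "ensembl")
merged$Sample1_lne <- merged$expression.x / merged$geneLengths
merged$Sample2_lne <- merged$expression.y / merged$geneLengths

# Fold change is Sample2 relative to Sample1, then log2-transformed
log2fc <- log2(merged$Sample2_lne / merged$Sample1_lne)
# Label each value with its gene name (as.character guards against factors)
names(log2fc) <- as.character(merged$geneNames)
print(log2fc)
```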