R for Corpus Linguistics
I’ve used R for several projects, and each time I felt as though I were learning it all over again. Part of the reason is that I had borrowed functions and code from others and adjusted them for my purposes. Because I hadn’t fully understood what each parameter and combination did, and hadn’t started with R from the basics, I easily forgot what I’d learned.
Since I intend to keep using R with a set of corpus data, I took the opportunity to work through Stefan Gries’ practical introduction, Quantitative Corpus Linguistics with R. I didn’t find anyone working through R at my speed and level when I was first starting, so my YouTube channel, R for Corpus Linguistics, shares my exploration and learning. You’re welcome to follow along with me, either with Gries’ book or just with R.
Reference—Summary of episodes, topics, and functions
Episode | Topic | Functions | ||||||
---|---|---|---|---|---|---|---|---|
1 | Introduction to R | mean() | sqrt() | log() | ||||
2 | R data assignment | ls() | rm() | sample() | q() | |||
3 | R Data structures | is.vector() | vector() | class() | length() | str() | ||
4 | R Reading Files | scan() | ||||||
5 | R Vector Elements Basics | min() | max() | which() | [basic logical expressions] | |||
6 | R Subset and replace vector elements | length() | which() | sum() | %in% | |||
7 | R Compare and combine vector elements | match() | setdiff() | intersect() | union() | unique() | table() | |
8 | R Sort and order vector elements | sort() | order() | |||||
9 | R Write vector to file | cat() | scan() | |||||
10 | R matching word lists (exercises) | letters | LETTERS | table() | match() | %in% | setdiff() | length() |
11 | R Merge frequency lists (exercise) | names() | sort() | length() | as.table() | tapply() | ||
12 | R Factors and data frames | factor() | is.factor() | data.frame() | ||||
13 | R Reading tables | read.table() | choose.files() | setwd() | ||||
14 | R Writing tables to files | write.table() | ||||||
15 | R Data frames and missing data | read.table() | summary() | na.fail() | is.na() | na.omit() | complete.cases() | |
16 | R Creating subsets from data frames | read.table() | attach() | which() | subset() | %in% | ||
17 | R Ordering subsets from data frames | subset() | dim() | sample() | order() | rank() | ||
18 | R Splitting data frames (exercises) | letters | data.frame() | read.table() | choose.files(default="") | rm() | split() | |
19 | R Creating, ordering, saving data frames (exercises) | write.table() | order() | rank() | data.frame() | |||
20 | R Further ordering and adding to data frames (exercises) | data.frame() | which() | subset() | names() | order() | write.table() | |
21 | R Changing and deleting from data frames (exercises) | read.table() | subset() | order() | sort() | names() | write.table() | |
22 | R Lists versus data frames | read.table(default=) | list() | is.list() | is.vector() | is.data.frame() | [] vs [[]] | |
23 | R Accessing and splitting lists | split() | list() | [] vs [[]] | ||||
24 | R Conditional if statements | if() {} | else if() {} | cat() | ||||
25 | R Conditional if statements (exercise) | if() {} | else if() {} | |||||
26 | R Basic loops | if() {} | for() {} | seq() | cat() | while() {} | repeat {} | |
27 | R Basic loops (exercise) | for() {} | letters | |||||
28 | R Alternatives to basic loops | for() {} | if() {} else {} | sum() | which() | tapply() | ||
29 | R tapply functions (exercises) | tapply() | read.table(file=) | read.table(choose.files()) | ||||
30 | R More "apply" alternatives to basic loops | for() {} | list() | sapply() | lapply() | |||
31 | R Accessing character strings | nchar() | substr() | tolower() | toupper() | chartr() | ||
32 | R Merging and splitting character strings | paste() | strsplit() | unlist() | ||||
33 | R Trimming character strings | substr() | strtrim() | abbreviate() | ||||
34 | R Locating one character string match | grep() | strsplit() | regexpr() | ||||
35 | R Locating multiple matches in character strings | gregexpr() | is.vector() | unlist() | attributes() | attr() | ||
36 | R Search and Substitute matches in character strings | strapply() (gsubfn package) | gsub() | grep() | nchar() | cat() | "regular expressions, including ^ , . , …, \\" | |
37 | R Search and Substitute character strings with regular expressions | gsub() | grep() | "regular expressions, including [] , {} , ( ), + , |" | ||||
38 | R Search and Substitute character strings exercises | gsub() | grep() | "regular expressions, including [] , {} , ( ), | , ^ , $" | ||||
39 | R Character string match exercises | strapply() (gsubfn package) | grep() | gregexpr() | "regular expressions, including []" | |||
40 | R Matching with character classes | gsub() | character classes | |||||
41 | R Greedy and non-greedy matching | gregexpr() | attr() | unlist() | subset() | "regular expressions, including [ ], (?U), .*, + " | ||
42 | R Back-referencing character expressions | gsub() | "regular expressions, including [ ] , +, \\w, \\b, \\1" | |||||
43 | R Back-referencing digit matches (for dates) | sub() | gsub() | "regular expressions, including { } , +, \\d, \\D, \\b, \\1, \\2, …" | ||||
44 | R Back-referencing character strings (for tagging) | gsub() | "regular expressions, including ( ) , +, ?, \\W, \\b, \\1, \\2, …" | |||||
45 | R Looking ahead when replacing | grep() | gsub() | "regular expressions, including ( ) , (?=), \\1" | ||||
46 | R Looking ahead greedy versus non-greedy | gregexpr() | "regular expressions, including *, .*, .*? , (?=)" | |||||
47 | R Negative lookahead versus not a match | gsub() | "regular expressions, including (?!), [^]" | |||||
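
To give a feel for how a few of the functions in the table fit together, here is a minimal sketch in base R. It builds a small frequency list and tries one back-reference and one lookahead. The example sentence and the variable names are made up for illustration and are not taken from the videos or from Gries' book.

```r
# Hypothetical example sentence, used only for illustration.
text <- "The cat sat on the mat. The dog sat on the log."

# Lower-case the text and split it on runs of non-word characters
# (tolower(), strsplit(), unlist(): episodes 31-32).
words <- unlist(strsplit(tolower(text), "\\W+"))

# Count each word and sort the counts in descending order
# (table(), sort(): episodes 7-8).
freqs <- sort(table(words), decreasing = TRUE)

# Back-referencing (episode 42): wrap each word in angle brackets.
# \\1 refers to whatever the parenthesised group matched.
tagged <- gsub("(\\w+)", "<\\1>", "the cat")

# Lookahead (episode 45): find "sat" only where " on" follows.
# Lookarounds need Perl-style regular expressions, hence perl = TRUE.
hits <- gregexpr("sat(?= on)", text, perl = TRUE)
starts <- as.integer(unlist(hits))  # starting positions of the matches

print(freqs)
print(tagged)
print(starts)
```

Because the lookahead only checks what follows without consuming it, each match covers just the three characters of "sat"; the match lengths are stored in the "match.length" attribute of the gregexpr() result, which attr() or attributes() (episode 35) can retrieve.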