TUTORIALS
Below are links to tutorials I created for the Language Technology and Data Analysis Laboratory (LADAL).
- DATA SCIENCE BASICS
- This LADAL tutorial provides some useful tips and tricks about working with computers, e.g. how to keep your computer running smoothly.
- This LADAL tutorial introduces basic concepts of data science
- This LADAL tutorial introduces quantitative reasoning
- This LADAL tutorial introduces basic concepts of quantitative research (methodology)
- INTRODUCTION TO R
- This LADAL tutorial is an introduction to R for (absolute) beginners
- This LADAL tutorial introduces string processing with R
- This LADAL tutorial introduces regular expressions in R
- This LADAL tutorial shows how to work with (create, manipulate, and process) tabulated data in R
- DATA VISUALIZATION
- This LADAL tutorial introduces data visualization with R
- This LADAL tutorial exemplifies how to create common visualization types (scatter plots, line graphs, bar plots, box plots, etc.) with R
- This LADAL tutorial exemplifies how to generate some lesser-known but very useful visualization types in R
- This LADAL tutorial introduces geo-spatial data visualization (mapping) with R
- This LADAL tutorial shows how to generate interactive data visualizations in R using googleVis
- STATISTICS
- This LADAL tutorial introduces descriptive statistics.
- This LADAL tutorial introduces basic inferential statistics
- This LADAL tutorial introduces fixed- and mixed-effects regression
- This LADAL tutorial introduces tree-based models
- This LADAL tutorial introduces cluster and correspondence analysis
- This LADAL tutorial introduces other grouping procedures like Semantic Vector Space models
- TEXT ANALYTICS / TEXT MINING / CORPUS LINGUISTICS
- This LADAL tutorial introduces text analysis and distant reading.
- This LADAL tutorial shows how to generate keyword-in-context concordances in R
- This LADAL tutorial introduces Network Analysis in R
- This LADAL tutorial introduces Co-occurrence and Collocation Analysis in R
- This LADAL tutorial introduces Topic Modeling with R
- This LADAL tutorial introduces Sentiment Analysis with R
- This LADAL tutorial shows how to add part-of-speech annotation (POS tagging) and perform syntactic parsing in R for English, German, Spanish, Italian, and Dutch.
- CASE STUDIES / FOCUS TUTORIALS
- Creating vowel charts with Praat and R
This LADAL tutorial shows how to extract formant values in Praat and use these to create a vowel chart in R.
- Text Mining with R: Building a Text Classifier
This tutorial exemplifies how to create a text classifier with R, i.e. it implements a machine-learning algorithm that classifies texts as being a speech by either Barack Obama or Mitt Romney. The script is based on Timothy D'Auria's YouTube tutorial "How to Build a Text Mining, Machine Learning Document Classification System in R!" (https://www.youtube.com/watch?v=j1V2McKbkLo). The data is available here and the code for downloading the speeches is available here.
- Corpus Linguistics: Gender and Age Differences in Swearing
This LADAL tutorial exemplifies how to perform a simple corpus analysis with R by focusing on gender and age differences in swear word use in Irish English.
- PDF to txt
This LADAL tutorial shows how to extract the text from PDF files into txt files for further processing.
- Web crawling and scraping with R
This LADAL tutorial shows how to crawl and scrape websites using R.
FOR STUDENTS
- General Notes for Students attending my Courses (Merkblatt für Seminare)
You will find a document with general information about my seminars here. Please read this document if you are attending or plan to attend one of my seminars! (last updated 2015/02/16)
- Model term paper
You will find a model term paper here. This model term paper includes information about the structure, content, and formatting of term papers. You can also use it as a template for your own term papers and reuse its formatting. (last updated 2015/04/08)
- Course Materials
"Introduction to English Linguistics"
"Methods in Linguistics/Methoden der Linguistik"
PROGRAMMING / SOFTWARE DEVELOPMENT / CORPUS LINGUISTICS
Below you can find some resources such as scripts and data sets that you may find useful.
- R scripts
- Chi-squared test for subtables of 2*k tables (R script)
- Configural Frequency Analysis for data with only two-level configurations (R script)
- Function written by Tony Breyal for downloading text from websites (to create corpora containing web data) (R script)
- Function providing nice summaries of simple linear regressions (R script)
- Function providing nice summaries of multiple linear regressions (R script)
- Function providing nice summaries of fixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-up model fitting of fixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-up model fitting of mixed-effects binomial logistic regressions (R script)
- Function providing nice summaries of stepwise step-down model fitting of mixed-effects binomial logistic regressions (R script)
- Biodata scripts & data sets (last updated 2015/02/09)
If you find any bugs in the code or mistakes in the results, please let me know so I can correct the scripts and update the results.
- ICE Canada: word counts and biodata (R script, result)
- ICE GB-R2: word counts and biodata (R script, result)
- ICE India: word counts and biodata (R script, result)
- ICE Ireland 1.2.2: word counts and biodata (R script, result)
- ICE Jamaica: word counts and biodata (R script, result)
- ICE New Zealand: word counts and biodata (R script, result)
- ICE Philippines: word counts and biodata (R script, result)
- ICE Singapore: word counts (R script, result)
- ICE Hong Kong: word counts (R script, result)
- SBCAE: word counts and biodata (R script, result)
- TestCorpus
A small sample corpus for testing functions.
(last updated 2020/09/25)