Cookbook Measures of Central Tendency #2540

Open
wants to merge 7 commits into
base: main
35 changes: 35 additions & 0 deletions data/calculate-the-median/00-stdlib.ml
@@ -0,0 +1,35 @@
---
packages: []
discussion: |
  - **Description:** The [median](https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php) is a measure of central tendency:
    the middle value of an ordered data set. It is useful in statistical analysis because it is not easily influenced by extreme values (outliers).
  - **Data Assumption:** This code assumes the data is of type `float`. The median can be calculated for any numeric type, but floats are a sensible choice in OCaml:
    integers of ordinary magnitude can be written as floats (e.g., 1 as 1.0), whereas a float such as 1.5 has no integer representation.
  - **Note:** When the data has an even number of elements, the median is the average of the two middle values.
---

(* `median` returns the middle value of the elements of `data`.
   An empty list raises `Invalid_argument "empty list"`.

   To find the median, the data needs to be sorted. `List.sort` from the OCaml standard library
   is used to create the `sorted_data` list.

   We then check whether `data` has an even or odd number of elements. This determines how we find the middle value.
*)
let median data =
  let n = List.length data in
  if data = []
  then invalid_arg "empty list"
  else (
    let sorted_data = List.sort compare data in
    if n mod 2 = 0
    then (
      let mid1 = List.nth sorted_data ((n / 2) - 1) in
      let mid2 = List.nth sorted_data (n / 2) in
      (mid1 +. mid2) /. 2.0)
    else List.nth sorted_data (n / 2))

(* Example usage *)
let data = [ 1.0; 2.0; 3.0; 4.0; 5.0; 6.0; 1.0 ]

let () = Printf.printf "The median value is %f\n" (median data)
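
As a possible alternative to the list-based version above, the sketch below copies the data into an array so the middle elements can be read by index rather than reached with `List.nth`. The name `median_array` is hypothetical and not part of this PR, and the sketch assumes OCaml 4.07 or later for `Float.compare`.

```ocaml
(* Hypothetical variant, not part of this PR: median computed on an array copy.
   Assumes OCaml >= 4.07 (for `Float.compare`). *)
let median_array (data : float list) : float =
  match data with
  | [] -> invalid_arg "empty list"
  | _ ->
    (* Copy into an array so the input list is left untouched. *)
    let a = Array.of_list data in
    Array.sort Float.compare a;
    let n = Array.length a in
    if n mod 2 = 0
    then (a.((n / 2) - 1) +. a.(n / 2)) /. 2.0 (* even: average the two middle values *)
    else a.(n / 2) (* odd: single middle value *)

let () =
  (* Even number of elements: sorted data is 1, 2, 3, 4, so the median is 2.5. *)
  Printf.printf "even: %.2f\n" (median_array [ 4.0; 1.0; 3.0; 2.0 ]);
  (* Odd number of elements: sorted data is 1, 3, 5, so the median is 3. *)
  Printf.printf "odd: %.2f\n" (median_array [ 5.0; 1.0; 3.0 ])
```

The even-length call illustrates the note in the discussion: the two middle values, 2.0 and 3.0, are averaged.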
33 changes: 33 additions & 0 deletions data/cookbook/calculate-the-mean/00-stdlib.ml
@@ -0,0 +1,33 @@
---
packages: []
discussion: |
  - **Description:** The [mean](https://statistics.laerd.com/statistical-guides/measures-central-tendency-mean-mode-median.php) is one of the most commonly used measures of central tendency.
  - **Calculation:** Uses the OCaml Standard Library.
  - **Data Assumption:** This code assumes the data is of type `float`. The mean can be calculated for any numeric type, but floats are a sensible choice in OCaml:
    integers of ordinary magnitude can be written as floats (e.g., 1 as 1.0), whereas a float such as 1.5 has no integer representation.
---


(* sample data for testing *)
let data = [ 2.; 3.; 4.; 5.; 3. ]

(* `mean` accepts a list of floats `data` and returns the mean (average) of that data.
   It raises `Invalid_argument` if the list is empty.

   The number of elements, `total_elements`, is found using the `List.length` function from the standard library.

   `List.length` returns an integer, so we use the `float_of_int` function
   to convert the length to a float so it can be used in the division.

   `summed_elements` is the sum of all elements in the list `data`.
   It is calculated using `List.fold_left` with the float addition operator `+.`
*)
let mean data =
  match data with
  | [] -> invalid_arg "list must not be empty"
  | _ ->
    let total_elements = float_of_int (List.length data) in
    let summed_elements = List.fold_left ( +. ) 0. data in
    summed_elements /. total_elements

(* Example usage. Note the final value is rounded to two decimal places in the print statement. *)
let () = Printf.printf "The mean value is %.2f\n" (mean data)
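
The data assumption above implies that integer data has to be converted to floats before it can be averaged. A minimal sketch of that conversion follows; the names `mean_float` and `mean_of_ints` are hypothetical and not part of this PR.

```ocaml
(* Hypothetical helpers, not part of this PR. *)

(* Mean of a float list, following the same approach as the recipe. *)
let mean_float (data : float list) : float =
  match data with
  | [] -> invalid_arg "list must not be empty"
  | _ -> List.fold_left ( +. ) 0. data /. float_of_int (List.length data)

(* Convert each integer with `float_of_int`, then reuse the float version. *)
let mean_of_ints (data : int list) : float =
  mean_float (List.map float_of_int data)

let () =
  (* (1 + 2 + 3 + 4) / 4 = 2.5 *)
  Printf.printf "The mean value is %.2f\n" (mean_of_ints [ 1; 2; 3; 4 ])
```

Converting first keeps a single float implementation, at the cost of building an intermediate list.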
6 changes: 6 additions & 0 deletions data/cookbook/tasks.yml
@@ -188,6 +188,12 @@ categories:
tasks:
- title: Calculate Geodistance Between Two Points
slug: calculate-geodistance-between-points
- title: Measures of Central Tendency
tasks:
- title: Calculate the Mean
slug: calculate-the-mean
- title: Calculate the Median
slug: calculate-the-median
- title: Operating System
tasks:
- title: Run an External Command and Process Stdout