Maxkit: April 2020

2020/04/20

rust10 Generic Types, Traits, and Lifetimes

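Every programming language has tools for handling duplicated concepts; in Rust, one of them is generic types. Generics are abstract stand-ins for concrete types or other properties. When writing code, we can express how generics behave and how they relate to other generics without knowing what will actually be in their place when the code is compiled and run.

We first review how extracting a function reduces duplicated code. We then turn two functions that differ only in their parameter types into a single generic function, and look at generics in struct and enum definitions.

Next come traits, which define behavior in a generic way. Traits can be combined with generics to restrict a generic type to types that have a particular behavior.

Finally, lifetimes are a kind of generic that gives the compiler information about how references relate to each other. Lifetimes let us borrow values in many situations while still allowing the compiler to check that the references are valid.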

Extracting a Function

Extracting a function removes duplicated code without using generic types.

// Find the largest value in a vector of integers
fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let mut largest = number_list[0];

    // Iterate over all the integers, keeping the largest value seen so far in `largest`
    for number in number_list {
        if number > largest {
            largest = number;
        }
    }

    println!("The largest number is {}", largest);
}

If there is a second vector, the same code would have to be duplicated; extract a largest function instead.

fn largest(list: &[i32]) -> i32 {
    let mut largest = list[0];

    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let number_list = vec![102, 34, 6000, 89, 54, 2, 43, 8];

    let result = largest(&number_list);
    println!("The largest number is {}", result);
}

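Steps:

Identify the duplicated code.
Extract the duplicated code into a function body, and specify its inputs and return value in the function signature.
Replace the duplicated code with calls to the function.

Generic Data Types

Generics can be used in the definitions of functions, structs, enums, and methods.

Using Generics in Function Definitions

These two largest functions differ only in their names and parameter types.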

fn largest_i32(list: &[i32]) -> i32 {
    let mut largest = list[0];

    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn largest_char(list: &[char]) -> char {
    let mut largest = list[0];

    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest_i32(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest_char(&char_list);
    println!("The largest char is {}", result);
}

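Any identifier can be used as a type parameter name; by convention it is T.

// The generic parameter declaration goes inside <> between the function name and the parameter list
// This function has one parameter, list, which is a slice of T
// It returns a T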
fn largest<T>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list.iter() {
        // This comparison fails to compile:
        // error[E0369]: binary operation `>` cannot be applied to type `T`
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

Compile error

error[E0369]: binary operation `>` cannot be applied to type `T`
 --> src/main.rs:8:12
  |
8 |         if item > largest {
  |            ^^^^^^^^^^^^^^
  |
  = note: `T` might need a bound for `std::cmp::PartialOrd`

error: aborting due to previous error

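std::cmp::PartialOrd is a trait. The error means that largest cannot work for every possible type T, because the function body needs to compare values of type T. The std::cmp::PartialOrd trait from the standard library enables comparison on a type; it is discussed later in this chapter.

Generics in Struct Definitions

Structs can be defined with one or more generic type parameters using <>.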

struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    let integer = Point { x: 5, y: 10 };
    let float = Point { x: 1.0, y: 4.0 };
}

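Declare the type parameter name with <T> after the struct name; T can then be used inside the struct definition.

Because x and y share the same generic type T, the code does not compile if x and y have different types.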

struct Point<T> {
    x: T,
    y: T,
}

fn main() {
    // Compile error: x is an integer but y is a float
    let wont_work = Point { x: 5, y: 4.0 };
}

Use two different generic type parameters instead.

struct Point<T, U> {
    x: T,
    y: U,
}

fn main() {
    let both_integer = Point { x: 5, y: 10 };
    let both_float = Point { x: 1.0, y: 4.0 };
    let integer_and_float = Point { x: 5, y: 4.0 };
}

Generics in Enum Definitions

enum Option<T> {
    Some(T),
    None,
}

enum Result<T, E> {
    Ok(T),
    Err(E),
}

Generics in Method Definitions

struct Point<T> {
    x: T,
    y: T,
}

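// Implement a method x that returns a reference to the field x of type T
// <T> must be declared right after impl so the compiler knows that the T in Point<T> is a generic parameter rather than a concrete type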
impl<T> Point<T> {
    fn x(&self) -> &T {
        &self.x
    }
}

// Provide a method only when T is f32
impl Point<f32> {
    fn distance_from_origin(&self) -> f32 {
        (self.x.powi(2) + self.y.powi(2)).sqrt()
    }
}

fn main() {
    let p = Point { x: 5, y: 10 };

    println!("p.x = {}", p.x());

    let p2 = Point { x: 5.0, y: 10.0 };
    println!("distance_from_origin={}", p2.distance_from_origin());

    let p3 = Point { x: 5, y: 10 };
    // error[E0599]: no method named `distance_from_origin` found for type `Point<{integer}>` in the current scope
    //println!("distance_from_origin={}", p3.distance_from_origin());
}

struct Point<T, U> {
    x: T,
    y: U,
}

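// These generic parameters belong to the struct definition
impl<T, U> Point<T, U> {
    // mixup takes another Point whose generic types may differ from those of self
    // V and W are declared on the method because they only concern its parameter
    // The returned Point mixes the two: x of type T from self and y of type W from other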
    fn mixup<V, W>(self, other: Point<V, W>) -> Point<T, W> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

fn main() {
    let p1 = Point { x: 5, y: 10.4 };
    let p2 = Point { x: "Hello", y: 'c'};

    let p3 = p1.mixup(p2);

    println!("p3.x = {}, p3.y = {}", p3.x, p3.y);
    // p3.x = 5, p3.y = c
}

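Performance of Code Using Generics

Generics have no runtime cost compared with concrete types.

At compile time Rust performs monomorphization: it turns generic code into specific code by filling in the concrete types that are actually used. The compiler looks at every place where generic code is called and generates code for each concrete type it is called with.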

let integer = Some(5);
let float = Some(5.0);

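During monomorphization the compiler sees that two kinds of values are used with Option<T>, an i32 and an f64. It expands the generic definition into two specialized definitions and replaces the generic one with them. Conceptually the result looks like the code below; the names Option_i32 and Option_f64 are only illustrative.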

enum Option_i32 {
    Some(i32),
    None,
}

enum Option_f64 {
    Some(f64),
    None,
}

fn main() {
    let integer = Option_i32::Some(5);
    let float = Option_f64::Some(5.0);
}

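Traits: Defining Shared Behavior

A trait tells the compiler about functionality that a type has and can share with other types. Trait bounds specify that a generic type can be any type that has certain behavior.

Traits are similar to interfaces in other languages, with some differences.

Defining a Trait

A type's behavior consists of the methods we can call on it. Different types share the same behavior if we can call the same methods on all of them.

A trait definition groups method signatures together to define the set of behaviors needed to accomplish some purpose.

Example: several structs hold text of different kinds and lengths. (1) A NewsArticle struct holds a news story filed in a particular location. (2) A Tweet holds at most 280 characters, plus metadata indicating whether it is a new tweet, a retweet, or a reply.

Suppose we want an aggregator library that displays a summary of the data in any of these. Each type needs a summarize method. This is the definition of the Summary trait: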

pub trait Summary {
    fn summarize(&self) -> String;
}

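The compiler enforces that every type implementing the Summary trait defines summarize with exactly this signature and provides a method body.

Implementing a Trait

With the Summary trait defined, we now implement summarize for both NewsArticle and Tweet.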

pub struct NewsArticle {
    pub headline: String,
    pub location: String,
    pub author: String,
    pub content: String,
}

impl Summary for NewsArticle {
    fn summarize(&self) -> String {
        format!("{}, by {} ({})", self.headline, self.author, self.location)
    }
}

pub struct Tweet {
    pub username: String,
    pub content: String,
    pub reply: bool,
    pub retweet: bool,
}

impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

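Implementing a trait on a type is similar to implementing regular methods. The difference is that after impl we write the trait name, followed by the for keyword and the name of the type.

Now summarize can be called: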

let tweet = Tweet {
    username: String::from("horse_ebooks"),
    content: String::from("of course, as you probably already know, people"),
    reply: false,
    retweet: false,
};

println!("1 new tweet: {}", tweet.summarize());

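If the Summary trait lives in another crate called aggregator, the trait must be brought into scope with use aggregator::Summary; before it can be implemented or its methods called. Summary must also be pub so that other crates can use it.

One restriction: a trait can be implemented on a type only if the trait or the type is local to our crate. For example, inside the aggregator crate we can implement the standard library's Display trait on Tweet, because Tweet is local to the crate. We can also implement Summary on Vec<T> there, because Summary is local to the crate.

We cannot implement an external trait on an external type. For example, inside the aggregator crate we cannot implement Display on Vec<T>, because both Display and Vec<T> are defined in the standard library. This restriction is part of a property called coherence, and more specifically the orphan rule. It ensures that other people's code cannot break yours and vice versa. Without the rule, two crates could implement the same trait for the same type, and the compiler would not know which implementation to use.

Default Implementation

Sometimes it is useful to give some or all of a trait's methods default behavior. When implementing the trait on a type, we can keep or override each default.

The following defines the Summary trait with a default implementation of summarize.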

pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

To use the default summarize on NewsArticle, write an empty impl block:

impl Summary for NewsArticle {
}
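As a minimal sketch of the difference between keeping and overriding a default, the following trims NewsArticle and Tweet down to the fields needed here (the trimmed field lists are an assumption made for brevity, not the full definitions above):

```rust
pub trait Summary {
    // Default implementation; implementing types may override it.
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

pub struct NewsArticle {
    pub headline: String,
}

pub struct Tweet {
    pub username: String,
    pub content: String,
}

// Empty impl block: NewsArticle keeps the default summarize.
impl Summary for NewsArticle {}

// Tweet provides its own summarize, replacing the default.
impl Summary for Tweet {
    fn summarize(&self) -> String {
        format!("{}: {}", self.username, self.content)
    }
}

fn main() {
    let article = NewsArticle {
        headline: String::from("Penguins win the Stanley Cup Championship!"),
    };
    let tweet = Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("hello"),
    };

    println!("{}: {}", article.headline, article.summarize());
    println!("{}", tweet.summarize());
}
```

The calling syntax is the same in both cases; the caller cannot tell whether summarize comes from the default or from an override.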

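Traits as Parameters

Traits let us define functions that accept many different types.

NewsArticle and Tweet both implement Summary. We can define a notify function that calls summarize on its item parameter, which may be any type that implements the Summary trait.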

pub fn notify(item: impl Summary) {
    println!("Breaking news! {}", item.summarize());
}

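Inside notify we can call any method that comes from the Summary trait, such as summarize.

Trait Bound Syntax

The impl Trait syntax is convenient for simple cases. It is syntactic sugar for the longer trait bound form: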

pub fn notify<T: Summary>(item: T) {
    println!("Breaking news! {}", item.summarize());
}

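The trait bound is written together with the generic type parameter declaration, after a colon inside the angle brackets. Because of the bound on T, notify can be called with an instance of NewsArticle or Tweet.

The trait bound form can express more complex cases. With impl Trait, two parameters may have two different types, as long as both implement Summary: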

pub fn notify(item1: impl Summary, item2: impl Summary) {

To force both parameters to have the same type, a trait bound is required:

pub fn notify<T: Summary>(item1: T, item2: T) {

Specifying Multiple Trait Bounds with the `+` Syntax

If notify needs to display item with formatting as well as call summarize, item must implement two traits: Display and Summary.

pub fn notify(item: impl Summary + Display) {

The + syntax also works with trait bounds on generic types:

pub fn notify<T: Summary + Display>(item: T) {

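Clearer Trait Bounds with `where`

Too many trait bounds have a downside. Each generic has its own bounds, so a function with several generic type parameters carries a lot of trait bound information between its name and its parameter list, which makes the signature hard to read. The bounds can instead be written in a where clause. Compare this signature: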

fn some_function<T: Display + Clone, U: Clone + Debug>(t: T, u: U) -> i32 {

with the where version:

fn some_function<T, U>(t: T, u: U) -> i32
    where T: Display + Clone,
          U: Clone + Debug
{
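The signature above is only a fragment. A complete version might look like the sketch below; the function body and its return value are assumptions chosen purely so that each bound is exercised.

```rust
use std::fmt::{Debug, Display};

// Hypothetical body: the original text only shows the signature.
fn some_function<T, U>(t: T, u: U) -> i32
where
    T: Display + Clone,
    U: Clone + Debug,
{
    // Clone is available on both parameters because of the bounds.
    let t_copy = t.clone();
    let u_copy = u.clone();

    // Display allows {} for T; Debug allows {:?} for U.
    println!("t = {}, u = {:?}", t_copy, u_copy);

    // Arbitrary return value: the length of t's Display output.
    t.to_string().len() as i32
}

fn main() {
    let n = some_function("abc", vec![1, 2]);
    println!("n = {}", n);
}
```

Any pair of types satisfying the bounds can be passed, for example an integer and a char.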

Returning Types That Implement Traits

The impl Trait syntax can also be used in return position:

fn returns_summarizable() -> impl Summary {
    Tweet {
        username: String::from("horse_ebooks"),
        content: String::from("of course, as you probably already know, people"),
        reply: false,
        retweet: false,
    }
}

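This says that the function returns some type that implements the Summary trait, without naming the concrete type. In this implementation it returns a Tweet.

This ability is especially useful for closures and iterators, two features that rely heavily on traits and are covered in chapter 13.

However, impl Trait only works when the function returns a single type. The following code does not compile, because it returns either a NewsArticle or a Tweet, and the way impl Trait is implemented in the compiler does not allow that.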

fn returns_summarizable(switch: bool) -> impl Summary {
    if switch {
        NewsArticle {
            headline: String::from("Penguins win the Stanley Cup Championship!"),
            location: String::from("Pittsburgh, PA, USA"),
            author: String::from("Iceburgh"),
            content: String::from("The Pittsburgh Penguins once again are the best
            hockey team in the NHL."),
        }
    } else {
        Tweet {
            username: String::from("horse_ebooks"),
            content: String::from("of course, as you probably already know, people"),
            reply: false,
            retweet: false,
        }
    }
}

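Fixing largest with Trait Bounds

The problematic code from earlier:

// The generic parameter declaration goes inside <> between the function name and the parameter list
// This function has one parameter, list, which is a slice of T
// It returns a T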
fn largest<T>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list.iter() {
        // This comparison fails to compile:
        // error[E0369]: binary operation `>` cannot be applied to type `T`
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

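The body of largest compares two values of type T with the > operator. That operator is defined as a default method of std::cmp::PartialOrd, so PartialOrd must be specified in the trait bound on T; largest then works on slices of any type that can be compared. PartialOrd is in the prelude, so no use statement is needed. Change the signature of largest to: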

fn largest<T: PartialOrd>(list: &[T]) -> T {

This time the compile error is:

error[E0508]: cannot move out of type `[T]`, a non-copy slice
 --> src/main.rs:5:23

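Types such as i32 and char have a known size and can be stored on the stack, so they implement the Copy trait. Once largest is generic, the element type of list might not implement Copy, in which case the value cannot be moved out of list[0] into the largest variable. To accept only types that implement Copy, add Copy to the trait bound on T: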

fn largest<T: PartialOrd + Copy>(list: &[T]) -> T {
    let mut largest = list[0];

    for &item in list.iter() {
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];

    let result = largest(&number_list);
    println!("The largest number is {}", result);

    let char_list = vec!['y', 'm', 'a', 'q'];

    let result = largest(&char_list);
    println!("The largest char is {}", result);
}

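The only change is the signature, fn largest<T: PartialOrd + Copy>(list: &[T]) -> T, which adds the PartialOrd and Copy trait bounds.

Copy can be replaced with the Clone trait, cloning each value when largest needs ownership. Clone also covers types such as String that own heap data, but cloning may allocate more heap memory, which can be slow when working with large amounts of data.

Another way to implement largest is to return a reference to a T value in the slice, changing the return type to &T. That version needs neither the Copy nor the Clone trait bound.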
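A sketch of the reference-returning variant could look like this. It keeps only the PartialOrd bound and, like the versions above, assumes a non-empty slice (indexing an empty slice at 0 panics).

```rust
// Returns a reference to the largest element instead of a copy,
// so T needs neither Copy nor Clone.
fn largest<T: PartialOrd>(list: &[T]) -> &T {
    let mut largest = &list[0];

    for item in list {
        // Both sides are &T; references compare by the values they point to.
        if item > largest {
            largest = item;
        }
    }

    largest
}

fn main() {
    let number_list = vec![34, 50, 25, 100, 65];
    println!("The largest number is {}", largest(&number_list));

    // String is not Copy, but this version still accepts it.
    let words = vec![String::from("apple"), String::from("pear")];
    println!("The largest word is {}", largest(&words));
}
```

Because the result borrows from the slice, the slice must outlive any use of the returned reference.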

Using Trait Bounds to Conditionally Implement Methods

Pair<T> always implements new, but it implements cmp_display only when its inner type T implements both the PartialOrd and Display traits.

use std::fmt::Display;

struct Pair<T> {
    x: T,
    y: T,
}

impl<T> Pair<T> {
    fn new(x: T, y: T) -> Self {
        Self {
            x,
            y,
        }
    }
}

impl<T: Display + PartialOrd> Pair<T> {
    fn cmp_display(&self) {
        if self.x >= self.y {
            println!("The largest member is x = {}", self.x);
        } else {
            println!("The largest member is y = {}", self.y);
        }
    }
}

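A trait can also be implemented for any type that satisfies a trait bound. Such implementations are called blanket implementations and are used extensively in the Rust standard library. For example, the standard library implements the ToString trait for every type that implements the Display trait: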

impl<T: Display> ToString for T {
    // --snip--
}

As a result, to_string, defined by the ToString trait, can be called on any type that implements Display.
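To illustrate the effect of this blanket implementation, the sketch below implements Display for a small, hypothetical Point type (not one of the Point structs used earlier) and then calls to_string on it without ever implementing ToString directly.

```rust
use std::fmt;

// Hypothetical type used only for this illustration.
struct Point {
    x: i32,
    y: i32,
}

// Implementing Display is enough; ToString comes from the blanket impl.
impl fmt::Display for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}

fn main() {
    let p = Point { x: 1, y: 2 };

    // to_string is available because Point implements Display.
    let s: String = p.to_string();
    println!("{}", s);

    // Integers implement Display as well, so the same call works on them.
    println!("{}", 3.to_string());
}
```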

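Blanket implementations appear in the "Implementors" section of the trait's documentation.

Traits and trait bounds let us use generics to reduce duplication while telling the compiler exactly which behavior a generic type must have. The compiler uses the trait bound information to check that every concrete type used with the code provides the correct behavior.

In this way Rust moves these errors from runtime to compile time.

Lifetimes are another kind of generic. Rather than ensuring that a type has the behavior we want, they ensure that references stay valid for as long as we need them.

Validating References with Lifetimes

Every reference in Rust has a lifetime, which is the scope for which the reference is valid. Most lifetimes are implicit and inferred. When the lifetimes of references could be related in more than one way, Rust requires us to annotate the relationships with generic lifetime parameters so that the references used at runtime are guaranteed to be valid.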

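Lifetimes are arguably Rust's most distinctive feature.

Preventing Dangling References with Lifetimes

Dangling reference: a reference that points to data other than the data it was intended to reference, for example data that has already been dropped.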


{
    // Declare a variable with no initial value in the outer scope
    let r;

    {
        // In the inner scope, set r to a reference to the inner variable x
        let x = 5;
        // Compile error: error[E0597]: `x` does not live long enough
        r = &x;
    }

    // Print r
    println!("r: {}", r);
}

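Rust checks this code with the borrow checker.

The Borrow Checker

The lifetime of r is 'a and that of x is 'b. 'b is clearly shorter than 'a, so r would refer to memory that is no longer valid.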

{
    let r;                // ---------+-- 'a
                          //          |
    {                     //          |
        let x = 5;        // -+-- 'b  |
        r = &x;           //  |       |
    }                     // -+       |
                          //          |
    println!("r: {}", r); //          |
}                         // ---------+

The following reference is valid, because the data lives longer than the reference:

{
    let x = 5;            // ----------+-- 'b
                          //           |
    let r = &x;           // --+-- 'a  |
                          //   |       |
    println!("r: {}", r); //   |       |
                          // --+       |
}                         // ----------+

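Generic Lifetimes in Functions

Consider a function that returns the longer of two string slices. It takes two string slices and returns a string slice.

The version below does not compile, because Rust cannot tell whether the returned reference refers to x or to y.

// Compile error: error[E0106]: missing lifetime specifier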
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("abcd");
    let string2 = "xyz";

    let result = longest(string1.as_str(), string2);
    println!("The longest string is {}", result);
}

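Generic lifetime parameters define the relationship between the references so that the borrow checker can perform its analysis.

Lifetime Annotation Syntax

Lifetime annotations do not change how long any reference lives. Just as a function can accept any type when its signature has a generic type parameter, it can accept references with any lifetime when it has a generic lifetime parameter. The annotations describe how the lifetimes of several references relate to each other, without affecting those lifetimes.

A lifetime parameter name must start with an apostrophe ('). Names are usually lowercase and very short, and most people use 'a. The annotation goes after the & of a reference, with a space separating it from the reference's type.

&i32        // a reference
&'a i32     // a reference with an explicit lifetime
&'a mut i32 // a mutable reference with an explicit lifetime

A single lifetime annotation has little meaning on its own, because annotations describe how the generic lifetime parameters of several references relate to one another. For example, if a function has a parameter first that is a reference to an i32 with lifetime 'a and a parameter second that is also a reference to an i32 with lifetime 'a, then both references must live at least as long as that generic lifetime.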

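Lifetime Annotations in Function Signatures

Generic lifetime parameters are declared inside angle brackets between the function name and the parameter list. Here they tell Rust that the references in the parameters and the reference in the return value are constrained to the same lifetime.

In the longest function below, all the references in the signature have the same lifetime 'a.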

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

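Specifying lifetime parameters in the function signature does not change the lifetime of any value passed in or returned. It tells the borrow checker to reject any values that do not adhere to these constraints. longest does not need to know exactly how long x and y live, only that some scope can be substituted for 'a that satisfies the signature. When a function takes references from, or returns references to, code outside the function, it is almost impossible for Rust to work out the lifetimes of the parameters or return value on its own. The lifetimes may differ each time the function is called, which is why we annotate them manually.

When concrete references are passed to longest, the concrete lifetime substituted for 'a is the part of the scope of x that overlaps with the scope of y. In other words, 'a becomes the smaller of the lifetimes of x and y. Because the returned reference is annotated with the same lifetime parameter 'a, it is guaranteed to be valid for the shorter of the two lifetimes.

The next example shows how passing references with different concrete lifetimes restricts how longest can be used.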

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
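    // string1 is valid until the end of the outer scope
    let string1 = String::from("long string is long");
    {
        // string2 is valid until the end of the inner scope
        let string2 = String::from("xyz");
        // result refers to something that is valid until the end of the inner scope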
        let result = longest(string1.as_str(), string2.as_str());
        println!("The longest string is {}", result);
    }
}

Using result after string2 has gone out of scope causes a compile error:

fn main() {
    let string1 = String::from("long string is long");
    let result;
    {
        let string2 = String::from("xyz");
        // error[E0597]: `string2` does not live long enough
        result = longest(string1.as_str(), string2.as_str());
    }
    println!("The longest string is {}", result);
}

Thinking in Terms of Lifetimes

If longest were changed to always return the first parameter rather than the longer string slice, no lifetime would need to be specified for the parameter y.

fn longest<'a>(x: &'a str, y: &str) -> &'a str {
    x
}

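If the returned reference does not refer to one of the parameters, it must refer to a value created inside the function, and that would be a dangling reference because the value goes out of scope at the end of the function.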

fn longest<'a>(x: &str, y: &str) -> &'a str {
    // error[E0597]: `result` does not live long enough
    let result = String::from("really long string");
    result.as_str()
}
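No lifetime annotation can make the function above compile, because result is dropped when the function returns. One common way out, sketched here under the hypothetical name longest_owned, is to return an owned String so that the caller becomes responsible for the value.

```rust
// Returning an owned String moves the value out of the function,
// so nothing is left dangling. The parameters are unused here,
// mirroring the example above, hence the leading underscores.
fn longest_owned(_x: &str, _y: &str) -> String {
    let result = String::from("really long string");
    result
}

fn main() {
    let s = longest_owned("abc", "xyz");
    println!("{}", s);
}
```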

Lifetime Annotations in Struct Definitions

A struct can hold references, but every reference in its definition then needs a lifetime annotation.

struct ImportantExcerpt<'a> {
    part: &'a str,
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let first_sentence = novel.split('.')
        .next()
        .expect("Could not find a '.'");
    let i = ImportantExcerpt { part: first_sentence };
}

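The struct has a single field, part, which holds a string slice, that is, a reference.

An instance of ImportantExcerpt cannot outlive the reference it holds in its part field.

The main function creates an ImportantExcerpt instance that holds a reference to the first sentence of the String owned by the variable novel. The data in novel exists before the instance is created, and novel does not go out of scope until after the instance does, so the reference in the instance is valid.

Lifetime Elision

Every reference has a lifetime, and functions or structs that use references need lifetime parameters.

Yet the following function compiles without lifetime annotations: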

fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();

    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[0..i];
        }
    }

    &s[..]
}

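In pre-1.0 versions of Rust this code would not have compiled; every reference needed an explicit lifetime, so the signature had to be written as: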

fn first_word<'a>(s: &'a str) -> &'a str {

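The Rust team found that programmers were writing the same lifetime annotations over and over in particular situations. These situations were predictable and followed a few deterministic patterns, so the patterns were programmed into the compiler, and the borrow checker can now infer the lifetimes in these situations.

The patterns built into Rust's analysis of references are called the lifetime elision rules. They are a set of particular cases in which the compiler works out the lifetimes itself, so they need not be written explicitly. If ambiguity remains after the rules are applied, the compiler reports an error, which is resolved by adding lifetime annotations.

Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.

The compiler uses three rules to work out lifetimes that are not annotated explicitly. Rule 1 applies to input lifetimes; rules 2 and 3 apply to output lifetimes. The rules apply to fn definitions as well as impl blocks.

Rule 1: each parameter that is a reference gets its own lifetime parameter. A function with one reference parameter gets one lifetime parameter, e.g. fn foo<'a>(x: &'a i32); a function with two reference parameters gets two separate lifetime parameters, e.g. fn foo<'a, 'b>(x: &'a i32, y: &'b i32).

Rule 2: if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters, e.g. fn foo<'a>(x: &'a i32) -> &'a i32.

Rule 3: if there are multiple input lifetime parameters but one of them is &self or &mut self (so the function is a method), the lifetime of self is assigned to all output lifetime parameters.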

fn first_word(s: &str) -> &str {

The compiler applies rule 1:

fn first_word<'a>(s: &'a str) -> &str {

Rule 2 applies because there is exactly one input lifetime parameter, so now every reference in the signature has a lifetime:

fn first_word<'a>(s: &'a str) -> &'a str {

Another example:

fn longest(x: &str, y: &str) -> &str {

The compiler applies rule 1:

fn longest<'a, 'b>(x: &'a str, y: &'b str) -> &str {

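Rule 2 does not apply, because there is more than one input lifetime.

Rule 3 does not apply, because longest is a function rather than a method and has no self parameter.

After all three rules, the lifetime of the return value is still unknown, so the compiler reports an error.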

Lifetime Annotations in Method Definitions

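Methods on a struct with lifetimes are implemented with the same syntax as the generic type parameters shown in the following example. Where the lifetime parameters are declared and used depends on whether they relate to the struct's fields or to the method's parameters and return values.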

struct Point<T, U> {
    x: T,
    y: U,
}

impl<T, U> Point<T, U> {
    fn mixup<V, W>(self, other: Point<V, W>) -> Point<T, W> {
        Point {
            x: self.x,
            y: other.y,
        }
    }
}

fn main() {
    let p1 = Point { x: 5, y: 10.4 };
    let p2 = Point { x: "Hello", y: 'c'};

    let p3 = p1.mixup(p2);

    println!("p3.x = {}, p3.y = {}", p3.x, p3.y);
}

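Lifetime names for struct fields are always declared after the impl keyword and then used after the struct's name, because those lifetimes are part of the struct's type.

In method signatures inside the impl block, references may be tied to the lifetime of references in the struct's fields, or they may be independent. The lifetime elision rules also apply, so annotations are often unnecessary.

Examples

The level method has a single parameter, a reference to self, and returns an i32, which is not a reference to anything.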

impl<'a> ImportantExcerpt<'a> {
    fn level(&self) -> i32 {
        3
    }
}

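The lifetime parameter declaration after impl and its use after the type name are required, but the reference to self needs no annotation because of elision rule 1.

An example where rule 3 applies

There are two input lifetimes, so rule 1 gives &self and announcement their own lifetimes. Because one of the parameters is &self, rule 3 gives the return type the lifetime of &self.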

impl<'a> ImportantExcerpt<'a> {
    fn announce_and_return_part(&self, announcement: &str) -> &str {
        println!("Attention please: {}", announcement);
        self.part
    }
}

The Static Lifetime `'static`

A reference with the 'static lifetime can live for the entire duration of the program. All string literals have the 'static lifetime.

let s: &'static str = "I have a static lifetime.";

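The text of the string is stored directly in the program's binary, which is always available.

When an error message suggests using 'static, first consider whether the reference really lives for the entire program before annotating it as 'static. Most of the time the underlying problem is an attempt to create a dangling reference or a mismatch of the available lifetimes, and the fix is to address that problem.

Generic Type Parameters, Trait Bounds, and Lifetimes Together

The syntax for specifying generic type parameters, trait bounds, and lifetimes all in one function: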

use std::fmt::Display;

fn longest_with_an_announcement<'a, T>(x: &'a str, y: &'a str, ann: T) -> &'a str
    where T: Display
{
    println!("Announcement! {}", ann);
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

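The type of ann is the generic type T, which can be filled in by any type that implements the Display trait named in the where clause. The extra parameter is printed before the function compares the lengths of the string slices, which is why the Display trait bound is necessary. Because lifetimes are a kind of generic, the declarations of 'a and T go in the same list inside the angle brackets after the function name.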
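A short usage sketch follows. The function is repeated so the block is self-contained; the argument values are arbitrary and chosen only to show that ann accepts different Display types.

```rust
use std::fmt::Display;

fn longest_with_an_announcement<'a, T>(x: &'a str, y: &'a str, ann: T) -> &'a str
where
    T: Display,
{
    println!("Announcement! {}", ann);
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    // ann is an integer here...
    let a = longest_with_an_announcement("abcd", "xyz", 42);
    println!("The longest string is {}", a);

    // ...and a string slice here; both implement Display.
    let b = longest_with_an_announcement("ab", "xyz", "comparing again");
    println!("The longest string is {}", b);
}
```

When both slices have the same length, the else branch runs and y is returned.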

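Summary

This chapter covered generic type parameters, traits and trait bounds, and generic lifetime parameters.

Traits and trait bounds ensure that even though the types are generic, they have the behavior the code needs. The relationships between the lifetimes of references, as specified by lifetime annotations, ensure that this flexible code has no dangling references. All of this analysis happens at compile time, so it does not affect runtime performance.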

2020/04/13

rust09 Error Handling

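Rust has a number of features for handling errors. In many cases Rust requires you to acknowledge the possibility of an error and take some action before the code will compile.

Rust groups errors into two categories: recoverable and unrecoverable. For a recoverable error, such as a file not being found, it is reasonable to report the problem to the user and retry. Unrecoverable errors are always symptoms of bugs, such as accessing an index beyond the end of an array.

Most languages do not distinguish between the two and handle both with exceptions. Rust has no exceptions. It has the type Result<T, E> for recoverable errors and the panic! macro, which stops execution, for unrecoverable errors.

The following covers panic! first and then returning Result<T, E> values.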

Unrecoverable Errors with `panic!`

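Sometimes something goes wrong and the program cannot handle it. The panic! macro prints a failure message, unwinds and cleans up the stack, and then quits. This most commonly happens when a bug has been detected and the programmer does not know how to handle it.

Unwinding the Stack or Aborting in Response to a Panic

By default, when a panic occurs the program starts unwinding: Rust walks back up the stack and cleans up the data of each function it encounters, which is a lot of work. The alternative is to abort immediately, ending the program without cleaning up and leaving the operating system to reclaim the memory. To make the resulting binary as small as possible, switch from unwinding to aborting by adding panic = 'abort' to the appropriate [profile] section of Cargo.toml. For example, to abort on panic in release mode: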

[profile.release]
panic = 'abort'

Try panic! in a simple program:

fn main() {
    panic!("crash and burn");
}

Output:

$ cargo run
   Compiling guessing_game v0.1.0 (/Users/charley/project/panic)
    Finished dev [unoptimized + debuginfo] target(s) in 0.89s
     Running `target/debug/guessing_game`
thread 'main' panicked at 'crash and burn', src/main.rs:2:4
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

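The panic occurred at src/main.rs:2:4, that is, line 2, column 4 of src/main.rs.

In other cases the panic! call may be in library code or in another macro that our code calls. The backtrace of the panic! call helps locate the part of our code that caused the problem.

Using a `panic!` Backtrace

This program panics by indexing past the end of a vector: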

fn main() {
    let v = vec![1, 2, 3];

    v[99];
}

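In languages such as C, reading past the end of a data structure is undefined behavior and can cause a buffer overread, which may lead to security vulnerabilities. Rust protects against this by stopping execution with a panic instead: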

$ cargo run
   Compiling guessing_game v0.1.0 (/Users/charley/project/panic)
    Finished dev [unoptimized + debuginfo] target(s) in 1.04s
     Running `target/debug/guessing_game`
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 99', /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libcore/slice/mod.rs:2539:10
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Set the RUST_BACKTRACE environment variable to 1 to print a backtrace:

$ RUST_BACKTRACE=1 cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
     Running `target/debug/panic`
thread 'main' panicked at 'index out of bounds: the len is 3 but the index is 99', /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libcore/slice/mod.rs:2539:10
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:70
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:58
             at src/libstd/panicking.rs:200
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:215
   4: <std::panicking::begin_panic::PanicPayload<A> as core::panic::BoxMeUp>::get
             at src/libstd/panicking.rs:478
   5: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:385
   6: std::panicking::try::do_call
             at src/libstd/panicking.rs:312
   7: <T as core::any::Any>::type_id
             at src/libcore/panicking.rs:85
   8: <T as core::any::Any>::type_id
             at src/libcore/panicking.rs:61
   9: <usize as core::slice::SliceIndex<[T]>>::index
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libcore/slice/mod.rs:2539
  10: core::slice::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libcore/slice/mod.rs:2396
  11: <alloc::vec::Vec<T> as core::ops::index::Index<I>>::index
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/liballoc/vec.rs:1677
  12: panic::main
             at src/main.rs:4
  13: std::rt::lang_start::{{closure}}
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libstd/rt.rs:64
  14: std::panicking::try::do_call
             at src/libstd/rt.rs:49
             at src/libstd/panicking.rs:297
  15: panic_unwind::dwarf::eh::read_encoded_pointer
             at src/libpanic_unwind/lib.rs:87
  16: <std::panicking::begin_panic::PanicPayload<A> as core::panic::BoxMeUp>::get
             at src/libstd/panicking.rs:276
             at src/libstd/panic.rs:388
             at src/libstd/rt.rs:48
  17: std::rt::lang_start
             at /rustc/6c2484dc3c532c052f159264e970278d8b77cdc9/src/libstd/rt.rs:64
  18: panic::main

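Backtraces with this level of detail require debug symbols, which are enabled by default when cargo build or cargo run is used without the --release flag.

Frame 12 of the backtrace points to the line in our own code that caused the problem: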

  12: panic::main
             at src/main.rs:4
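When an out-of-range index is an expected situation rather than a bug, the slice method get returns an Option instead of panicking. The sketch below wraps it in a hypothetical helper, element_or_default, whose fallback value of -1 is an arbitrary assumption.

```rust
// Hypothetical helper: returns the element at `index`,
// or -1 if the index is out of bounds.
fn element_or_default(v: &[i32], index: usize) -> i32 {
    match v.get(index) {
        Some(value) => *value,
        None => -1,
    }
}

fn main() {
    let v = vec![1, 2, 3];

    println!("{}", element_or_default(&v, 1));
    // v[99] would panic; get(99) yields None, so the fallback is used.
    println!("{}", element_or_default(&v, 99));
}
```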

Recoverable Errors with `Result`

Most errors are not serious enough to require stopping the program. The Result enum has two variants, Ok and Err:

enum Result<T, E> {
    Ok(T),
    Err(E),
}

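T and E are generic type parameters. T is the type of the value returned inside Ok on success, and E is the type of the error returned inside Err on failure.

The documentation shows that File::open returns a Result. The compiler can also tell us: give the variable a type annotation that we know is wrong, and the type mismatch error reveals the real type.

For example, deliberately annotating f as u32 with let f: u32 = File::open("hello.txt"); produces the error below, which shows that File::open returns std::result::Result<std::fs::File, std::io::Error>.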

error[E0308]: mismatched types
 --> src/main.rs:4:17
  |
4 |     let f:u32 = File::open("hello.txt");
  |                 ^^^^^^^^^^^^^^^^^^^^^^^ expected u32, found enum `std::result::Result`
  |
  = note: expected type `u32`
             found type `std::result::Result<std::fs::File, std::io::Error>`

error: aborting due to previous error

use std::fs::File;

fn main() {
    let f = File::open("hello.txt");

    let f = match f {
        Ok(file) => file,
        Err(error) => {
            panic!("There was a problem opening the file: {:?}", error)
        },
    };
}

Running it without hello.txt produces our custom panic! message:

     Running `target/debug/panic`
thread 'main' panicked at 'There was a problem opening the file: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/main.rs:9:13

We can also match on the kind of error to handle different failures differently:

use std::fs::File;
use std::io::ErrorKind;

fn main() {
    let f = File::open("hello.txt");

    let f = match f {
        Ok(file) => file,
        Err(error) => match error.kind() {
            ErrorKind::NotFound => match File::create("hello.txt") {
                Ok(fc) => fc,
                Err(e) => panic!("Tried to create file but there was a problem: {:?}", e),
            },
            other_error => panic!("There was a problem opening the file: {:?}", other_error),
        },
    };
}

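Here error.kind() distinguishes ErrorKind::NotFound, in which case the file is created, from all other errors, which still panic.

Shortcuts for Panic on Error: `unwrap` and `expect`

match works but is somewhat verbose. Result<T, E> defines many helper methods for common cases.

unwrap returns the value inside Ok; if the Result is Err, unwrap calls panic! for us.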

use std::fs::File;

fn main() {
    let f = File::open("hello.txt").unwrap();
}

Output:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:997:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

expect works like unwrap but lets us choose the panic! message:

use std::fs::File;

fn main() {
    let f = File::open("hello.txt").expect("Failed to open hello.txt");
}

Output:

thread 'main' panicked at 'Failed to open hello.txt: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/libcore/result.rs:997:5

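Propagating Errors

Instead of handling an error inside the function, we can return it to the calling code and let the caller decide what to do. This is called propagating the error.

The following function reads a username from a file. If the file does not exist or cannot be read, the function returns the error to its caller, using match.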

use std::io;
use std::io::Read;
use std::fs::File;

// Returns Result<String, io::Error>, a concrete instance of Result<T, E>
fn read_username_from_file() -> Result<String, io::Error> {
    let f = File::open("hello.txt");

    // Handle the return value of File::open with match
    let mut f = match f {
        Ok(file) => file,
        Err(e) => return Err(e),
    };

    // Create a new String
    let mut s = String::new();

    // read_to_string also returns a Result
    // This match is the last expression in the function, so no explicit return is needed
    match f.read_to_string(&mut s) {
        Ok(_) => Ok(s),
        Err(e) => Err(e),
    }
}

Shortcut for Propagating Errors: the `?` Operator

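The following has the same behavior as the previous example, but uses the ? operator.

A ? placed after a Result value works almost the same way as the match expressions in the previous example. If the value is Ok, the value inside it is the result of the expression and execution continues. If the value is Err, the Err is returned from the whole function to the caller.

There is one difference: error values that go through ? are passed to the from function, defined in the standard library's From trait, which converts the error into the error type declared in the return type of the current function. Here that type is io::Error.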

use std::io;
use std::io::Read;
use std::fs::File;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut f = File::open("hello.txt")?;
    let mut s = String::new();
    f.read_to_string(&mut s)?;
    Ok(s)
}

The code can be shortened further by chaining the method calls:

use std::io;
use std::io::Read;
use std::fs::File;

fn read_username_from_file() -> Result<String, io::Error> {
    let mut s = String::new();

    File::open("hello.txt")?.read_to_string(&mut s)?;

    Ok(s)
}

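Reading a file into a String is common enough that the standard library provides fs::read_to_string, which opens the file, creates a new String, reads the contents into it, and returns it.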

use std::io;
use std::fs;

fn read_username_from_file() -> Result<String, io::Error> {
    fs::read_to_string("hello.txt")
}
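The From conversion that ? performs is easier to see with an error type of our own. The sketch below is an illustration only: ConfigError and parse_port are hypothetical names, and it parses a number rather than reading a file so that it runs without any external input.

```rust
use std::num::ParseIntError;

// Hypothetical error type for this illustration.
#[derive(Debug)]
enum ConfigError {
    Parse(ParseIntError),
    OutOfRange(i64),
}

// This impl is what lets `?` turn a ParseIntError into a ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

fn parse_port(text: &str) -> Result<u16, ConfigError> {
    // parse returns Result<i64, ParseIntError>; on Err, `?` calls
    // ConfigError::from on the error and returns early.
    let n: i64 = text.trim().parse()?;

    if n < 1 || n > 65535 {
        return Err(ConfigError::OutOfRange(n));
    }
    Ok(n as u16)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("abc"));
    println!("{:?}", parse_port("70000"));
}
```

Without the From impl, the ? on the parse call would not compile, because the error types would not match.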

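`?` Can Only Be Used in Functions That Return `Result` (or `Option`)

The ? operator can only be used in a function whose return type is Result, Option, or another type that supports it. main returns () by default, so using ? there causes a compile error: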

use std::fs::File;

fn main() {
    let f = File::open("hello.txt")?;
}

Compile error:

error[E0277]: the `?` operator can only be used in a function that returns `Result` or `Option` (or another type that implements `std::ops::Try`)
 --> src/main.rs:4:13
  |
4 |     let f = File::open("hello.txt")?;
  |             ^^^^^^^^^^^^^^^^^^^^^^^^ cannot use the `?` operator in a function that returns `()`
  |
  = help: the trait `std::ops::Try` is not implemented for `()`
  = note: required by `std::ops::Try::from_error`

error: aborting due to previous error

Either handle the Result with match, or change main to return a Result<T, E>:

use std::error::Error;
use std::fs::File;

fn main() -> Result<(), Box<dyn Error>> {
    let f = File::open("hello.txt")?;

    Ok(())
}

Box<dyn Error> is a trait object, covered in chapter 17. For now it can be read as "any kind of error".
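As the compiler message above notes, ? also works in functions that return Option. A minimal sketch, using a hypothetical helper that returns the last character of the first line of some text:

```rust
// If there is no first line, `?` returns None from the function early.
fn last_char_of_first_line(text: &str) -> Option<char> {
    text.lines().next()?.chars().last()
}

fn main() {
    println!("{:?}", last_char_of_first_line("Hello, world\nHow are you"));
    println!("{:?}", last_char_of_first_line(""));
}
```

Inside one function, ? on an Option and ? on a Result cannot be mixed freely: the function's return type decides which one is allowed.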

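When to Use `panic!`

When code panics, there is no way to recover. Returning Result is usually the better default, because it lets the caller decide how to handle the failure.

In some situations panic! is more appropriate, but they are less common.

Examples, Prototype Code, and Tests

In an example written to illustrate a concept, robust error handling can make the example less clear.

unwrap and expect are handy while prototyping, before deciding how errors should be handled.

If a method call fails in a test, the whole test should fail. panic! is how a test is marked as a failure, so calling unwrap or expect is exactly what should happen.

When Logic Guarantees an Ok Value

When other logic guarantees that a Result will be Ok, but the compiler cannot understand that logic, calling unwrap is appropriate. The call could fail in general, even though it cannot fail in this particular situation.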

use std::net::IpAddr;

let home: IpAddr = "127.0.0.1".parse().unwrap();

因為 "127.0.0.1" 確實是有效的 IP Address，就直接使用 unwrap

Guidelines for Error Handling

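Panic when the code could end up in a bad state. A bad state is one where some assumption, guarantee, contract, or invariant has been broken, for example when invalid values, contradictory values, or missing values are passed to the code, and in addition one or more of the following holds:

The bad state is not something that is expected to happen occasionally
The code after this point relies on not being in this bad state
There is no good way to encode this information in the types being used

If someone calls your code and passes in values that do not make sense, the best choice may be to panic! and alert the person using your library to the bug in their code. Similarly, panic! is often appropriate when you call external code that is out of your control and it returns an invalid state you have no way of fixing.

When failure is expected, return Result instead. Examples are a parser receiving malformed data or an HTTP request hitting a rate limit. Propagate the error and let the caller decide how to handle it.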
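A small sketch of propagating an expected failure with Result. The helper parse_port is hypothetical and only stands in for "a parser that may receive malformed data".

```rust
use std::num::ParseIntError;

// Hypothetical helper: parse a port number from user-supplied text.
fn parse_port(input: &str) -> Result<u16, ParseIntError> {
    // `?` hands the parse error back to the caller instead of panicking.
    let port = input.trim().parse::<u16>()?;
    Ok(port)
}

fn main() {
    for input in ["8080", "not-a-port"] {
        // The caller decides what a bad value means: here it just reports it.
        match parse_port(input) {
            Ok(port) => println!("using port {}", port),
            Err(e) => println!("invalid port {:?}: {}", input, e),
        }
    }
}
```

Malformed input is an ordinary situation for this function, so the caller gets an Err to act on and the program keeps running.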

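Before operating on values, code should verify that they are valid and panic! if they are not. This is mostly for safety reasons: operating on invalid data can expose the code to vulnerabilities, for example out-of-bounds memory access.

Functions often have contracts: their behavior is only guaranteed when the inputs meet certain requirements. Panicking when a contract is violated makes sense, because a violation always indicates a bug on the caller's side, and there is no reasonable way for the calling code to recover.

Having many error checks in every function is verbose and annoying. Rust's type system, and the type checking the compiler does, can do many of them for you. If a function takes a particular type as a parameter, the compiler has already ensured the value is valid. For example, with a plain type rather than an Option, the program expects something rather than nothing, so the code does not have to handle both the Some and None cases; the compiler guarantees there is a value, because code that tries to pass nothing will not compile. Likewise, u32 guarantees the parameter is never negative.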

Earlier, the guessing game used an if to check whether the number was out of range:

loop {
    // --snip--

    let guess: i32 = match guess.trim().parse() {
        Ok(num) => num,
        Err(_) => continue,
    };

    if guess < 1 || guess > 100 {
        println!("The secret number will be between 1 and 100.");
        continue;
    }

    match guess.cmp(&secret_number) {
        // --snip--
    }
}

A more robust approach is to create a new type and put the validation in its constructor. Below, a Guess can only be created from a number between 1 and 100.

pub struct Guess {
    value: i32,
}

impl Guess {
    pub fn new(value: i32) -> Guess {
        if value < 1 || value > 100 {
            panic!("Guess value must be between 1 and 100, got {}.", value);
        }

        Guess {
            value
        }
    }

    // A getter; needed because the value field is private
    pub fn value(&self) -> i32 {
        self.value
    }
}
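Guess::new panics because an out-of-range value is treated as a bug in the calling code. If out-of-range input is instead an expected situation, for example when the number comes straight from a user, a fallible constructor is one alternative. The sketch below is not from the original notes: try_new and its String error type are assumptions made for illustration.

```rust
pub struct Guess {
    value: i32,
}

impl Guess {
    // Hypothetical non-panicking constructor: reports the problem to the caller.
    pub fn try_new(value: i32) -> Result<Guess, String> {
        if value < 1 || value > 100 {
            return Err(format!(
                "Guess value must be between 1 and 100, got {}.",
                value
            ));
        }
        Ok(Guess { value })
    }

    // Getter for the private field.
    pub fn value(&self) -> i32 {
        self.value
    }
}

fn main() {
    for candidate in [42, 101] {
        match Guess::try_new(candidate) {
            Ok(guess) => println!("accepted {}", guess.value()),
            Err(message) => println!("rejected: {}", message),
        }
    }
}
```

Either way the value field stays private, so code outside the module cannot build a Guess that skips the range check.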

References

Documentation for the other collections provided by the standard library

2020/04/06

rust08 Common Collections

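A collection can hold multiple values. Unlike the built-in array and tuple types, the data a collection points to is stored on the heap, so its size does not need to be known at compile time and can grow or shrink while the program runs. This post covers the three most commonly used collections:

vector: stores a variable number of values next to each other
string: a collection of characters; this is the String type used earlier
hash map: associates a key with a value. It is one particular implementation of the more general map data structure.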

Storing Lists of Values with Vectors

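A vector stores multiple values next to each other in memory. All values must have the same type. Vectors suit lists of items, such as the lines of text in a file or the prices of items in a shopping cart.

Creating a new Vector

The <i32> type annotation is required here, because the compiler cannot infer the element type of a vector that has no values yet.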

let v: Vec<i32> = Vec::new();

vec! is a macro. Because initial values are given, the compiler can infer that v is a Vec<i32>.

let v = vec![1, 2, 3];

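Updating a Vector

To change v, it must be declared mutable with mut. No type annotation is needed here, because the element type is inferred from the values pushed.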

let mut v = Vec::new();

v.push(5);
v.push(6);
v.push(7);
v.push(8);

Dropping a Vector Drops Its Elements

When a vector goes out of scope and is dropped, its elements are dropped too.

{
    let v = vec![1, 2, 3, 4];

    // do stuff with v

} // v goes out of scope and is dropped here

Reading elements of a Vector

Use either indexing syntax or the get method. Indexing with &v[2] gives a reference to the element; get returns an Option<&T>.

let v = vec![1, 2, 3, 4, 5];

// &v[2] gets the third element by index
let third: &i32 = &v[2];
println!("The third element is {}", third);

// using the get method
match v.get(2) {
    Some(third) => println!("The third element is {}", third),
    None => println!("There is no third element."),
}

Indexing past the end of the vector causes a panic at runtime; this error cannot be caught at compile time. get does not panic and returns None instead.

let v = vec![1, 2, 3, 4, 5];

// thread 'main' panicked at 'index out of bounds: the len is 5 but the index is 100'
let does_not_exist = &v[100];

// None
let does_not_exist = v.get(100);

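Once the program holds a valid reference, the borrow checker enforces the ownership and borrowing rules so that this reference, and any other reference into the vector, stays valid. For example, because mutable and immutable references cannot coexist in the same scope, you cannot take an immutable reference to the first element and then try to add an element to the end of the vector while that reference is still in use.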

let mut v = vec![1, 2, 3, 4, 5];

let first = &v[0];

v.push(6);

println!("The first element is: {}", first);

This fails to compile:

error[E0502]: cannot borrow `v` as mutable because it is also borrowed as immutable

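Adding an element to the end of a vector may require allocating new memory and copying the old elements over, if there is not enough room to keep all elements next to each other where the vector currently is. In that case the reference to the first element would point to deallocated memory. The borrowing rules prevent the program from ending up in that situation.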
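One way to satisfy the borrow checker in this situation is to copy the element out, so that no reference into the vector is alive when push runs. This is a sketch that assumes the element type is Copy, as i32 is; the function name first_then_push is made up for illustration.

```rust
// Reads the first element, then appends a value.
// Panics if `v` is empty, like any out-of-bounds index.
fn first_then_push(mut v: Vec<i32>, new_value: i32) -> (i32, Vec<i32>) {
    // `v[0]` copies the i32 out; no borrow of `v` outlives this line.
    let first = v[0];
    // The mutable borrow needed by push is now allowed.
    v.push(new_value);
    (first, v)
}

fn main() {
    let (first, v) = first_then_push(vec![1, 2, 3, 4, 5], 6);
    println!("The first element is: {}", first);
    println!("{:?}", v);
}
```

For element types that are not Copy, the usual alternatives are to clone the element, or to finish using the reference before calling push.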

Iterating over the Values in a Vector

Use a for loop:

let v = vec![100, 32, 57];
for i in &v {
    println!("{}", i);
}

// A mutable reference can also be taken, to change the values in place
let mut v = vec![100, 32, 57];
for i in &mut v {
    *i += 50;
}

Using an Enum to Store Multiple Types

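When a vector needs to hold values of different types, use an enum. The variants of an enum are all defined under the same enum type, so the vector's element type is that one enum.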

enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

let row = vec![
    SpreadsheetCell::Int(3),
    SpreadsheetCell::Text(String::from("blue")),
    SpreadsheetCell::Float(10.12),
];
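Because every element is a SpreadsheetCell, reading the row back means matching on the variant. A minimal sketch follows; the describe function and its output format are assumptions added for illustration.

```rust
enum SpreadsheetCell {
    Int(i32),
    Float(f64),
    Text(String),
}

// Hypothetical helper: turn any cell into a short description.
fn describe(cell: &SpreadsheetCell) -> String {
    // match must cover every variant, so no cell type is forgotten.
    match cell {
        SpreadsheetCell::Int(i) => format!("int {}", i),
        SpreadsheetCell::Float(f) => format!("float {}", f),
        SpreadsheetCell::Text(s) => format!("text {}", s),
    }
}

fn main() {
    let row = vec![
        SpreadsheetCell::Int(3),
        SpreadsheetCell::Text(String::from("blue")),
        SpreadsheetCell::Float(10.12),
    ];

    for cell in &row {
        println!("{}", describe(cell));
    }
}
```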

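If the full set of types to be stored is not known when writing the program, the enum technique does not work; use a trait object instead, covered in chapter 17.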

Storing UTF-8 Encoded Text with Strings

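Strings usually trip people up for three reasons: Rust's tendency to expose possible errors, strings being a more complicated data structure than they appear, and UTF-8.

A String is implemented as a collection of bytes, plus methods that give useful functionality when those bytes are interpreted as text. The following sections cover where String differs from the other collections, for example why indexing into a String is complicated by the difference between how people and computers interpret string data.

What is a String?

Rust has only one string type in the core language: the string slice str, usually seen in its borrowed form &str. As discussed in chapter 4, string slices are references to UTF-8 encoded string data stored elsewhere. String literals, for example, are stored in the program's binary and are therefore string slices.

The String type, provided by the standard library, is a growable, mutable, owned, UTF-8 encoded string type. When Rustaceans talk about "strings" in Rust, they usually mean both String and the string slice &str. This section is mostly about String, but both types are used heavily in the standard library, and both are UTF-8 encoded.

The standard library also has other string types, such as OsString, OsStr, CString, and CStr, and library crates provide even more. The names end in String or Str, which corresponds to the owned and borrowed variants.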

Creating a new String

    // Create a new, empty String
    let mut s = String::new();

    let data = "initial contents";
    // to_string is available on any type that implements the Display trait
    let s = data.to_string();

    // to_string also works directly on a string literal
    let s = "initial contents".to_string();

    // String::from creates a String from a string literal
    let s = String::from("initial contents");

    // Strings are UTF-8 encoded, so any properly encoded text can be stored
    let hello = String::from("السلام عليكم");
    let hello = String::from("Dobrý den");
    let hello = String::from("Hello");
    let hello = String::from("שָׁלוֹם");
    let hello = String::from("नमस्ते");
    let hello = String::from("こんにちは");
    let hello = String::from("안녕하세요");
    let hello = String::from("你好");
    let hello = String::from("Olá");
    let hello = String::from("Здравствуйте");
    let hello = String::from("Hola");

Updating a String

Use push_str and push to append to a String

    // push_str appends a string slice; it does not take ownership of its argument
    let mut s = String::from("foo");
    s.push_str("bar");

    // If push_str took ownership of s2, printing s2 afterwards would not compile
    let mut s1 = String::from("foo");
    let s2 = "bar";
    s1.push_str(s2);
    println!("s2 is {}", s2);

    // push appends a single char
    let mut s = String::from("lo");
    s.push('l');

Concatenating with + or format!

    let s1 = String::from("Hello, ");
    let s2 = String::from("world!");
    let s3 = s1 + &s2; // note: s1 has been moved and can no longer be used; s2 is still usable

    // The function behind + has a signature like this:
    // fn add(self, s: &str) -> String {
    // add takes ownership of self and borrows s, so it can add a &str to a String,
    // but it cannot add two Strings. &s2 is a &String, which is coerced to &str.

    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    // let s = s1 + "-" + &s2 + "-" + &s3;
    // format! is easier to read. It works like println! but returns a String,
    // and it does not take ownership of any of its arguments.
    let s = format!("{}-{}-{}", s1, s2, s3);

    println!("{}, {}, {}", s, s2, s3);

Indexing into Strings

Trying to access part of a String with indexing syntax is an error: Rust strings do not support indexing by integer.

    let s1 = String::from("hello");
    let h = s1[0];

Compile error

error[E0277]: the type `std::string::String` cannot be indexed by `{integer}`
 --> src/main.rs:5:13
  |
5 |     let h = s1[0];
  |             ^^^^^ `std::string::String` cannot be indexed by `{integer}`
  |
  = help: the trait `std::ops::Index<{integer}>` is not implemented for `std::string::String`

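How String is implemented

A String is a wrapper over a Vec<u8>.

    let len1 = String::from("Hola").len();
    // len is 4

    let len2 = String::from("Здравствуйте").len();
    // len is 24, not 12: each of these Cyrillic characters takes 2 bytes in UTF-8

    println!("{}, {}", len1, len2);

So an index into the string's bytes does not always correspond to a valid Unicode scalar value.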

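Suppose Rust allowed this:

let hello = "Здравствуйте";
let answer = &hello[0];

The first byte of З is 208 and the second is 151, so answer would have to be 208, but 208 on its own is not a valid character. To avoid returning an unexpected value and causing bugs, Rust rejects this code with a compile error.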

Bytes and Scalar Values and Grapheme Clusters

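Rust offers three ways to look at a string: as bytes, as scalar values, and as grapheme clusters (the closest thing to what people call letters).

For example, the Hindi word “नमस्ते”, written in the Devanagari script, is stored as a vector of u8 values: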

[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
224, 165, 135]

That is 18 bytes. Viewed as Unicode scalar values, which is what Rust's char type is, it is

['न', 'म', 'स', '्', 'त', 'े']

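There are six char values, but the fourth and sixth are not letters: they are diacritics that make no sense on their own. Viewed as grapheme clusters, it is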

["न", "म", "स्", "ते"]

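Another reason is that indexing is expected to take constant time, O(1). That cannot be guaranteed for a String, because Rust would have to walk the contents from the beginning to find out how many valid characters precede the index.

For these reasons, Rust strings do not support indexing.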
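The difference between the byte view and the scalar-value view can be checked directly with standard library methods. In the sketch below, the helper names byte_and_char_len and nth_char are made up for illustration; len, chars, count, and nth are the real std methods.

```rust
// Returns (length in bytes, number of Unicode scalar values).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

// Returns the n-th char, counting from 0. This walks the string from the
// start, so it is O(n), which is why there is no s[n] syntax for it.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

fn main() {
    println!("{:?}", byte_and_char_len("Hola"));
    println!("{:?}", byte_and_char_len("Здравствуйте"));
    println!("{:?}", nth_char("Здравствуйте", 0));
}
```

Grapheme clusters are not covered here: the standard library does not provide that functionality, and a crate from crates.io is needed for it.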

Slicing Strings

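Slicing with [] and a range produces a string slice, but this operation can panic at runtime.

    let hello = "Здравствуйте";

    // Each of these characters is 2 bytes
    // s will be “Зд”
    let s = &hello[0..4];

    // &hello[0..1] panics at runtime, because byte 1 is not a char boundary
    let s2 = &hello[0..1];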

thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside 'З' (bytes 0..2) of `Здравствуйте`', src/libcore/str/mod.rs:2027:5
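When the byte range is not known to be valid ahead of time, str::get is a non-panicking alternative: it returns None if the range is out of bounds or does not fall on char boundaries. A minimal sketch, where the wrapper name safe_slice is an assumption for illustration:

```rust
// Returns the sub-slice for the byte range, or None instead of panicking.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let hello = "Здравствуйте";

    // Bytes 0..4 cover exactly two 2-byte characters.
    println!("{:?}", safe_slice(hello, 0, 4));

    // Byte 1 is inside 'З', so this is None rather than a panic.
    println!("{:?}", safe_slice(hello, 0, 1));
}
```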

Iterating over Strings

for c in "नमस्ते".chars() {
    println!("{}", c);
}

न
म
स
्
त
े

The bytes method returns each raw byte instead:

for b in "नमस्ते".bytes() {
    println!("{}", b);
}

Storing Keys with Associated Values in Hash Maps

HashMap<K, V> stores a mapping from keys of type K to values of type V. A hashing function determines how the keys and values are placed in memory.

Creating a New Hash Map

    // HashMap is not brought into scope automatically by the prelude
    use std::collections::HashMap;

    // Like a vector, a hash map is homogeneous: here every key is a String and every value an i32
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);


    // A hash map can also be built by calling collect on an iterator of tuples
    let teams  = vec![String::from("Blue"), String::from("Yellow")];
    let initial_scores = vec![10, 50];

    // zip pairs the two iterators into an iterator of tuples; collect turns it into a HashMap
    let scores: HashMap<_, _> = teams.iter().zip(initial_scores.iter()).collect();
    println!("{:?}", scores);
    // {"Yellow": 50, "Blue": 10} (iteration order is arbitrary)
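Note that iter() yields references, so the collect call above produces a HashMap<&String, &i32> that borrows from the two vectors. If the map should own its keys and values, into_iter can be used instead. A sketch of that variant, with build_scores as a hypothetical function name:

```rust
use std::collections::HashMap;

// Consumes both vectors and returns a map that owns its keys and values.
fn build_scores(teams: Vec<String>, scores: Vec<i32>) -> HashMap<String, i32> {
    // into_iter moves the Strings out of `teams`; zip stops at the shorter input.
    teams.into_iter().zip(scores).collect()
}

fn main() {
    let teams = vec![String::from("Blue"), String::from("Yellow")];
    let initial_scores = vec![10, 50];

    let scores = build_scores(teams, initial_scores);
    println!("{:?}", scores.get("Blue"));
}
```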

Hash Maps and Ownership

For types that implement the Copy trait, such as i32, values are copied into the hash map. For owned values such as String, the values are moved, and the hash map becomes their owner.

    use std::collections::HashMap;

    let field_name = String::from("Favorite color");
    let field_value = String::from("Blue");

    let mut map = HashMap::new();
    map.insert(field_name, field_value);
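    // field_name and field_value are invalid at this point and can no longer be used

If references are inserted instead, the values are not moved into the hash map, but the values they point to must remain valid for at least as long as the hash map is valid.

Accessing values in a Hash Map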

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    let team_name = String::from("Blue");
    // get returns Option<&V>; it returns None if the key has no value
    let score = scores.get(&team_name);
    println!("{:?}", score);
    // Some(10)


    // use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    // Iterate over a hash map with for
    for (key, value) in &scores {
        println!("{}: {}", key, value);
    }
    //Yellow: 50
    //Blue: 10
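Since get returns an Option<&V>, a common pattern is to turn the result into a plain value with a default. A minimal sketch, assuming a missing team should count as 0; the function name score_or_zero is made up for illustration.

```rust
use std::collections::HashMap;

// Looks up a team's score, falling back to 0 when the team is unknown.
fn score_or_zero(scores: &HashMap<String, i32>, team: &str) -> i32 {
    // copied() turns Option<&i32> into Option<i32>; unwrap_or supplies the default.
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);

    println!("{}", score_or_zero(&scores, "Blue"));
    println!("{}", score_or_zero(&scores, "Red"));
}
```

The lookup takes a &str even though the keys are String, so no temporary String has to be allocated just to call get.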

Updating a Hash Map

Overwriting an old value: insert

    use std::collections::HashMap;

    let mut scores = HashMap::new();

    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Blue"), 25);

    println!("{:?}", scores);
    // {"Blue": 25}

Inserting only if the key has no value: entry

    use std::collections::HashMap;

    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);

    scores.entry(String::from("Yellow")).or_insert(50);
    // Blue already has a value, so it is not changed to 50
    scores.entry(String::from("Blue")).or_insert(50);

    println!("{:?}", scores);
    // {"Yellow": 50, "Blue": 10}

Updating a value based on the old value

For example, counting how many times each word appears:

    use std::collections::HashMap;

    let text = "hello world wonderful world";

    let mut map = HashMap::new();

    // or_insert returns a mutable reference (&mut V) to the value for this key
    for word in text.split_whitespace() {
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }

    println!("{:?}", map);
    // {"wonderful": 1, "hello": 1, "world": 2}

Hashing Functions

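By default, HashMap uses a "cryptographically strong" hashing function that can provide resistance to Denial of Service (DoS) attacks. It is not the fastest hashing algorithm available, but the trade-off for better security is usually worth the drop in performance. A different function can be used by specifying a different hasher. A hasher is a type that implements the BuildHasher trait; crates.io has libraries that provide hashers implementing many common hashing algorithms.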
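To show how a different hasher plugs in without pulling in an external crate, here is a sketch using a hand-written FNV-1a hasher. The Fnv1a type, the FnvHashMap alias, and count_words are assumptions written for this illustration; FNV-1a is fast on short keys but is not DoS-resistant, so it is a poor fit for attacker-controlled keys.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical FNV-1a hasher, for illustration only.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// BuildHasherDefault turns any `Hasher + Default` into a BuildHasher.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

// Same word-count logic as before, only the map type differs.
fn count_words(text: &str) -> FnvHashMap<&str, u32> {
    let mut map = FnvHashMap::default();
    for word in text.split_whitespace() {
        *map.entry(word).or_insert(0) += 1;
    }
    map
}

fn main() {
    let counts = count_words("hello world wonderful world");
    println!("{:?}", counts.get("world"));
}
```

The map is created with default() rather than new(), because HashMap::new is only defined for the default hasher.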

References