Rust Code Challenge Series 7

2024-04-12

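This is a short series of articles about learning Rust. By working through each small challenge in the 2023 Christmas Code Hunt hosted by the shuttle platform, we learn how to use a Rust web framework. This is the seventh article.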

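In this article, Part 14 covers returning HTML content from axum, and Part 15 implements a password validator; along the way we pick up a few miscellaneous details.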

Part 14

All the routes for this part are listed below; route configuration will not be pointed out again later:

Router::new()
 .route("/14/unsafe", post(handler::d14_1))
 .route("/14/safe", post(handler::d14_2));

Task 1

# Example input and output
curl -X POST http://localhost:8000/14/unsafe \
  -H "Content-Type: application/json" \
  -d '{"content": "<h1>Welcome to the North Pole!</h1>"}'

<html>
  <head>
    <title>CCH23 Day 14</title>
  </head>
  <body>
    <h1>Welcome to the North Pole!</h1>
  </body>
</html>

In this task, the HTML string in the request body must be returned as part of the page content. In axum, we can use Html to build a response whose Content-Type is text/html:

use axum::{response::Html, Json};
use serde::Deserialize;

#[derive(Deserialize)]
pub struct D14Body {
    content: String,
}

pub async fn d14_1(body: Json<D14Body>) -> Html<String> {
    let x = format!(
        "<html>
  <head>
    <title>CCH23 Day 14</title>
  </head>
  <body>
    {}
  </body>
</html>",
        body.0.content
    );

    Html(x)
}

Above, we splice the HTML string from the request body into the page, then wrap the result in Html to build the response.
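Since Task 2 reuses the same page skeleton, the template can be pulled into a small function. The following is only a sketch: `render_page` is a hypothetical helper name, not something from axum or from the challenge, and it inserts `content` verbatim, exactly like the handler above.

```rust
/// Hypothetical helper: wraps `content` in the fixed page skeleton.
/// `content` is inserted verbatim, so this is only safe for trusted input.
fn render_page(content: &str) -> String {
    // format! already returns an owned String, so no extra conversion is needed.
    format!(
        "<html>\n  <head>\n    <title>CCH23 Day 14</title>\n  </head>\n  <body>\n    {}\n  </body>\n</html>",
        content
    )
}
```

With a helper like this, the handler body could shrink to something like `Html(render_page(&body.content))`.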

Task 2

# Example input and output
curl -X POST http://localhost:8000/14/safe \
  -H "Content-Type: application/json" \
  -d '{"content": "<script>alert(\"XSS Attack!\")</script>"}'

<html>
  <head>
    <title>CCH23 Day 14</title>
  </head>
  <body>
    &lt;script&gt;alert(&quot;XSS Attack!&quot;)&lt;/script&gt;
  </body>
</html>

In Task 1 we spliced the client-supplied string directly into the page, which clearly leaves it open to XSS attacks. Now let's see how to escape user input to prevent XSS:

pub async fn d14_2(body: Json<D14Body>) -> Html<String> {
    let escaped = html_escape::encode_double_quoted_attribute(body.content.as_str());
    let x = format!(
        "<html>
  <head>
    <title>CCH23 Day 14</title>
  </head>
  <body>
    {}
  </body>
</html>",
        escaped
    );
    Html(x)
}

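Above, we use the html_escape crate to handle escaping. The crate provides a variety of encode and decode functions, and the set of characters each one handles differs. In this example double quotes must be escaped as well, which is why encode_double_quoted_attribute is used.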
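To make it concrete what escaping does to the sample input, here is a hand-rolled sketch that covers only the four characters relevant to this task. `escape_html` is a hypothetical helper written for illustration; it is not how the html_escape crate is implemented, and real code should keep using the crate.

```rust
/// Hypothetical helper, for illustration only: replaces the characters
/// &, <, > and " with their HTML entities and copies everything else.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}
```

Applied to the request body from the example, this yields the same escaped line that appears in the expected output above.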

Part 15

All the routes for this part are as follows:

Router::new()
 .route("/15/nice", post(handler::d15_1))
 .route("/15/game", post(handler::d15_2))

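Both tasks below give you a string and return different results according to different rules. The rules themselves are in the original challenge text and are not repeated here: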

Task 1

# Example input and output 1
curl -X POST http://localhost:8000/15/nice \
  -H 'Content-Type: application/json' \
  -d '{"input": "hello there"}'

# 200 OK
{"result":"nice"}

# Example input and output 2
curl -X POST http://localhost:8000/15/nice \
  -H 'Content-Type: application/json' \
  -d '{"input": "abcd"}'

# 400 Bad Request
{"result":"naughty"}

# Example input and output 3
curl -X POST http://localhost:8000/15/nice \
  -H 'Content-Type: application/json' \
  -d '{Grinch? GRINCH!}'

# 400 Bad Request
# response body does not matter

The request handler is as follows:

use axum::http::StatusCode;
use serde_json::{json, Value};

#[derive(Deserialize)]
pub struct D15Body {
    input: String,
}
pub async fn d15_1(body: Json<D15Body>) -> (StatusCode, Json<Value>) {
    let vowels = ['a', 'e', 'i', 'o', 'u'];
    let mut vowel_count = 0;
    let mut has_twice_letter = false;
    let mut prev_char = None;

    // chars() returns an iterator; c is a char
    for c in body.input.chars() {
        if !c.is_alphabetic() {
            continue;
        }

        if Some(c) == prev_char {
            has_twice_letter = true;
        }

        if vowels.contains(&c) {
            vowel_count += 1;
        }

        // Rust's match can match several values at once by matching on a tuple
        match (prev_char, c) {
            (Some('a'), 'b') | (Some('c'), 'd') | (Some('p'), 'q') | (Some('x'), 'y') => {
                return (
                    StatusCode::BAD_REQUEST,
                    Json::from(json!({
                      "result":"naughty"
                    })),
                )
            }
            _ => {}
        }

        prev_char = Some(c);
    }

    if vowel_count >= 3 && has_twice_letter {
        return (
            StatusCode::OK,
            Json::from(json!({
              "result":"nice"
            })),
        );
    } else {
        return (
            StatusCode::BAD_REQUEST,
            Json::from(json!({
              "result":"naughty"
            })),
        );
    }
}
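The rule checks are easier to unit test when they are separated from the HTTP layer. The sketch below mirrors the handler's logic as a pure function; `is_nice` is a hypothetical name introduced here, and the rules it encodes are simply the ones the handler above checks, not a restatement of the official challenge text.

```rust
/// Hypothetical pure function mirroring the checks in the handler above:
/// at least three vowels, one letter doubled, and none of ab / cd / pq / xy.
/// Like the handler, it skips non-alphabetic characters entirely.
fn is_nice(input: &str) -> bool {
    const VOWELS: [char; 5] = ['a', 'e', 'i', 'o', 'u'];
    let mut vowel_count = 0;
    let mut has_double = false;
    let mut prev: Option<char> = None;

    for c in input.chars().filter(|c| c.is_alphabetic()) {
        if prev == Some(c) {
            has_double = true;
        }
        if VOWELS.contains(&c) {
            vowel_count += 1;
        }
        // A forbidden pair makes the string naughty immediately.
        if matches!(
            (prev, c),
            (Some('a'), 'b') | (Some('c'), 'd') | (Some('p'), 'q') | (Some('x'), 'y')
        ) {
            return false;
        }
        prev = Some(c);
    }

    vowel_count >= 3 && has_double
}
```

The handler would then only need to map the boolean to a status code and a JSON body, for example `is_nice("hello there")` to 200 and `is_nice("abcd")` to 400.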

Task 2

This task has more rules, nine in total. Let's look at how each one is checked:


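// Imports assumed for this handler: Lazy is taken to be once_cell's, as the Lazy::new usage suggests
use once_cell::sync::Lazy;
use regex::Regex;
use sha2::{Digest, Sha256};

pub async fn d15_2(body: Json<D15Body>) -> (StatusCode, Json<Value>) {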
    let input = body.input.as_str();
    // rule 1: the string must be at least 8 characters long
    // note: len() returns the length in bytes, so we count chars instead
    if input.chars().count() < 8 {
        return (
            StatusCode::BAD_REQUEST,
            Json::from(json!({
              "result":"naughty",
              "reason": "8 chars"
            })),
        );
    }

    // rule 2: must contain uppercase letters, lowercase letters, and digits
    let mut has_upper_letter = false;
    let mut has_lower_letter = false;
    let mut digit_count = 0;
    input.chars().for_each(|c| {
        if c.is_alphabetic() {
            if c.is_uppercase() {
                has_upper_letter = true;
            } else if c.is_lowercase() {
                has_lower_letter = true;
            }
        } else if c.is_numeric() {
            digit_count += 1;
        }
    });

    if !has_upper_letter || !has_lower_letter || digit_count == 0 {
        return (
            StatusCode::BAD_REQUEST,
            Json::from(json!({
              "result":"naughty",
              "reason": "more types of chars"
            })),
        );
    }

    // rule 3: must contain at least 5 digits
    if digit_count < 5 {
        return (
            StatusCode::BAD_REQUEST,
            Json::from(json!({
              "result":"naughty",
              "reason": "55555"
            })),
        );
    }

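    // rule 4: the integers formed by runs of consecutive digits must sum to 2023
    // a regex pulls out the digit runs; regex usage was covered in the second article of this series, so it is not repeated here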
    static RE_DIGIT: Lazy<Regex> = Lazy::new(|| Regex::new(r"\d+").unwrap());
    let digit_sum = RE_DIGIT
        .find_iter(input)
        .map(|x| usize::from_str_radix(x.as_str(), 10).unwrap_or_default())
        .sum::<usize>();
    if digit_sum != 2023 {
        return (
            StatusCode::BAD_REQUEST,
            Json::from(json!({
              "result":"naughty",
              "reason": "math is hard"
            })),
        );
    }

    // rule 5: must contain the letters j, o, and y, in that order
    static RE_JOY: Lazy<Regex> = Lazy::new(|| Regex::new(r"^.*j.+o.+y.*$").unwrap());

    if !RE_JOY.is_match(input) {
        return (
            StatusCode::NOT_ACCEPTABLE,
            Json::from(json!({
              "result":"naughty",
              "reason": "not joyful enough"
            })),
        );
    }

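    // rule 6: must contain a letter that repeats with exactly one other letter in between (e.g. xyx)
    let mut has_consecutive_letters = false;
    // Rust strings are UTF-8 encoded, which is a variable-length encoding, so a string cannot be indexed by position directly; writing input[idx] is a compile error (wow, feels so safe)
    // so we first convert the string into chars, after which indexing works, for example:
    // let char_vec: Vec<char> = input.chars().collect();
    // char_vec[idx]
    // or
    // input.chars().nth(idx).unwrap()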

    let char_vec: Vec<char> = input.chars().collect();
    for idx in 1..(char_vec.len() - 1) {
        let prev = char_vec[idx - 1];
        let cur = char_vec[idx];
        let next = char_vec[idx + 1];
        // note: is_ascii_alphabetic is used here rather than is_alphabetic
        if !(prev.is_ascii_alphabetic() && cur.is_ascii_alphabetic() && next.is_ascii_alphabetic())
        {
            continue;
        }
        let prev_num = prev.to_ascii_lowercase() as u8;
        let cur_num = cur.to_ascii_lowercase() as u8;
        let next_num = next.to_ascii_lowercase() as u8;
        // the outer letters must match and the middle letter must differ (e.g. xyx)
        if prev_num == next_num && cur_num != prev_num {
            has_consecutive_letters = true;
            break;
        }
    }
    if !has_consecutive_letters {
        return (
            StatusCode::UNAVAILABLE_FOR_LEGAL_REASONS,
            Json::from(json!({
              "result":"naughty",
              "reason": "illegal: no sandwich"
            })),
        );
    }

    // rule 7: must contain at least one Unicode character in the range [U+2980, U+2BFF]
    if !input.contains(|c| c >= '\u{2980}' && c <= '\u{2bff}') {
        return (
            StatusCode::RANGE_NOT_SATISFIABLE,
            Json::from(json!({
              "result":"naughty",
              "reason": "outranged"
            })),
        );
    }

    // rule 8: must contain at least one emoji
    // Unicode property escape \p: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape
    static RE_EMOJI: Lazy<Regex> = Lazy::new(|| Regex::new(r"\p{Extended_Pictographic}").unwrap());
    if !RE_EMOJI.is_match(input) {
        return (
            StatusCode::UPGRADE_REQUIRED,
            Json::from(json!({
              "result":"naughty",
              "reason": "😳"
            })),
        );
    }

    // rule 9: the hexadecimal representation of the string's sha256 hash must end with the letter a
    let hash = Sha256::new().chain_update(input).finalize();
    let hash = format!("{:x}", hash);
    if !hash.ends_with('a') {
        return (
            StatusCode::IM_A_TEAPOT,
            Json::from(json!({
              "result":"naughty",
              "reason": "not a coffee brewer"
            })),
        );
    }

    return (
        StatusCode::OK,
        Json::from(json!({
          "result":"nice",
          "reason": "that's a nice password"
        })),
    );
}
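The rule 6 comments about UTF-8 and indexing can be tried out without any web framework. The sketch below uses only the standard library; `char_at` and `has_sandwich` are hypothetical helper names, and `has_sandwich` implements the rule as the rule 6 comment words it (a letter, one different letter, then the first letter again, compared case-insensitively as in the handler).

```rust
/// Hypothetical helper: the idx-th character of a string, if any.
/// This walks the string from the start, so it is O(n) per call.
fn char_at(input: &str, idx: usize) -> Option<char> {
    input.chars().nth(idx)
}

/// Hypothetical helper: true if some ASCII letter repeats with exactly one
/// different ASCII letter in between, e.g. "xyx". Case is ignored.
fn has_sandwich(input: &str) -> bool {
    // Collect into a Vec<char> so fixed-size windows can be taken.
    let chars: Vec<char> = input.chars().map(|c| c.to_ascii_lowercase()).collect();
    chars.windows(3).any(|w| {
        w.iter().all(|c| c.is_ascii_alphabetic()) && w[0] == w[2] && w[0] != w[1]
    })
}
```

For a string such as `"h\u{e9}llo"`, `len()` reports 6 because the accented letter takes two bytes, while `chars().count()` reports 5. Using `windows(3)` also avoids the manual index arithmetic of the loop in the handler.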

Computing the sha256 hash in rule 9 requires installing an external crate:

cargo add sha2

Next, where does the :x in format!("{:x}", hash) above come from?

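The std::fmt module documents format strings in full. A pair of braces {} denotes a formatting argument, which stands in for an external value. Inside the braces, a variable name may come before the colon :, and what follows the colon sets the formatting parameters. For example, in the commonly used {:?}, the ? is a formatting parameter that formats the value through the Debug trait (the full list is in the Formatting traits section of the docs). The x used here selects the LowerHex trait, which produces the lowercase hexadecimal representation.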
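A few of these formatting parameters can be tried with the standard library alone. In the sketch below, `format_variants` and `to_hex` are hypothetical helpers written for illustration; `to_hex` is not how the sha2 crate formats its output, it only shows LowerHex applied byte by byte.

```rust
use std::fmt::Write;

/// Hypothetical helper: the same number through Debug, LowerHex and UpperHex.
fn format_variants(n: u32) -> (String, String, String) {
    (
        format!("{:?}", n),         // Debug
        format!("{:x}", n),         // LowerHex
        format!("{n:X}", n = n),    // UpperHex, with a name before the colon
    )
}

/// Hypothetical helper: lowercase hex encoding of a byte slice.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // {:02x}: LowerHex, zero-padded to a width of 2
        write!(out, "{:02x}", b).expect("writing to a String cannot fail");
    }
    out
}
```

Here `format_variants(255)` gives `("255", "ff", "FF")` and `to_hex(&[0x0a, 0xff])` gives `"0aff"`; the zero padding matters because a plain `{:x}` would print `0x0a` as a single digit.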

Summary

No summary this time.
