J’s Digest

Posts

Showing posts from April, 2025

[AskSense in Eight Parts: Part 1] When Search Can Read Minds: The Hidden Force Behind the Birth of AskSense

- April 29, 2025

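Inspiration: Scam Traps in Everyday Life
Have you ever received a very official-looking text message or email saying “Unusual transactions have been detected on your account. Please verify immediately”? You think, “This looks important, I’d better click right away!” But that is exactly the scammers’ signature move: they use an official tone plus a sense of urgency as bait, so you take the hook before you have time to think. Traditional anti-fraud systems just scan for words such as “unusual transaction” and “verify”, like someone who can look at ingredients but cannot cook. Faced with a synonym or a slight variation they are immediately stumped, so defenses that look rock-solid still leave many people fooled. For everyone to be able to spot a scam at first sight, we need a technology that can “read the whole sentence”. Now we have AskSense, which is like carrying a highly perceptive “semantic detective” in your pocket: it does not just watch for words, it reads the meaning behind the sentence. When a suspicious piece of text comes in, AskSense compares the whole sentence against more than ten thousand known scam examples, and as soon as the angle between the two vectors is small enough (that is, the intents are close enough) it raises the alarm: “Hey! This is very likely a scam!” However fancily a scam message is dressed up, it is meant not to slip past this line of defense.
Inside the Database: Building a Clean Scam Knowledge Base
Many people do not realize that an anti-fraud system’s “intelligence” actually comes from the large example database behind it. So that AskSense can recognize all kinds of carefully packaged scam scripts, I collected more than 54,500 scam messages that have actually circulated in Taiwan, plus many first-hand accounts from users who were scammed. The raw text first goes through strict cleaning: duplicate sentences are removed, short phrases and spam unrelated to fraud are filtered out, and tools automatically unify Simplified and Traditional Chinese and standardize idiomatic phrases that easily cause misclassification, such as “恭喜發財” (a New Year greeting wishing prosperity) and “紅包” (red envelope). As a result, during training the model only “sees” high-quality samples that genuinely help identify scams, and no computation is wasted on meaningless noise. Going a step further, I built a custom stop-word list for the system that excludes purely polite expressions such as “您好” (hello) and “請問” (excuse me), so that the semantic-vector computation can focus on the most threatening vocabulary, core terms such as “驗證” (verify), “交易” (transaction), and “退款” (refund). Imagine a system that only ever watches for the word “verify” and ignores the surrounding context: it could easily confuse a legitimate bank notice with a scam text. By dropping useless words and emphasizing important ones, AskSense can judge precisely in a high-dimensional vector space: whenever a sentence’s overall intent is close to a known scam example, it immediately issues a warning and protects you from financial loss.
The AskSense Vision: Understanding Anything
While building AskSense, we chose from the very start a multilingual Sentence-BERT model that understands Chinese and can also handle English (paraphrase-multilingual-M...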
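
The “small angle between two vectors” test described in this excerpt can be sketched in a few lines. This is only a minimal illustration, not the AskSense implementation: the three-dimensional vectors, the labels in `KNOWN_SCAMS`, and the `THRESHOLD` value are all made-up assumptions, and in practice the vectors would come from a sentence-embedding model rather than being typed in by hand.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors standing in for real sentence embeddings.
KNOWN_SCAMS = {
    "account_verification": [0.9, 0.1, 0.0],
    "fake_refund": [0.1, 0.9, 0.1],
}
THRESHOLD = 0.8  # hypothetical cut-off; a real system would tune this on labeled data

def is_suspicious(query_vec, known_scams=KNOWN_SCAMS, threshold=THRESHOLD):
    """Return (flagged, closest_label, score) for the most similar known example."""
    label, score = max(
        ((name, cosine_similarity(query_vec, vec)) for name, vec in known_scams.items()),
        key=lambda pair: pair[1],
    )
    return score >= threshold, label, score

print(is_suspicious([0.8, 0.2, 0.0]))  # close in direction to "account_verification"
print(is_suspicious([0.0, 0.0, 1.0]))  # far from both examples
```

A higher threshold reduces false alarms at the cost of missing reworded scams, which is the trade-off any similarity cut-off has to balance.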

About to Launch an 8-Part Series: AskSense from Conception to Deployment

- April 29, 2025

Trigger Moment: Uncovering the Real Need for Semantic Search
The spark for this project actually came from our desire to help the customer support team tackle scam messages more effectively. With hundreds of support tickets coming in every day, many contained what looked like legitimate but cleverly disguised phishing attempts. You know the type: “Please verify your account security immediately,” or “You’ve been issued a refund; click this link.” Simply checking for keywords like “security” or “refund” didn’t cut it: this approach not only missed the dangers but also flooded us with a ton of false positives. I realized that to really get ahead of these scams, we needed the system to truly “understand” the full meaning of each message, taking into account context, intent, and nuance. By doing this, we could quickly spot suspicious messages as they came in and warn users in real time, which would help us effectively prevent fraud. So, I decided to revamp the original FAQ-search syst...

Launching an 8-Part Series: The Complete Record of AskSense from Idea to Production

- April 29, 2025

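Trigger Moment: Uncovering the Real Need for Semantic Search
The real trigger came from a project in which I helped another team detect scam messages. People receive large numbers of messages every day that look legitimate but hide traps, and plain keyword matching often misses cleverly packaged scam scripts such as “Confirm your account security right now” or “You have received a refund, please click this link”. Matching only surface words such as “security” or “refund” can neither establish how dangerous a sentence is nor avoid a flood of false positives. I realized that only by letting the machine “understand” the semantics of the whole text and grasp its context could we flag suspected scams and warn users the moment a message enters the system, and so truly achieve fraud prevention. So I converted the AskSense architecture, originally built for FAQ search, into a scam-detection pipeline: multilingual SBERT first turns every incoming message into a semantic vector, and then a trained classifier or a similarity threshold catches the messages that fall in the “high-risk” region of the vector space. To avoid missing new scam tactics, I also added sentence clustering: the newest collected suspicious samples are periodically grouped by unsupervised clustering, then manually labeled to refine the features. Throughout this process I kept tuning the vector-similarity threshold and the number of clusters, and wrote a detailed log entry for every false positive and false negative, so that the model and workflow can keep correcting themselves and improving in real-world use.
Series Outline: Eight Key Chapters
In eight articles, this series will unpack the inner workings of AskSense step by step, from the first idea to cloud deployment, recording every technical decision and the thinking behind it:
Part | Title | Core Content
1 | Problem and Motivation | The real pain points and user needs at the project’s inception
2 | First Look at Vectorization | Trying out Bag-of-Words and TF-IDF, and the bottlenecks
3 | First Experience with Word Embeddings | Struggles with the very large Word2Vec model and first experiments with a lightweight SBERT
4 | Choosing a Model | Trade-offs among performance, accuracy, memory, and multilingual support, and why the final choice was the MiniLM family
5 | Multilingual Challenges | Traditional Chinese FAQ support, jieba word segmentation, OpenCC conversion, and integration with the multilingual model
6 | Optimization and Refactoring | From “vector cold start is too slow” to “dependency conflicts exploding”: how I introduced caching, virtual environments, and an incremental loading strategy
7 | From CLI to Cloud: Deployment in Practice | Streamlit Cloud, CI/CD pipeline, environment-variable protection, and log monitoring in practice
8 | Future and Extensions | FAISS vector index, user-feedback loop, hybrid retrieval architecture, and more
Strategic Tech Stack and Tool Choices: From spaCy to ...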
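
To make the “sentence clustering” step in this excerpt a little more concrete, here is a minimal sketch of grouping message vectors so that a reviewer can label one cluster at a time. The excerpt does not say which clustering algorithm was used, so this swaps in a simple greedy, threshold-based grouping purely for illustration; the function names, the toy two-dimensional vectors, and the 0.85 threshold are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_cluster(vectors, threshold=0.85):
    """Greedy grouping: each vector joins the first cluster whose seed vector
    is at least `threshold` similar, otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `vectors`."""
    seeds = []      # one representative vector per cluster
    clusters = []   # member indices per cluster
    for idx, vec in enumerate(vectors):
        for cid, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[cid].append(idx)
                break
        else:
            # no existing cluster was similar enough
            seeds.append(vec)
            clusters.append([idx])
    return clusters

# Hypothetical toy vectors standing in for embeddings of newly collected messages.
samples = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.1, 0.9]]
print(greedy_cluster(samples))
```

Unlike k-means, this approach does not need the number of clusters up front; the similarity threshold plays that role instead, and the result depends on the order in which messages arrive.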

Flat LAN Architecture & Zero Trust Gaps: A Security Assessment from Streamlit Testing to IoT Isolation

- April 28, 2025

1. Background & Objectives
I’ve been working on an NLP-based fraud-text detection project using Streamlit as the Web GUI. When it starts, Streamlit spins up a Python HTTP server on the local machine, binds it to 0.0.0.0 or 127.0.0.1, and opens a port that serves the interactive interface to a browser. Bound to 127.0.0.1, the server is reachable only from the machine itself; bound to 0.0.0.0, it also listens on the LAN interface, so anyone on our LAN can load the interface. This rapid setup is ideal for quick prototyping, but once that server listens on the LAN interface, every internal user gains direct access. To determine whether I’d effectively left the metaphorical “front door” wide open, I set out to audit our network’s segmentation and routing isolation. My goal was to see how easily an attacker could pivot from that Streamlit entry point to other systems in our flat network and, by extension, how IoT devices like network printers and IP cameras would be exposed under the same conditions.
2. Testing Workflow & Technical Details
Step | Tool / Command | What It Tells Us
1 | ipconfig /all | Discovers the machine’s IPv4 ad...
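
The difference between the two bind addresses mentioned above is easy to check programmatically. The helper below is a small sketch using only the standard library; the name `reachable_from_lan` is made up for this example, and it looks only at the bind address itself, so a host firewall could still block a port that this function reports as exposed.

```python
import ipaddress

def reachable_from_lan(bind_address: str) -> bool:
    """Return True if a server bound to this address accepts connections
    from other hosts, i.e. the address is not a loopback address.
    Note: this ignores firewalls and only inspects the bind address."""
    addr = ipaddress.ip_address(bind_address)
    return not addr.is_loopback

for candidate in ("127.0.0.1", "0.0.0.0", "192.168.1.20", "::1"):
    print(candidate, "->", "LAN-reachable" if reachable_from_lan(candidate) else "local only")
```

For a prototype that should stay private, Streamlit’s `server.address` setting can be pointed at 127.0.0.1 so that the test interface is served to the local machine only.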

Flat LAN Architecture and Zero Trust Gaps: A Security Assessment from Streamlit Testing to IoT Isolation

- April 28, 2025

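1. Research Background and Objectives
Recently I have been developing a project that uses natural language processing (NLP) to detect scam text, with Streamlit as the web GUI platform; the model is deployed on the local machine (localhost) with a specific port opened so that colleagues on the LAN can test it. When Streamlit runs, it starts a Python HTTP server on the local machine that binds to 0.0.0.0 or 127.0.0.1 and opens a port to present the interactive interface in a browser. This approach is fast and has a low barrier to entry, but once the server is bound to the LAN interface, users on the internal network can access the test interface directly. To assess whether this had left an “unlocked door” open on the internal network, I reviewed network segmentation and routing isolation, and considered whether, in this flat network architecture, an attacker could use such test services for lateral movement. The study will later extend to similar scenarios involving IoT devices (such as network printers and IP cameras), to explore isolation and security hardening strategies for the whole network segment.
2. Verification Workflow and Technical Notes
Step | Tool / Command | Criterion
1 | ipconfig /all | Obtain the local IPv4 address and subnet mask and derive the L3 subnet range.
2 | arp -a | List the MAC addresses that answered ARP in the same subnet; a large number within the same range indicates a flat L2.
3 | tracert 10.x.x.x | If the target is reached in a single hop, no router is involved and the L3 subnet is flat as well.
In the first step I used the built-in Windows command ipconfig /all, which reads the operating system’s network interface settings, including IPv4/IPv6 addresses, subnet mask, default gateway, and DNS servers. Because the development environment is Windows, ipconfig is the most direct option and needs no extra installation; I considered ifconfig or ip addr on Linux, but the difference in environment made them unsuitable for testing on this machine. I also tried querying the switch’s tables over SNMP, but that requires configuring an SNMP community string and having administrative privileges, so I dropped it to keep the workflow simple. The arp -a command used in the second step lists every resolved IP-to-MAC mapping in the local ARP cache, that is, the neighbor hosts in the Layer 2 broadcast domain that this machine has recently resolved. arp -a is a built-in operating system tool, needs no extra privileges, and responds quickly; I briefly considered using Nmap’s ARP scan ( nmap -sn ) to obtain more detail, but because Nmap ...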
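
Step 1 in the table above, deriving the L3 subnet range from the IPv4 address and subnet mask that ipconfig /all reports, can be reproduced with Python’s standard `ipaddress` module. This is a small sketch under stated assumptions: the function names and the sample address 10.0.5.23 with mask 255.255.252.0 are invented for illustration and are not taken from the assessment itself.

```python
import ipaddress

def subnet_summary(ip: str, netmask: str) -> dict:
    """Derive the subnet that an interface belongs to from its IPv4 address and mask."""
    network = ipaddress.ip_interface(f"{ip}/{netmask}").network
    return {
        "network": str(network),
        "broadcast": str(network.broadcast_address),
        # network and broadcast addresses are not assignable on typical LAN prefixes
        "usable_hosts": max(network.num_addresses - 2, 0),
    }

def same_subnet(ip_a: str, ip_b: str, netmask: str) -> bool:
    """True if ip_b falls inside the subnet of ip_a, so no router hop is expected."""
    network = ipaddress.ip_interface(f"{ip_a}/{netmask}").network
    return ipaddress.ip_address(ip_b) in network

# Hypothetical values of the kind shown by `ipconfig /all`.
print(subnet_summary("10.0.5.23", "255.255.252.0"))
print(same_subnet("10.0.5.23", "10.0.7.200", "255.255.252.0"))
print(same_subnet("10.0.5.23", "10.0.8.1", "255.255.252.0"))
```

If a target that `same_subnet` places in a different subnet is still reached by tracert in a single hop, that suggests the mask on the local interface is not the whole story and the routing setup deserves a closer look.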

Finding Freedom and Natural Flow in Life and Relationships

- April 27, 2025

Throughout life's journey, we often try so hard to control outcomes: to ensure we are understood, liked, or to keep connections alive and opportunities secured. We attempt to plan and manage the future according to our expectations. But in truth, deep and lasting human connections are never something you can "earn" by force. And truly living a free life is not something you can "engineer."
🌿 Becoming Someone Who Can Simply "Be Themselves"
In relationships, the degree to which others feel comfortable around us is a reflection of how comfortable we are within ourselves. If we are nervous, anxious, or fearful of losing something deep inside, even the most polished surface behavior cannot completely hide it; others will sense the invisible tension. True magnetism is born when we first allow ourselves to:
- Be nervous when we are nervous
- Be imperfect without shame
- Exist authentically without needing to pretend
When we embrace ourselv...

How to Investigate Phishing Websites with Redirection (3/5)

- April 09, 2025

In this third installment of our five-part series on phishing website investigation, we dive deep into using Chrome DevTools to analyze and understand how redirection is controlled by JavaScript and other methods. This post is your step-by-step guide to uncovering the inner workings of redirection techniques on scam websites, with practical examples and clear tactics for both beginners and advanced users.
🧰 Introduction to Chrome DevTools
Chrome DevTools is an indispensable toolkit for developers and cybersecurity professionals. It allows you to inspect the underlying code and behavior of any website. With DevTools, you can view:
- JavaScript Code: See exactly which scripts are running, including those that trigger redirections.
- HTML Structure: Understand the page layout and embedded elements.
- Redirection Behavior: Identify commands such as window.location that manipulate the browser’s location.
- Network Requests: Examine the complete redirect chain to trace where e...
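
Once the requests from the Network panel have been noted down, the redirect chain can be reassembled offline. The sketch below assumes a hand-made dictionary that maps each requested URL to its status code and `Location` header; that record format, the function name, and the example.com-style URLs are all hypothetical. It only covers HTTP-level redirects, since redirects triggered by JavaScript do not carry a `Location` header.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}

def build_redirect_chain(start_url, responses):
    """Follow recorded HTTP redirects starting at start_url.

    `responses` maps URL -> (status_code, location_header_or_None).
    Returns the ordered list of URLs visited; stops at the first
    non-redirect response, an unknown URL, or a detected loop."""
    chain = [start_url]
    seen = {start_url}
    current = start_url
    while current in responses:
        status, location = responses[current]
        if status not in REDIRECT_CODES or not location:
            break
        next_url = urljoin(current, location)  # resolves relative Location values
        chain.append(next_url)
        if next_url in seen:
            break  # redirect loop: stop after recording the repeated URL
        seen.add(next_url)
        current = next_url
    return chain

# Hypothetical records copied by hand from a Network panel capture.
recorded = {
    "http://a.example/start": (302, "/step2"),
    "http://a.example/step2": (301, "https://b.example/login"),
    "https://b.example/login": (200, None),
}
print(build_redirect_chain("http://a.example/start", recorded))
```

Enabling “Preserve log” in the Network panel before loading the page helps here, because otherwise the entries for earlier hops are cleared on each navigation.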

How to Investigate Phishing Websites with Redirection (2/5)

- April 08, 2025

In our ongoing cybersecurity series, we are dissecting the redirection techniques often employed by phishing websites. In this post, we focus on Step 2: Recognizing Web Redirection Techniques. By understanding the various redirection methods in detail, you can better analyze and trace scam websites. This step covers:
- JavaScript-based redirects (e.g., using window.location.href)
- HTML Meta Refresh
- Iframe-based stealth redirects
- HTTP header 302 redirects (with examples in Flask and PHP)
- Bonus tactics used by attackers, including URL shorteners and device-based targeting
Below, we break down each of these techniques with examples, explanations, and best practices for detection.
Recognizing Web Redirection Techniques
Understanding Automatic Page Redirects
In the coding realm, auto-redirecting a webpage is often implemented with just a few lines of code. Phishing sites commonly exploit these simple methods to transition users quickly to a different URL, often h...
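
The three client-side techniques in the list above (JavaScript location changes, meta refresh, and iframes) leave recognizable traces in a page’s source, so a first-pass check can be sketched with regular expressions. This is a rough heuristic written for illustration only: the pattern names and the `find_redirect_hints` helper are assumptions, obfuscated scripts will evade it, and an ordinary iframe is not necessarily malicious. HTTP 302 redirects are not covered because they happen in response headers, not in the page source.

```python
import re

# Heuristic patterns for client-side redirect techniques (not exhaustive).
PATTERNS = {
    # window.location = ..., location.href = ..., location.replace(...), location.assign(...)
    "js_location": re.compile(
        r"\blocation(?:\.href)?\s*=(?!=)|\blocation\.(?:replace|assign)\s*\(", re.I
    ),
    # <meta http-equiv="refresh" content="0; url=...">
    "meta_refresh": re.compile(r"<meta[^>]+http-equiv\s*=\s*[\"']?refresh", re.I),
    # <iframe src="..."> (may be hidden with CSS)
    "iframe": re.compile(r"<iframe\b[^>]*\bsrc\s*=", re.I),
}

def find_redirect_hints(html: str) -> list:
    """Return the sorted names of all redirect patterns found in the page source."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(html))

page = (
    '<meta http-equiv="refresh" content="0; url=https://evil.example">'
    '<script>window.location.href = "https://evil.example";</script>'
)
print(find_redirect_hints(page))
```

A hit from this scan is a prompt to open the page in DevTools and confirm what actually happens, not proof of a phishing redirect on its own.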

How to Investigate Phishing Websites with Redirection (1/5)

- April 08, 2025

Meta Description: Learn how to detect, analyze, and trace phishing websites that use redirection tricks. A step-by-step cybersecurity guide using Chrome DevTools, Whois, and more.
Introduction: Why You Need to Investigate Phishing Redirects
Phishing scams remain one of the most prevalent and dangerous cyber threats today, and as attackers grow more creative with their methods, redirection scams have become increasingly common. In our 2025 edition guide, we delve into how redirection techniques are abused to disguise malicious websites, making it harder for users and traditional security filters to identify and block them.
A Real-World Example: The .top Phishing Site
Imagine encountering a seemingly harmless website with a URL ending in .top, a domain known for its low cost and ease of registration. These cheap domains are a favorite among cybercriminals who use them to set up phishing pages that quickly redirect visitors to more elaborate scams. For instance, an attacker mi...