Compare commits

37 Commits

Author SHA1 Message Date
Moon Man 5d4913bb93 Merge remote-tracking branch 'origin/rich-media-db' into spc2 2024-05-05 13:06:01 -05:00
Mark Felder 859ad4dbae Fix broken Rich Media parsing when the image URL is a relative path 2024-05-05 13:51:13 -04:00
Mark Felder b067fbde31 Respect the TTL returned in OpenGraph tags 2024-05-05 13:51:13 -04:00
Mark Felder 68dc81b59e Fix broken tests 2024-05-05 13:51:13 -04:00
Mark Felder 2079e92c5c Increase the :max_body for Rich Media to 5MB
Websites are increasingly bloated by tricks such as inlining content (e.g., CNN.com), which puts pages at or above 5MB. This value may still be too low.
2024-05-05 13:51:13 -04:00
Mark Felder a6407f9ba5 RichMedia refactor
Rich Media parsing was previously handled on demand with a 2 second HTTP request timeout, and the results were retained only in Cachex. Every time a Pleroma instance restarted, it had to request and parse the data again for each status with a detected URL. When a batch of statuses was fetched, the statuses were processed in parallel to try to keep the maximum latency at 2 seconds, but a single URL that could not be reached often made the timeline appear to hang while loading. URLs with expiring image links (Amazon AWS) were parsed and inserted with a TTL so that the image link would not break.

Rich Media data is now cached in the database and fetched asynchronously, with Cachex used as a read-through cache. When the data becomes available, we stream an update to the clients; if the result is returned quickly, the experience is almost seamless. Activities are already processed for their Rich Media data during ingestion to warm the cache, so users should not normally encounter the asynchronous loading of the Rich Media data.

Implementation notes:

- The async worker is a Task with a globally unique process name to prevent duplicate processing of the same URL
- The Task will attempt to fetch the data 3 times with increasing sleep time between attempts
- The HTTP request obeys the default HTTP request timeout value instead of 2 seconds
- URLs that still fail with an unexpected error after all attempts receive a negative cache entry for 15 minutes
- URLs that fail with an expected error receive a negative cache entry with no TTL
- Activities that have no detected URLs insert a nil value in the Cachex :scrubber_cache so we do not repeat parsing the object content with Floki every time the activity is rendered
- Expiring image URLs are handled with an Oban job
- There is no automatic cleanup of the Rich Media data in the database, but it is safe to delete at any time
- The post draft/preview feature processes the URL synchronously so that the rendered post preview is accurate

Overall performance of timelines and creating new posts which contain URLs is greatly improved.
2024-05-05 13:51:13 -04:00
Moon Man b3c46387fa Merge remote-tracking branch 'origin/develop' into spc2 2024-03-23 11:49:56 -05:00
Moon Man b6344879d6 Merge remote-tracking branch 'origin/logger-metadata' into spc2 2024-03-23 11:49:46 -05:00
lain 987f44d811 Merge branch 'bookmark-folders' into 'develop'
Fix BookmarkFolderView, add test

See merge request pleroma/pleroma!4096
2024-03-20 13:26:47 +00:00
marcin mikołajczak 37ec645ff2 Fix BookmarkFolderView, add test
Signed-off-by: marcin mikołajczak <git@mkljczk.pl>
2024-03-20 13:24:43 +01:00
Mark Felder 462d5aa5cb logger: remove request_id metadata which is not useful 2024-03-19 20:53:40 -04:00
Mark Felder 99cee755d8 Show Logger metadata in dev 2024-03-19 12:15:10 -04:00
Mark Felder 40823462e7 Logger metadata for request path and authenticated user 2024-03-19 12:15:10 -04:00
Mark Felder 7dfd148ff8 Logger metadata for inbound federation requests 2024-03-19 12:15:10 -04:00
lain f775a1931b Merge branch 'transient-validators-defaults' into 'develop'
Set defaults values on transient objects (attachment, poll options) validators

See merge request pleroma/pleroma!4090
2024-03-19 12:44:13 +00:00
Lain Soykaf 4e8a1b40cb Merge branch 'develop' of git.pleroma.social:pleroma/pleroma into transient-validators-defaults 2024-03-19 16:26:02 +04:00
lain 8a14fdbe47 Update transient-validators-defaults.change 2024-03-19 12:03:43 +00:00
lain 4e37cd85ef Merge branch 'fix-bookmark-test' into 'develop'
CI: Move changelog check to later in the pipeline

See merge request pleroma/pleroma!4095
2024-03-19 12:02:10 +00:00
Lain Soykaf 040a980277 Add changelog 2024-03-19 15:03:16 +04:00
Lain Soykaf afae3a94a4 CI: Move changelog check to later in the pipeline
There is no reason not to run the tests first.
2024-03-19 13:54:35 +04:00
Lain Soykaf 9617189e96 Tests: Actually run the bookmark folder tests. 2024-03-19 13:51:04 +04:00
lain 8e37f19883 Merge branch 'test-improvements' into 'develop'
Tests: Explicitly set db pool size and max cases to the same value.

See merge request pleroma/pleroma!4094
2024-03-19 07:44:05 +00:00
Lain Soykaf 665947ab2a Tests: Reduced the max case number to make tests more stable. 2024-03-19 11:03:05 +04:00
Lain Soykaf 3cc8414c2e Add changelog 2024-03-19 10:38:29 +04:00
Lain Soykaf 923803a533 Tests: Explicitly set db pool size and max cases to the same value. 2024-03-19 10:34:37 +04:00
lain ca5766c0a7 Merge branch 'postgres-bump' into 'develop'
Update minimum Postgres version to 11.0; disable JIT

See merge request pleroma/pleroma!4093
2024-03-19 04:46:40 +00:00
Mark Felder 357553a64a Remove usage of :persistent_term for Postgres version storage, fix test
This test should not have been passing. The search result's activity id should not be the same id as the local post.

capture_log was not being used. Removed.
2024-03-18 16:27:52 -04:00
Mark Felder b822a912ad Remove test for postgres < 11 2024-03-18 16:15:40 -04:00
Mark Felder 1413d2e517 Remove vestiges of old Postgres support 2024-03-18 15:42:15 -04:00
Mark Felder 7f97fbc1ae Update minimum Postgres version to 11.0; disable JIT
PostgreSQL 11 is the release that introduced JIT, and it should be disabled: Pleroma's queries do not benefit from JIT, but it can increase query latency.
2024-03-18 15:36:26 -04:00
Haelwenn (lanodan) Monnier 4ad1d02d7e changelog.d/transient-validators-defaults.change: insert 2024-03-15 16:25:02 +01:00
Haelwenn (lanodan) Monnier 48c22a67de QuestionOptionsValidator: set default AS types 2024-03-15 16:22:18 +01:00
Haelwenn (lanodan) Monnier 8b651fab1d AttachmentValidator: Set "Link" as default type 2024-03-15 16:22:18 +01:00
Moon Man 9ca62f74be Merge remote-tracking branch 'origin/logger-metadata' into spc2 2024-02-28 12:42:19 -06:00
Mark Felder fda58c3707 Show Logger metadata in dev 2023-12-17 18:20:34 -05:00
Mark Felder 241c7175bd Logger metadata for request path and authenticated user 2023-12-17 18:20:22 -05:00
Mark Felder f01ad493f3 Logger metadata for inbound federation requests 2023-12-09 18:32:26 -05:00
50 changed files with 1100 additions and 522 deletions

@@ -26,10 +26,10 @@ cache: &global_cache_policy
- _build
stages:
- check-changelog
- build
- lint
- test
- check-changelog
- benchmark
- deploy
- release
@@ -113,7 +113,7 @@ benchmark:
variables:
MIX_ENV: benchmark
services:
- name: postgres:9.6-alpine
- name: postgres:11.22-alpine
alias: postgres
command: ["postgres", "-c", "fsync=off", "-c", "synchronous_commit=off", "-c", "full_page_writes=off"]
script:

@@ -0,0 +1 @@
Disable jit by default for PostgreSQL

@@ -0,0 +1 @@
Refactored Rich Media to cache the content in the database. Fetching operations that could block status rendering have been eliminated.

@@ -0,0 +1 @@
Set default values on validators for transient objects (attachment, poll options)

@@ -131,13 +131,13 @@
config :logger, :console,
level: :debug,
format: "\n$time $metadata[$level] $message\n",
metadata: [:request_id]
metadata: [:actor, :path, :request_id, :type, :user]
config :logger, :ex_syslogger,
level: :debug,
ident: "pleroma",
format: "$metadata[$level] $message",
metadata: [:request_id]
metadata: [:actor, :path, :request_id, :type, :user]
config :mime, :types, %{
"application/xml" => ["xml"],
@@ -415,10 +415,6 @@
config :pleroma, :mrf_inline_quote, template: "<bdi>RT:</bdi> {url}"
config :pleroma, :mrf_force_mention,
mention_parent: true,
mention_quoted: true
config :pleroma, :rich_media,
enabled: true,
ignore_hosts: [],
@@ -428,7 +424,11 @@
Pleroma.Web.RichMedia.Parsers.OEmbed
],
failure_backoff: 60_000,
ttl_setters: [Pleroma.Web.RichMedia.Parser.TTL.AwsSignedUrl]
ttl_setters: [
Pleroma.Web.RichMedia.Parser.TTL.AwsSignedUrl,
Pleroma.Web.RichMedia.Parser.TTL.Opengraph
],
max_body: 5_000_000
config :pleroma, :media_proxy,
enabled: false,
@@ -575,7 +575,8 @@
attachments_cleanup: 1,
new_users_digest: 1,
mute_expire: 5,
search_indexing: 10
search_indexing: 10,
rich_media_expiration: 2
],
plugins: [Oban.Plugins.Pruner],
crontab: [

@@ -35,8 +35,8 @@
# configured to run both http and https servers on
# different ports.
# Do not include metadata nor timestamps in development logs
config :logger, :console, format: "[$level] $message\n"
# Do not include timestamps in development logs
config :logger, :console, format: "$metadata[$level] $message\n"
# Set a higher stacktrace during development. Avoid configuring such
# in production as building large stacktraces may be expensive.

@@ -49,7 +49,7 @@
hostname: System.get_env("DB_HOST") || "localhost",
port: System.get_env("DB_PORT") || "5432",
pool: Ecto.Adapters.SQL.Sandbox,
pool_size: 50
pool_size: System.schedulers_online() * 2
config :pleroma, :dangerzone, override_repo_pool_size: true
@@ -61,7 +61,8 @@
config :pleroma, :rich_media,
enabled: false,
ignore_hosts: [],
ignore_tld: ["local", "localdomain", "lan"]
ignore_tld: ["local", "localdomain", "lan"],
max_body: 2_000_000
config :pleroma, :instance,
multi_factor_authentication: [
@@ -174,6 +175,8 @@
config :pleroma, Pleroma.Emoji.Loader, test_emoji: true
config :pleroma, Pleroma.Web.RichMedia.Backfill, provider: Pleroma.Web.RichMedia.Backfill
if File.exists?("./config/test.secret.exs") do
import_config "test.secret.exs"
else

@@ -12,8 +12,8 @@ Note: This article is potentially outdated because at this time we may not have
### Required software
- PostgreSQL 9.6 or later (Ubuntu 16.04 only provides 9.5, so get a newer version from [here](https://www.postgresql.org/download/linux/ubuntu/))
- `postgresql-contrib` 9.6 or later (same as above)
- PostgreSQL 11.0 or later (Ubuntu 16.04 only provides 9.5, so get a newer version from [here](https://www.postgresql.org/download/linux/ubuntu/))
- `postgresql-contrib` 11.0 or later (same as above)
- Elixir 1.8 or later ([Do not install it from the Debian repositories; install it from here!](https://elixir-lang.org/install.html#unix-and-unix-like) Alternatively, install [asdf](https://github.com/asdf-vm/asdf) as the pleroma user)
- `erlang-dev`
- `erlang-nox`

@@ -1,6 +1,6 @@
## Required dependencies
* PostgreSQL >=9.6
* PostgreSQL >=11.0
* Elixir >=1.11.0 <1.15
* Erlang OTP >=22.2.0 (supported: <27)
* git

@@ -119,28 +119,7 @@ def start(_type, _args) do
max_restarts = Application.get_env(:pleroma, __MODULE__)[:max_restarts]
opts = [strategy: :one_for_one, name: Pleroma.Supervisor, max_restarts: max_restarts]
result = Supervisor.start_link(children, opts)
set_postgres_server_version()
result
end
defp set_postgres_server_version do
version =
with %{rows: [[version]]} <- Ecto.Adapters.SQL.query!(Pleroma.Repo, "show server_version"),
{num, _} <- Float.parse(version) do
num
else
e ->
Logger.warning(
"Could not get the postgres version: #{inspect(e)}.\nSetting the default value of 9.6"
)
9.6
end
:persistent_term.put({Pleroma.Repo, :postgres_version}, version)
Supervisor.start_link(children, opts)
end
def load_custom_modules do

@@ -65,20 +65,16 @@ def ensure_scrubbed_html(
end
end
@spec extract_first_external_url_from_object(Pleroma.Object.t()) ::
{:ok, String.t()} | {:error, :no_content}
@spec extract_first_external_url_from_object(Pleroma.Object.t()) :: String.t() | nil
def extract_first_external_url_from_object(%{data: %{"content" => content}})
when is_binary(content) do
url =
content
|> Floki.parse_fragment!()
|> Floki.find("a:not(.mention,.hashtag,.attachment,[rel~=\"tag\"])")
|> Enum.take(1)
|> Floki.attribute("href")
|> Enum.at(0)
{:ok, url}
content
|> Floki.parse_fragment!()
|> Floki.find("a:not(.mention,.hashtag,.attachment,[rel~=\"tag\"])")
|> Enum.take(1)
|> Floki.attribute("href")
|> Enum.at(0)
end
def extract_first_external_url_from_object(_), do: {:error, :no_content}
def extract_first_external_url_from_object(_), do: nil
end

@@ -23,19 +23,12 @@ def search(user, search_query, options \\ []) do
offset = Keyword.get(options, :offset, 0)
author = Keyword.get(options, :author)
search_function =
if :persistent_term.get({Pleroma.Repo, :postgres_version}) >= 11 do
:websearch
else
:plain
end
try do
Activity
|> Activity.with_preloaded_object()
|> Activity.restrict_deactivated_users()
|> restrict_public(user)
|> query_with(index_type, search_query, search_function)
|> query_with(index_type, search_query, :websearch)
|> maybe_restrict_local(user)
|> maybe_restrict_author(author)
|> maybe_restrict_blocked(user)

@@ -147,9 +147,7 @@ def insert(map, local \\ true, fake \\ false, bypass_actor_check \\ false) when
# Splice in the child object if we have one.
activity = Maps.put_if_present(activity, :object, object)
ConcurrentLimiter.limit(Pleroma.Web.RichMedia.Helpers, fn ->
Task.start(fn -> Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity) end)
end)
Pleroma.Web.RichMedia.Card.get_by_activity(activity)
# Add local posts to search index
if local, do: Pleroma.Search.add_to_index(activity)
@@ -177,7 +175,7 @@ def insert(map, local \\ true, fake \\ false, bypass_actor_check \\ false) when
id: "pleroma:fakeid"
}
Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
Pleroma.Web.RichMedia.Card.get_by_activity(activity)
{:ok, activity}
{:remote_limit_pass, _} ->

@@ -52,6 +52,7 @@ defmodule Pleroma.Web.ActivityPub.ActivityPubController do
when action in [:activity, :object]
)
plug(:log_inbox_metadata when action in [:inbox])
plug(:set_requester_reachable when action in [:inbox])
plug(:relay_active? when action in [:relay])
@@ -521,6 +522,13 @@ defp set_requester_reachable(%Plug.Conn{} = conn, _) do
conn
end
defp log_inbox_metadata(conn = %{params: %{"actor" => actor, "type" => type}}, _) do
Logger.metadata(actor: actor, type: type)
conn
end
defp log_inbox_metadata(conn, _), do: conn
def upload_media(%{assigns: %{user: %User{} = user}} = conn, %{"file" => file} = data) do
with {:ok, object} <-
ActivityPub.upload(

@@ -12,13 +12,13 @@ defmodule Pleroma.Web.ActivityPub.ObjectValidators.AttachmentValidator do
@primary_key false
embedded_schema do
field(:id, :string)
field(:type, :string)
field(:type, :string, default: "Link")
field(:mediaType, ObjectValidators.MIME, default: "application/octet-stream")
field(:name, :string)
field(:blurhash, :string)
embeds_many :url, UrlObjectValidator, primary_key: false do
field(:type, :string)
field(:type, :string, default: "Link")
field(:href, ObjectValidators.Uri)
field(:mediaType, ObjectValidators.MIME, default: "application/octet-stream")
field(:width, :integer)
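The validator hunks give `type` (and `mediaType`) schema defaults, so an incoming attachment or poll option that omits `type` still validates. With Ecto's `cast/3`, a field absent from the params keeps its schema default. The sketch below imitates that behaviour with a plain struct so that it runs without Ecto; `AttachmentSketch` and its `cast/1` are hypothetical stand-ins, not Ecto's API.

```elixir
defmodule AttachmentSketch do
  # Hypothetical plain-struct analogue of the embedded schema; the defaults
  # mirror the values set in AttachmentValidator.
  defstruct type: "Link", mediaType: "application/octet-stream", href: nil

  @fields [:type, :mediaType, :href]

  # Copy only the keys present in the incoming (string-keyed) map;
  # absent keys keep their struct defaults, as with Ecto's cast/3.
  def cast(params) when is_map(params) do
    Enum.reduce(@fields, %__MODULE__{}, fn field, acc ->
      case Map.fetch(params, Atom.to_string(field)) do
        {:ok, value} -> Map.put(acc, field, value)
        :error -> acc
      end
    end)
  end
end

# A remote attachment without a "type" key gets "Link".
AttachmentSketch.cast(%{"href" => "https://example.com/file.bin"})
```

An explicit `"type"` in the incoming data still wins; only a missing key falls back to the default.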

@@ -14,10 +14,10 @@ defmodule Pleroma.Web.ActivityPub.ObjectValidators.QuestionOptionsValidator do
embeds_one :replies, Replies, primary_key: false do
field(:totalItems, :integer)
field(:type, :string)
field(:type, :string, default: "Collection")
end
field(:type, :string)
field(:type, :string, default: "Note")
end
def changeset(struct, data) do

@@ -227,9 +227,7 @@ def handle(%{data: %{"type" => "Create"}} = activity, meta) do
end
end
ConcurrentLimiter.limit(Pleroma.Web.RichMedia.Helpers, fn ->
Task.start(fn -> Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity) end)
end)
Pleroma.Web.RichMedia.Card.get_by_activity(activity)
Pleroma.Search.add_to_index(Map.put(activity, :object, object))

@@ -38,6 +38,8 @@ defmodule Pleroma.Web.Endpoint do
plug(Plug.Telemetry, event_prefix: [:phoenix, :endpoint])
plug(Pleroma.Web.Plugs.LoggerMetadataPath)
plug(Pleroma.Web.Plugs.SetLocalePlug)
plug(CORSPlug)
plug(Pleroma.Web.Plugs.HTTPSecurityPlug)

@@ -25,6 +25,7 @@ defmodule Pleroma.Web.MastodonAPI.StatusController do
alias Pleroma.Web.OAuth.Token
alias Pleroma.Web.Plugs.OAuthScopesPlug
alias Pleroma.Web.Plugs.RateLimiter
alias Pleroma.Web.RichMedia.Card
plug(Pleroma.Web.ApiSpec.CastAndValidate, replace_params: false)
@@ -480,9 +481,9 @@ def card(
_
) do
with %Activity{} = activity <- Activity.get_by_id(status_id),
true <- Visibility.visible_for_user?(activity, user) do
data = Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
render(conn, "card.json", data)
true <- Visibility.visible_for_user?(activity, user),
%Card{} = card_data <- Card.get_by_activity(activity) do
render(conn, "card.json", card_data)
else
_ -> render_error(conn, :not_found, "Record not found")
end

@@ -21,6 +21,7 @@ defmodule Pleroma.Web.MastodonAPI.StatusView do
alias Pleroma.Web.MastodonAPI.StatusView
alias Pleroma.Web.MediaProxy
alias Pleroma.Web.PleromaAPI.EmojiReactionController
alias Pleroma.Web.RichMedia.Card
import Pleroma.Web.ActivityPub.Visibility, only: [get_visibility: 1, visible_for_user?: 2]
@@ -29,9 +30,7 @@ defmodule Pleroma.Web.MastodonAPI.StatusView do
# pagination is restricted to 40 activities at a time
defp fetch_rich_media_for_activities(activities) do
Enum.each(activities, fn activity ->
spawn(fn ->
Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
end)
spawn(fn -> Card.get_by_activity(activity) end)
end)
end
@@ -113,9 +112,7 @@ def render("index.json", opts) do
# To do: check AdminAPIControllerTest on the reasons behind nil activities in the list
activities = Enum.filter(opts.activities, & &1)
# Start fetching rich media before doing anything else, so that later calls to get the cards
# only block for timeout in the worst case, as opposed to
# length(activities_with_links) * timeout
# Start prefetching rich media before doing anything else
fetch_rich_media_for_activities(activities)
replied_to_activities = get_replied_to_activities(activities)
quoted_activities = get_quoted_activities(activities)
@@ -364,7 +361,11 @@ def render("show.json", %{activity: %{data: %{"object" => _object}} = activity}
summary = object.data["summary"] || ""
card = render("card.json", Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity))
card =
case Card.get_by_activity(activity) do
%Card{} = result -> render("card.json", result)
_ -> nil
end
url =
if user.local do
@@ -567,15 +568,8 @@ def render("source.json", %{activity: %{data: %{"object" => _object}} = activity
}
end
def render("card.json", %{rich_media: rich_media, page_url: page_url}) do
page_url_data = URI.parse(page_url)
page_url_data =
if is_binary(rich_media["url"]) do
URI.merge(page_url_data, URI.parse(rich_media["url"]))
else
page_url_data
end
def render("card.json", %Card{fields: rich_media}) do
page_url_data = URI.parse(rich_media["url"])
page_url = page_url_data |> to_string
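Commit 859ad4dbae fixes Rich Media parsing when the image URL is a relative path, but its code is not part of this excerpt. The standard-library way to turn a relative reference into an absolute URL is `URI.merge/2`, which the removed `card.json` branch above also used. A minimal sketch with made-up URLs, not the commit's actual code:

```elixir
# Hypothetical page URL that an OpenGraph image path is relative to.
page_url = "https://news.example.com/2024/05/story.html"

# Resolve an image reference against the page URL, as a browser would.
resolve = fn image ->
  page_url |> URI.merge(image) |> URI.to_string()
end

resolve.("/images/lead.jpg")              # root-relative path
resolve.("thumb.jpg")                     # relative to the page's directory
resolve.("https://cdn.example.com/a.jpg") # absolute URLs pass through
```

Because `URI.merge/2` follows RFC 3986 reference resolution, the same call handles root-relative, directory-relative, and already-absolute image URLs.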

@@ -13,10 +13,8 @@ def render("show.json", %{folder: %BookmarkFolder{} = folder}) do
%{
id: folder.id |> to_string(),
name: folder.name,
emoji: get_emoji(folder.emoji),
source: %{
emoji: folder.emoji
}
emoji: folder.emoji,
emoji_url: get_emoji_url(folder.emoji)
}
end
@@ -24,18 +22,18 @@ def render("index.json", %{folders: folders} = opts) do
render_many(folders, __MODULE__, "show.json", Map.delete(opts, :folders))
end
defp get_emoji(nil) do
defp get_emoji_url(nil) do
nil
end
defp get_emoji(emoji) do
defp get_emoji_url(emoji) do
if Emoji.unicode?(emoji) do
emoji
nil
else
emoji = Emoji.get(emoji)
if emoji != nil do
Endpoint.url() |> URI.merge(emoji.relative_url) |> to_string()
Endpoint.url() |> URI.merge(emoji.file) |> to_string()
else
nil
end

@@ -9,6 +9,7 @@ defmodule Pleroma.Web.PleromaAPI.Chat.MessageReferenceView do
alias Pleroma.User
alias Pleroma.Web.CommonAPI.Utils
alias Pleroma.Web.MastodonAPI.StatusView
alias Pleroma.Web.RichMedia.Card
@cachex Pleroma.Config.get([:cachex, :provider], Cachex)
@@ -23,6 +24,12 @@ def render(
}
}
) do
card =
case Card.get_by_object(object) do
%Card{} = card_data -> StatusView.render("card.json", card_data)
_ -> nil
end
%{
id: id |> to_string(),
content: chat_message["content"],
@@ -34,11 +41,7 @@ def render(
chat_message["attachment"] &&
StatusView.render("attachment.json", attachment: chat_message["attachment"]),
unread: unread,
card:
StatusView.render(
"card.json",
Pleroma.Web.RichMedia.Helpers.fetch_data_for_object(object)
)
card: card
}
|> put_idempotency_key()
end

@@ -0,0 +1,12 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.Plugs.LoggerMetadataPath do
def init(opts), do: opts
def call(conn, _) do
Logger.metadata(path: conn.request_path)
conn
end
end
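The logger configuration adds `:actor`, `:path`, `:type` and `:user` to the console and syslog `metadata` lists, and the new plugs fill them with `Logger.metadata/1`. The sketch below shows the underlying standard `Logger` calls outside of Plug; the path and nickname are made-up values.

```elixir
require Logger

# Hypothetical request values; in Pleroma the plugs take them from the
# Plug.Conn (request_path and the authenticated user's nickname).
Logger.metadata(path: "/api/v1/timelines/home", user: "alice")

# Metadata belongs to the calling process, so later log calls from this
# process carry it, and $metadata prints it when the keys are listed in
# the formatter's :metadata option.
Logger.info("timeline requested")

# Setting a key to nil removes it from the metadata.
Logger.metadata(user: nil)
```

Because metadata is per process, each request handled in its own process gets its own `path` and `user` without any manual cleanup between requests.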

@@ -0,0 +1,18 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.Plugs.LoggerMetadataUser do
alias Pleroma.User
def init(opts), do: opts
def call(%{assigns: %{user: user = %User{}}} = conn, _) do
Logger.metadata(user: user.nickname)
conn
end
def call(conn, _) do
conn
end
end

@@ -0,0 +1,101 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.Backfill.Task do
alias Pleroma.Web.RichMedia.Backfill
def run(args) do
Task.Supervisor.start_child(Pleroma.TaskSupervisor, Backfill, :run, [args],
name: {:global, {:rich_media, args.url_hash}}
)
end
end
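`Backfill.Task.run/1` passes `name: {:global, {:rich_media, url_hash}}`, and the implementation notes rely on that name to keep two processes from backfilling the same URL. The registration semantics can be illustrated with an `Agent`, whose `:name` option accepts `{:global, term}` and refuses a second start under the same name. Whether `Task.Supervisor.start_child/5` itself honors a `:name` option is worth checking against the Elixir version in use, since its documented options are `:restart` and `:shutdown`. The hash below is made up.

```elixir
# Hypothetical key mirroring {:rich_media, url_hash}.
key = {:rich_media, "example-hash"}

# The first process registered under the global name starts normally...
{:ok, first} = Agent.start(fn -> :fetching end, name: {:global, key})

# ...and a second start with the same name is refused, returning the pid
# of the process that already holds the name.
{:error, {:already_started, ^first}} =
  Agent.start(fn -> :fetching end, name: {:global, key})
```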
defmodule Pleroma.Web.RichMedia.Backfill do
alias Pleroma.Web.RichMedia.Card
alias Pleroma.Web.RichMedia.Parser
alias Pleroma.Web.RichMedia.Parser.TTL
alias Pleroma.Workers.RichMediaExpirationWorker
require Logger
@backfiller Pleroma.Config.get([__MODULE__, :provider], Pleroma.Web.RichMedia.Backfill.Task)
@cachex Pleroma.Config.get([:cachex, :provider], Cachex)
@max_attempts 3
@retry 5_000
def start(%{url: url} = args) when is_binary(url) do
url_hash = Card.url_to_hash(url)
args =
args
|> Map.put(:attempt, 1)
|> Map.put(:url_hash, url_hash)
@backfiller.run(args)
end
def run(%{url: url, url_hash: url_hash, attempt: attempt} = args)
when attempt <= @max_attempts do
case Parser.parse(url) do
{:ok, fields} ->
{:ok, card} = Card.create(url, fields)
maybe_schedule_expiration(url, fields)
if Map.has_key?(args, :activity_id) do
stream_update(args)
end
warm_cache(url_hash, card)
{:error, {:invalid_metadata, fields}} ->
Logger.debug("Rich media incomplete or invalid metadata for #{url}: #{inspect(fields)}")
negative_cache(url_hash)
{:error, :body_too_large} ->
Logger.error("Rich media error for #{url}: :body_too_large")
negative_cache(url_hash)
{:error, {:content_type, type}} ->
Logger.debug("Rich media error for #{url}: :content_type is #{type}")
negative_cache(url_hash)
e ->
Logger.debug("Rich media error for #{url}: #{inspect(e)}")
:timer.sleep(@retry * attempt)
run(%{args | attempt: attempt + 1})
end
end
def run(%{url: url, url_hash: url_hash}) do
Logger.debug("Rich media failure for #{url}")
negative_cache(url_hash, :timer.minutes(15))
end
defp maybe_schedule_expiration(url, fields) do
case TTL.process(fields, url) do
{:ok, ttl} when is_number(ttl) ->
timestamp = DateTime.from_unix!(ttl)
RichMediaExpirationWorker.new(%{"url" => url}, scheduled_at: timestamp)
|> Oban.insert()
_ ->
:ok
end
end
defp stream_update(%{activity_id: activity_id}) do
Pleroma.Activity.get_by_id(activity_id)
|> Pleroma.Activity.normalize()
|> Pleroma.Web.ActivityPub.ActivityPub.stream_out()
end
defp warm_cache(key, val), do: @cachex.put(:rich_media_cache, key, val)
defp negative_cache(key, ttl \\ nil), do: @cachex.put(:rich_media_cache, key, nil, ttl: ttl)
end
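`Backfill.run/1` sorts parser results into three outcomes: success stores a card; the three expected errors write a nil entry with no TTL; anything else sleeps `@retry * attempt` milliseconds and retries until `@max_attempts` is exceeded, after which a 15-minute nil entry is written. The pure-function sketch below restates those rules so they can be checked without Cachex or the network; `BackfillSketch` and its return tuples are hypothetical.

```elixir
defmodule BackfillSketch do
  # Same constants as Pleroma.Web.RichMedia.Backfill.
  @max_attempts 3
  @retry 5_000

  # Classify a Parser.parse/1 result the way Backfill.run/1 does.
  def classify({:ok, fields}), do: {:store, fields}
  # Expected errors: negative cache with no TTL (nil ttl = no expiry).
  def classify({:error, {:invalid_metadata, _}}), do: {:negative_cache, nil}
  def classify({:error, :body_too_large}), do: {:negative_cache, nil}
  def classify({:error, {:content_type, _}}), do: {:negative_cache, nil}
  # Anything else is treated as transient and retried.
  def classify(_other), do: :retry

  # Sleep before each retry, in milliseconds: @retry * attempt.
  def retry_delays, do: Enum.map(1..@max_attempts, &(@retry * &1))

  # Negative-cache TTL once attempts are exhausted.
  def give_up_ttl, do: :timer.minutes(15)
end
```

With these values a URL that keeps failing unexpectedly sleeps 5, 10 and 15 seconds; the last sleep precedes the give-up clause rather than a fourth request.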

@@ -0,0 +1,157 @@
defmodule Pleroma.Web.RichMedia.Card do
use Ecto.Schema
import Ecto.Changeset
import Ecto.Query
alias Pleroma.Activity
alias Pleroma.HTML
alias Pleroma.Object
alias Pleroma.Repo
alias Pleroma.Web.RichMedia.Backfill
alias Pleroma.Web.RichMedia.Parser
@cachex Pleroma.Config.get([:cachex, :provider], Cachex)
@config_impl Application.compile_env(:pleroma, [__MODULE__, :config_impl], Pleroma.Config)
@type t :: %__MODULE__{}
schema "rich_media_card" do
field(:url_hash, :binary)
field(:fields, :map)
timestamps()
end
@doc false
def changeset(card, attrs) do
card
|> cast(attrs, [:url_hash, :fields])
|> validate_required([:url_hash, :fields])
|> unique_constraint(:url_hash)
end
@spec create(String.t(), map()) :: {:ok, t()}
def create(url, fields) do
url_hash = url_to_hash(url)
fields = Map.put_new(fields, "url", url)
%__MODULE__{}
|> changeset(%{url_hash: url_hash, fields: fields})
|> Repo.insert(on_conflict: {:replace, [:fields]}, conflict_target: :url_hash)
end
@spec delete(String.t()) :: {:ok, Ecto.Schema.t()} | {:error, Ecto.Changeset.t()} | :ok
def delete(url) do
url_hash = url_to_hash(url)
@cachex.del(:rich_media_cache, url_hash)
case get_by_url(url) do
%__MODULE{} = card -> Repo.delete(card)
nil -> :ok
end
end
@spec get_by_url(String.t() | nil) :: t() | nil | :error
def get_by_url(url) when is_binary(url) do
if @config_impl.get([:rich_media, :enabled]) do
url_hash = url_to_hash(url)
@cachex.fetch!(:rich_media_cache, url_hash, fn _ ->
result =
__MODULE__
|> where(url_hash: ^url_hash)
|> Repo.one()
case result do
%__MODULE__{} = card -> {:commit, card}
_ -> {:ignore, nil}
end
end)
else
:error
end
end
def get_by_url(nil), do: nil
@spec get_or_backfill_by_url(String.t(), map()) :: t() | nil
def get_or_backfill_by_url(url, backfill_opts \\ %{}) do
case get_by_url(url) do
%__MODULE__{} = card ->
card
nil ->
backfill_opts = Map.put(backfill_opts, :url, url)
Backfill.start(backfill_opts)
nil
:error ->
nil
end
end
@spec get_by_object(Object.t()) :: t() | nil | :error
def get_by_object(object) do
case HTML.extract_first_external_url_from_object(object) do
nil -> nil
url -> get_or_backfill_by_url(url)
end
end
@spec get_by_activity(Activity.t()) :: t() | nil | :error
# Fake/Draft activity
def get_by_activity(%Activity{id: "pleroma:fakeid"} = activity) do
with %Object{} = object <- Object.normalize(activity, fetch: false),
url when not is_nil(url) <- HTML.extract_first_external_url_from_object(object) do
case get_by_url(url) do
# Cache hit
%__MODULE__{} = card ->
card
# Cache miss, but fetch for rendering the Draft
_ ->
with {:ok, fields} <- Parser.parse(url),
{:ok, card} <- create(url, fields) do
card
else
_ -> nil
end
end
else
_ ->
nil
end
end
def get_by_activity(activity) do
with %Object{} = object <- Object.normalize(activity, fetch: false),
{_, nil} <- {:cached, get_cached_url(object, activity.id)} do
nil
else
{:cached, url} ->
get_or_backfill_by_url(url, %{activity_id: activity.id})
_ ->
:error
end
end
@spec url_to_hash(String.t()) :: String.t()
def url_to_hash(url) do
:crypto.hash(:sha256, url) |> Base.encode16(case: :lower)
end
defp get_cached_url(object, activity_id) do
key = "URL|#{activity_id}"
@cachex.fetch!(:scrubber_cache, key, fn _ ->
url = HTML.extract_first_external_url_from_object(object)
Activity.HTML.add_cache_key_for(activity_id, key)
{:commit, url}
end)
end
end
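`Card.get_by_url/1` is a read-through cache: `Cachex.fetch!/3` answers from `:rich_media_cache` when it can; otherwise it runs a fallback that queries the `rich_media_card` table by `url_hash` and commits only rows that exist. The sketch below models that flow with plain maps standing in for Cachex and the database, and reuses the `url_to_hash/1` scheme (lowercase hex SHA-256). It is a simplification that does not model TTLs or how Cachex treats stored nil values.

```elixir
defmodule ReadThroughSketch do
  # Same hashing scheme as Card.url_to_hash/1.
  def url_to_hash(url), do: :crypto.hash(:sha256, url) |> Base.encode16(case: :lower)

  # `cache` stands in for Cachex and `db` for the rich_media_card table.
  # Returns {value, updated_cache}.
  def get(cache, db, url) do
    key = url_to_hash(url)

    case Map.fetch(cache, key) do
      # Cache hit: no database access.
      {:ok, value} ->
        {value, cache}

      # Cache miss: fall through to the database and commit only found rows.
      :error ->
        case Map.get(db, key) do
          nil -> {nil, cache}
          card -> {card, Map.put(cache, key, card)}
        end
    end
  end
end
```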

@@ -3,65 +3,13 @@
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.Helpers do
alias Pleroma.Activity
alias Pleroma.HTML
alias Pleroma.Object
alias Pleroma.Web.RichMedia.Parser
@cachex Pleroma.Config.get([:cachex, :provider], Cachex)
@config_impl Application.compile_env(:pleroma, [__MODULE__, :config_impl], Pleroma.Config)
@options [
pool: :media,
max_body: 2_000_000,
recv_timeout: 2_000
]
def fetch_data_for_object(object) do
with true <- @config_impl.get([:rich_media, :enabled]),
{:ok, page_url} <-
HTML.extract_first_external_url_from_object(object),
{:ok, rich_media} <- Parser.parse(page_url) do
%{page_url: page_url, rich_media: rich_media}
else
_ -> %{}
end
end
def fetch_data_for_activity(%Activity{data: %{"type" => "Create"}} = activity) do
with true <- @config_impl.get([:rich_media, :enabled]),
%Object{} = object <- Object.normalize(activity, fetch: false) do
if object.data["fake"] do
fetch_data_for_object(object)
else
key = "URL|#{activity.id}"
@cachex.fetch!(:scrubber_cache, key, fn _ ->
result = fetch_data_for_object(object)
cond do
match?(%{page_url: _, rich_media: _}, result) ->
Activity.HTML.add_cache_key_for(activity.id, key)
{:commit, result}
true ->
{:ignore, %{}}
end
end)
end
else
_ -> %{}
end
end
def fetch_data_for_activity(_), do: %{}
alias Pleroma.Config
def rich_media_get(url) do
headers = [{"user-agent", Pleroma.Application.user_agent() <> "; Bot"}]
head_check =
case Pleroma.HTTP.head(url, headers, @options) do
case Pleroma.HTTP.head(url, headers, http_options()) do
# If the HEAD request didn't reach the server for whatever reason,
# we assume the GET that comes right after won't either
{:error, _} = e ->
@@ -76,7 +24,7 @@ def rich_media_get(url) do
:ok
end
with :ok <- head_check, do: Pleroma.HTTP.get(url, headers, @options)
with :ok <- head_check, do: Pleroma.HTTP.get(url, headers, http_options())
end
defp check_content_type(headers) do
@@ -92,12 +40,13 @@ defp check_content_type(headers) do
end
end
@max_body @options[:max_body]
defp check_content_length(headers) do
max_body = Keyword.get(http_options(), :max_body)
case List.keyfind(headers, "content-length", 0) do
{_, maybe_content_length} ->
case Integer.parse(maybe_content_length) do
{content_length, ""} when content_length <= @max_body -> :ok
{content_length, ""} when content_length <= max_body -> :ok
{_, ""} -> {:error, :body_too_large}
_ -> :ok
end
@@ -106,4 +55,11 @@ defp check_content_length(headers) do
:ok
end
end
defp http_options() do
[
pool: :media,
max_body: Config.get([:rich_media, :max_body], 5_000_000)
]
end
end
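The Helpers hunk replaces the compile-time `@max_body @options[:max_body]` with a lookup inside `check_content_length/1`, so the configured `:max_body` (5_000_000 by default, 2_000_000 in the test config) is honored at run time. Below is a standalone sketch of the same check, with `max_body` passed as an argument instead of read from `Pleroma.Config`; the module name is hypothetical.

```elixir
defmodule ContentLengthSketch do
  # `headers` is a list of {name, value} tuples with lowercase names.
  def check(headers, max_body) do
    case List.keyfind(headers, "content-length", 0) do
      {_, value} ->
        case Integer.parse(value) do
          {len, ""} when len <= max_body -> :ok
          {_, ""} -> {:error, :body_too_large}
          # Malformed header: let the GET proceed.
          _ -> :ok
        end

      # No Content-Length header: nothing to check up front.
      nil ->
        :ok
    end
  end
end
```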

@@ -5,134 +5,28 @@
defmodule Pleroma.Web.RichMedia.Parser do
require Logger
@cachex Pleroma.Config.get([:cachex, :provider], Cachex)
@config_impl Application.compile_env(:pleroma, [__MODULE__, :config_impl], Pleroma.Config)
defp parsers do
Pleroma.Config.get([:rich_media, :parsers])
end
def parse(nil), do: {:error, "No URL provided"}
def parse(nil), do: nil
@spec parse(String.t()) :: {:ok, map()} | {:error, any()}
def parse(url) do
with :ok <- validate_page_url(url),
{:ok, data} <- get_cached_or_parse(url),
{:ok, _} <- set_ttl_based_on_image(data, url) do
{:ok, data} <- parse_url(url) do
data = Map.put(data, "url", url)
{:ok, data}
end
end
defp get_cached_or_parse(url) do
case @cachex.fetch(:rich_media_cache, url, fn ->
case parse_url(url) do
{:ok, _} = res ->
{:commit, res}
{:error, reason} = e ->
# Unfortunately we have to log errors here, instead of doing that
# along with ttl setting at the bottom. Otherwise we can get log spam
# if more than one process was waiting for the rich media card
# while it was generated. Ideally we would set ttl here as well,
# so we don't override it number_of_waiters_on_generation
# times, but one, obviously, can't set ttl for not-yet-created entry
# and Cachex doesn't support returning ttl from the fetch callback.
log_error(url, reason)
{:commit, e}
end
end) do
{action, res} when action in [:commit, :ok] ->
case res do
{:ok, _data} = res ->
res
{:error, reason} = e ->
if action == :commit, do: set_error_ttl(url, reason)
e
end
{:error, e} ->
{:error, {:cachex_error, e}}
end
end
defp set_error_ttl(_url, :body_too_large), do: :ok
defp set_error_ttl(_url, {:content_type, _}), do: :ok
# The TTL is not set for the errors above, since they are unlikely to change
# with time
defp set_error_ttl(url, _reason) do
ttl = Pleroma.Config.get([:rich_media, :failure_backoff], 60_000)
@cachex.expire(:rich_media_cache, url, ttl)
:ok
end
defp log_error(url, {:invalid_metadata, data}) do
Logger.debug(fn -> "Incomplete or invalid metadata for #{url}: #{inspect(data)}" end)
end
defp log_error(url, reason) do
Logger.warning(fn -> "Rich media error for #{url}: #{inspect(reason)}" end)
end
@doc """
Set the rich media cache based on the expiration time of image.
Adopt behaviour `Pleroma.Web.RichMedia.Parser.TTL`
## Example
defmodule MyModule do
@behaviour Pleroma.Web.RichMedia.Parser.TTL
def ttl(data, url) do
image_url = Map.get(data, :image)
# do some parsing in the url and get the ttl of the image
# and return ttl is unix time
parse_ttl_from_url(image_url)
end
end
Define the module in the config
config :pleroma, :rich_media,
ttl_setters: [MyModule]
"""
@spec set_ttl_based_on_image(map(), String.t()) ::
{:ok, integer() | :noop} | {:error, :no_key}
def set_ttl_based_on_image(data, url) do
case get_ttl_from_image(data, url) do
ttl when is_number(ttl) ->
ttl = ttl * 1000
case @cachex.expire_at(:rich_media_cache, url, ttl) do
{:ok, true} -> {:ok, ttl}
{:ok, false} -> {:error, :no_key}
end
_ ->
{:ok, :noop}
end
end
defp get_ttl_from_image(data, url) do
[:rich_media, :ttl_setters]
|> Pleroma.Config.get()
|> Enum.reduce({:ok, nil}, fn
module, {:ok, _ttl} ->
module.ttl(data, url)
_, error ->
error
end)
end
def parse_url(url) do
defp parse_url(url) do
with {:ok, %Tesla.Env{body: html}} <- Pleroma.Web.RichMedia.Helpers.rich_media_get(url),
{:ok, html} <- Floki.parse_document(html) do
html
|> maybe_parse()
|> Map.put("url", url)
|> clean_parsed_data()
|> check_parsed_data()
end

@@ -4,4 +4,17 @@
defmodule Pleroma.Web.RichMedia.Parser.TTL do
@callback ttl(map(), String.t()) :: integer() | nil
@spec process(map(), String.t()) :: {:ok, integer() | nil}
def process(data, url) do
[:rich_media, :ttl_setters]
|> Pleroma.Config.get()
|> Enum.reduce_while({:ok, nil}, fn
module, acc ->
case module.ttl(data, url) do
ttl when is_number(ttl) -> {:halt, {:ok, ttl}}
_ -> {:cont, acc}
end
end)
end
end
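The new `process/2` walks the configured `[:rich_media, :ttl_setters]` with `Enum.reduce_while/3`. The first setter that returns a number wins, and later setters are not consulted. The sketch below reproduces that control flow in a self-contained form. It assumes the setter list is passed in directly instead of being read from `Pleroma.Config`. `TTLSketch`, `NoOpinionSetter` and `FixedSetter` are hypothetical names used only for illustration.

```elixir
defmodule TTLSketch do
  # Hypothetical stand-in for Pleroma.Web.RichMedia.Parser.TTL; the real module
  # reads its setter list from Pleroma.Config, here it is an explicit argument.
  @callback ttl(map(), String.t()) :: integer() | nil

  @spec process([module()], map(), String.t()) :: {:ok, integer() | nil}
  def process(setters, data, url) do
    Enum.reduce_while(setters, {:ok, nil}, fn module, acc ->
      case module.ttl(data, url) do
        # The first setter returning a number wins; remaining setters are skipped.
        ttl when is_number(ttl) -> {:halt, {:ok, ttl}}
        # nil (or anything else) means "no opinion"; keep the accumulator.
        _ -> {:cont, acc}
      end
    end)
  end
end

defmodule NoOpinionSetter do
  @behaviour TTLSketch
  # Hypothetical setter that never knows a TTL.
  @impl true
  def ttl(_data, _url), do: nil
end

defmodule FixedSetter do
  @behaviour TTLSketch
  # Hypothetical setter that passes through an integer "ttl" already in the data.
  @impl true
  def ttl(%{"ttl" => ttl}, _url) when is_integer(ttl), do: ttl
  def ttl(_data, _url), do: nil
end
```

With `[NoOpinionSetter, FixedSetter]`, `process/3` returns the first numeric TTL it finds. If no setter recognizes the data, it returns `{:ok, nil}`, which matches the `{:ok, nil}` default used in the diff.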

@@ -7,7 +7,7 @@ defmodule Pleroma.Web.RichMedia.Parser.TTL.AwsSignedUrl do
@impl true
def ttl(data, _url) do
image = Map.get(data, :image)
image = Map.get(data, "image")
if aws_signed_url?(image) do
image
@@ -15,14 +15,15 @@ def ttl(data, _url) do
|> format_query_params()
|> get_expiration_timestamp()
else
{:error, "Not aws signed url #{inspect(image)}"}
nil
end
end
defp aws_signed_url?(image) when is_binary(image) and image != "" do
%URI{host: host, query: query} = URI.parse(image)
String.contains?(host, "amazonaws.com") and String.contains?(query, "X-Amz-Expires")
is_binary(host) and String.contains?(host, "amazonaws.com") and
String.contains?(query, "X-Amz-Expires")
end
defp aws_signed_url?(_), do: nil
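The hunk above elides `format_query_params/1` and `get_expiration_timestamp/1`. As a hedged illustration of how the expiry of an S3 presigned URL can be computed, the sketch below reads two values from the query string:

- `X-Amz-Date`, in ISO 8601 basic format, e.g. `20240505T120000Z`
- `X-Amz-Expires`, in seconds

It returns their sum as a Unix timestamp, or `nil` when either value is missing or malformed. `AwsExpirySketch` is hypothetical and is not the code hidden by the hunk.

```elixir
defmodule AwsExpirySketch do
  # Hypothetical helper: the Unix time at which an S3 presigned URL expires,
  # computed as X-Amz-Date + X-Amz-Expires, or nil if it cannot be determined.
  def expires_at(image_url) when is_binary(image_url) do
    with %URI{query: query} when is_binary(query) <- URI.parse(image_url),
         %{"X-Amz-Date" => date, "X-Amz-Expires" => expires} <- URI.decode_query(query),
         {seconds, ""} <- Integer.parse(expires),
         {:ok, signed_at} <- parse_amz_date(date) do
      DateTime.to_unix(signed_at) + seconds
    else
      _ -> nil
    end
  end

  def expires_at(_), do: nil

  # X-Amz-Date uses the ISO 8601 basic format ("20240505T120000Z"); rebuild it
  # in the extended format that DateTime.from_iso8601/1 parses.
  defp parse_amz_date(
         <<y::binary-size(4), mo::binary-size(2), d::binary-size(2), "T",
           h::binary-size(2), mi::binary-size(2), s::binary-size(2), "Z">>
       ) do
    case DateTime.from_iso8601("#{y}-#{mo}-#{d}T#{h}:#{mi}:#{s}Z") do
      {:ok, datetime, 0} -> {:ok, datetime}
      _ -> :error
    end
  end

  defp parse_amz_date(_), do: :error
end
```

Returning `nil` for non-presigned or malformed URLs matches the convention the new `AwsSignedUrl.ttl/2` adopts in this hunk. Under that convention, `nil` means "no opinion" rather than an error tuple.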

@@ -0,0 +1,19 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.Parser.TTL.Opengraph do
@behaviour Pleroma.Web.RichMedia.Parser.TTL
@impl true
def ttl(%{"ttl" => ttl_string}, _url) do
with ttl <- String.to_integer(ttl_string) do
now = DateTime.utc_now() |> DateTime.to_unix()
now + ttl
else
_ -> nil
end
end
def ttl(_, _), do: nil
end
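In `Opengraph.ttl/2` above, the `else` branch of `with ttl <- String.to_integer(ttl_string)` is unreachable, because a bare variable pattern always matches. In addition, `String.to_integer/1` raises `ArgumentError` when the scraped `"ttl"` value is not a plain integer string (for example `"3600s"`). The sketch below is a hedged alternative that uses `Integer.parse/1` and returns `nil` instead of raising. `OpengraphTTLSketch` is a hypothetical module name. Rejecting negative values is an added assumption; the diff does not do this.

```elixir
defmodule OpengraphTTLSketch do
  # Hypothetical variant of Opengraph.ttl/2: returns a Unix timestamp
  # (now + ttl seconds) or nil, and never raises on malformed input.
  def ttl(%{"ttl" => ttl_string}, _url) when is_binary(ttl_string) do
    case Integer.parse(ttl_string) do
      # Accept only a whole, non-negative number of seconds with no trailing text.
      {seconds, ""} when seconds >= 0 ->
        now = DateTime.utc_now() |> DateTime.to_unix()
        now + seconds

      # "3600s", "soon", "" and negative values all fall through to nil.
      _ ->
        nil
    end
  end

  def ttl(_data, _url), do: nil
end
```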

@@ -29,6 +29,7 @@ defmodule Pleroma.Web.Router do
pipeline :browser do
plug(:accepts, ["html"])
plug(:fetch_session)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :oauth do
@@ -67,12 +68,14 @@ defmodule Pleroma.Web.Router do
plug(:fetch_session)
plug(:authenticate)
plug(OpenApiSpex.Plug.PutApiSpec, module: Pleroma.Web.ApiSpec)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :no_auth_or_privacy_expectations_api do
plug(:base_api)
plug(:after_auth)
plug(Pleroma.Web.Plugs.IdempotencyPlug)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
# Pipeline for app-related endpoints (no user auth checks — app-bound tokens must be supported)
@@ -83,12 +86,14 @@ defmodule Pleroma.Web.Router do
pipeline :api do
plug(:expect_public_instance_or_user_authentication)
plug(:no_auth_or_privacy_expectations_api)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :authenticated_api do
plug(:expect_user_authentication)
plug(:no_auth_or_privacy_expectations_api)
plug(Pleroma.Web.Plugs.EnsureAuthenticatedPlug)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :admin_api do
@@ -99,6 +104,7 @@ defmodule Pleroma.Web.Router do
plug(Pleroma.Web.Plugs.EnsureAuthenticatedPlug)
plug(Pleroma.Web.Plugs.UserIsStaffPlug)
plug(Pleroma.Web.Plugs.IdempotencyPlug)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :require_admin do
@@ -179,6 +185,7 @@ defmodule Pleroma.Web.Router do
plug(:browser)
plug(:authenticate)
plug(Pleroma.Web.Plugs.EnsureUserTokenAssignsPlug)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :well_known do
@@ -193,6 +200,7 @@ defmodule Pleroma.Web.Router do
pipeline :pleroma_api do
plug(:accepts, ["html", "json"])
plug(OpenApiSpex.Plug.PutApiSpec, module: Pleroma.Web.ApiSpec)
plug(Pleroma.Web.Plugs.LoggerMetadataUser)
end
pipeline :mailbox_preview do

@@ -0,0 +1,15 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Workers.RichMediaExpirationWorker do
alias Pleroma.Web.RichMedia.Card
use Oban.Worker,
queue: :rich_media_expiration
@impl Oban.Worker
def perform(%Job{args: %{"url" => url} = _args}) do
Card.delete(url)
end
end

@@ -0,0 +1,14 @@
defmodule Pleroma.Repo.Migrations.CreateRichMediaCard do
use Ecto.Migration
def change do
create table(:rich_media_card) do
add(:url_hash, :bytea)
add(:fields, :map)
timestamps()
end
create(unique_index(:rich_media_card, [:url_hash]))
end
end
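The migration keys `rich_media_card` on a `bytea` column `url_hash` with a unique index, rather than on the URL itself. Presumably this keeps the indexed value a fixed size however long the URL is. This excerpt does not show `Card.url_to_hash/1`. Assuming it produces a SHA-256 digest, a minimal sketch looks like this (`UrlHashSketch` is a hypothetical name):

```elixir
defmodule UrlHashSketch do
  # Assumption: the URL is hashed with SHA-256, giving a 32-byte binary that
  # can be stored in the `bytea` column `url_hash`. Equal URLs give equal
  # hashes, so the unique index allows at most one card per URL.
  @spec url_to_hash(String.t()) :: binary()
  def url_to_hash(url) when is_binary(url), do: :crypto.hash(:sha256, url)
end
```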

test/fixtures/rich_media/reddit.html (392 changed lines; file diff suppressed because one or more lines are too long)

@@ -202,7 +202,7 @@ test "extracts the url" do
})
object = Object.normalize(activity, fetch: false)
{:ok, url} = HTML.extract_first_external_url_from_object(object)
url = HTML.extract_first_external_url_from_object(object)
assert url == "https://github.com/komeiji-satori/Dress"
end
@@ -217,7 +217,7 @@ test "skips mentions" do
})
object = Object.normalize(activity, fetch: false)
{:ok, url} = HTML.extract_first_external_url_from_object(object)
url = HTML.extract_first_external_url_from_object(object)
assert url == "https://github.com/syuilo/misskey/blob/develop/docs/setup.en.md"
@@ -233,7 +233,7 @@ test "skips hashtags" do
})
object = Object.normalize(activity, fetch: false)
{:ok, url} = HTML.extract_first_external_url_from_object(object)
url = HTML.extract_first_external_url_from_object(object)
assert url == "https://www.pixiv.net/member_illust.php?mode=medium&illust_id=72255140"
end
@@ -249,7 +249,7 @@ test "skips microformats hashtags" do
})
object = Object.normalize(activity, fetch: false)
{:ok, url} = HTML.extract_first_external_url_from_object(object)
url = HTML.extract_first_external_url_from_object(object)
assert url == "https://www.pixiv.net/member_illust.php?mode=medium&illust_id=72255140"
end
@@ -261,7 +261,7 @@ test "does not crash when there is an HTML entity in a link" do
object = Object.normalize(activity, fetch: false)
assert {:ok, nil} = HTML.extract_first_external_url_from_object(object)
assert nil == HTML.extract_first_external_url_from_object(object)
end
test "skips attachment links" do
@@ -275,7 +275,7 @@ test "skips attachment links" do
object = Object.normalize(activity, fetch: false)
assert {:ok, nil} = HTML.extract_first_external_url_from_object(object)
assert nil == HTML.extract_first_external_url_from_object(object)
end
end
end

@@ -35,21 +35,6 @@ test "it does not find local-only posts for anonymous users" do
assert [] = Search.search(nil, "wednesday")
end
test "using plainto_tsquery on postgres < 11" do
old_version = :persistent_term.get({Pleroma.Repo, :postgres_version})
:persistent_term.put({Pleroma.Repo, :postgres_version}, 10.0)
on_exit(fn -> :persistent_term.put({Pleroma.Repo, :postgres_version}, old_version) end)
user = insert(:user)
{:ok, post} = CommonAPI.post(user, %{status: "it's wednesday my dudes"})
{:ok, _post2} = CommonAPI.post(user, %{status: "it's wednesday my bros"})
# plainto doesn't understand complex queries
assert [result] = Search.search(nil, "wednesday -dudes")
assert result.id == post.id
end
test "using websearch_to_tsquery" do
user = insert(:user)
{:ok, _post} = CommonAPI.post(user, %{status: "it's wednesday my dudes"})

@@ -322,26 +322,20 @@ test "search", %{conn: conn} do
end
test "search fetches remote statuses and prefers them over other results", %{conn: conn} do
old_version = :persistent_term.get({Pleroma.Repo, :postgres_version})
:persistent_term.put({Pleroma.Repo, :postgres_version}, 10.0)
on_exit(fn -> :persistent_term.put({Pleroma.Repo, :postgres_version}, old_version) end)
{:ok, %{id: activity_id}} =
CommonAPI.post(insert(:user), %{
status: "check out http://mastodon.example.org/@admin/99541947525187367"
})
capture_log(fn ->
{:ok, %{id: activity_id}} =
CommonAPI.post(insert(:user), %{
status: "check out http://mastodon.example.org/@admin/99541947525187367"
})
%{"url" => result_url, "id" => result_id} =
conn
|> get("/api/v1/search?q=http://mastodon.example.org/@admin/99541947525187367")
|> json_response_and_validate_schema(200)
|> Map.get("statuses")
|> List.first()
results =
conn
|> get("/api/v1/search?q=http://mastodon.example.org/@admin/99541947525187367")
|> json_response_and_validate_schema(200)
assert [
%{"url" => "http://mastodon.example.org/@admin/99541947525187367"},
%{"id" => ^activity_id}
] = results["statuses"]
end)
refute match?(^result_id, activity_id)
assert match?(^result_url, "http://mastodon.example.org/@admin/99541947525187367")
end
test "search doesn't show statuses that it shouldn't", %{conn: conn} do

@@ -17,6 +17,7 @@ defmodule Pleroma.Web.MastodonAPI.StatusViewTest do
alias Pleroma.Web.CommonAPI
alias Pleroma.Web.MastodonAPI.AccountView
alias Pleroma.Web.MastodonAPI.StatusView
alias Pleroma.Web.RichMedia.Card
require Bitwise
@@ -732,56 +733,55 @@ test "it returns a a dictionary tags" do
describe "rich media cards" do
test "a rich media card without a site name renders correctly" do
page_url = "http://example.com"
page_url = "https://example.com"
card = %{
url: page_url,
image: page_url <> "/example.jpg",
title: "Example website"
}
{:ok, card} =
Card.create(page_url, %{image: page_url <> "/example.jpg", title: "Example website"})
%{provider_name: "example.com"} =
StatusView.render("card.json", %{page_url: page_url, rich_media: card})
%{provider_name: "example.com"} = StatusView.render("card.json", card)
end
test "a rich media card without a site name or image renders correctly" do
page_url = "http://example.com"
page_url = "https://example.com"
card = %{
url: page_url,
title: "Example website"
fields = %{
"url" => page_url,
"title" => "Example website"
}
%{provider_name: "example.com"} =
StatusView.render("card.json", %{page_url: page_url, rich_media: card})
{:ok, card} = Card.create(page_url, fields)
%{provider_name: "example.com"} = StatusView.render("card.json", card)
end
test "a rich media card without an image renders correctly" do
page_url = "http://example.com"
page_url = "https://example.com"
card = %{
url: page_url,
site_name: "Example site name",
title: "Example website"
fields = %{
"url" => page_url,
"site_name" => "Example site name",
"title" => "Example website"
}
%{provider_name: "example.com"} =
StatusView.render("card.json", %{page_url: page_url, rich_media: card})
{:ok, card} = Card.create(page_url, fields)
%{provider_name: "example.com"} = StatusView.render("card.json", card)
end
test "a rich media card with all relevant data renders correctly" do
page_url = "http://example.com"
page_url = "https://example.com"
card = %{
url: page_url,
site_name: "Example site name",
title: "Example website",
image: page_url <> "/example.jpg",
description: "Example description"
fields = %{
"url" => page_url,
"site_name" => "Example site name",
"title" => "Example website",
"image" => page_url <> "/example.jpg",
"description" => "Example description"
}
%{provider_name: "example.com"} =
StatusView.render("card.json", %{page_url: page_url, rich_media: card})
{:ok, card} = Card.create(page_url, fields)
%{provider_name: "example.com"} = StatusView.render("card.json", card)
end
test "a rich media card has all media proxied" do
@@ -791,25 +791,25 @@ test "a rich media card has all media proxied" do
ConfigMock
|> stub_with(Pleroma.Test.StaticConfig)
page_url = "http://example.com"
page_url = "https://example.com"
card = %{
url: page_url,
site_name: "Example site name",
title: "Example website",
image: page_url <> "/example.jpg",
audio: page_url <> "/example.ogg",
video: page_url <> "/example.mp4",
description: "Example description"
fields = %{
"url" => page_url,
"site_name" => "Example site name",
"title" => "Example website",
"image" => page_url <> "/example.jpg",
"audio" => page_url <> "/example.ogg",
"video" => page_url <> "/example.mp4",
"description" => "Example description"
}
strcard = for {k, v} <- card, into: %{}, do: {to_string(k), v}
{:ok, card} = Card.create(page_url, fields)
%{
provider_name: "example.com",
image: image,
pleroma: %{opengraph: og}
} = StatusView.render("card.json", %{page_url: page_url, rich_media: strcard})
} = StatusView.render("card.json", card)
assert String.match?(image, ~r/\/proxy\//)
assert String.match?(og["image"], ~r/\/proxy\//)

@@ -33,9 +33,7 @@ test "it lists bookmark folders", %{conn: conn, user: user} do
"id" => ^folder_id,
"name" => "Bookmark folder",
"emoji" => nil,
"source" => %{
"emoji" => nil
}
"emoji_url" => nil
}
] = result
end
@@ -57,9 +55,24 @@ test "it creates a bookmark folder", %{conn: conn} do
assert %{
"name" => "Bookmark folder",
"emoji" => "📁",
"source" => %{
"emoji" => "📁"
}
"emoji_url" => nil
} = result
end
test "it creates a bookmark folder with custom emoji", %{conn: conn} do
result =
conn
|> put_req_header("content-type", "application/json")
|> post("/api/v1/pleroma/bookmark_folders", %{
name: "Bookmark folder",
emoji: ":firefox:"
})
|> json_response_and_validate_schema(200)
assert %{
"name" => "Bookmark folder",
"emoji" => ":firefox:",
"emoji_url" => "http://localhost:4001/emoji/Firefox.gif"
} = result
end

@@ -9,7 +9,6 @@ defmodule Pleroma.Web.PleromaAPI.ChatMessageReferenceViewTest do
alias Pleroma.Chat
alias Pleroma.Chat.MessageReference
alias Pleroma.Object
alias Pleroma.StaticStubbedConfigMock
alias Pleroma.UnstubbedConfigMock, as: ConfigMock
alias Pleroma.Web.ActivityPub.ActivityPub
alias Pleroma.Web.CommonAPI
@@ -18,6 +17,8 @@ defmodule Pleroma.Web.PleromaAPI.ChatMessageReferenceViewTest do
import Mox
import Pleroma.Factory
setup do: clear_config([:rich_media, :enabled], true)
test "it displays a chat message" do
user = insert(:user)
recipient = insert(:user)
@@ -62,16 +63,7 @@ test "it displays a chat message" do
assert match?([%{shortcode: "firefox"}], chat_message[:emojis])
assert chat_message[:idempotency_key] == "123"
StaticStubbedConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
Tesla.Mock.mock_global(fn
%{url: "https://example.com/ogp"} ->
%Tesla.Env{status: 200, body: File.read!("test/fixtures/rich_media/ogp.html")}
end)
Tesla.Mock.mock_global(fn env -> apply(HttpRequestMock, :request, [env]) end)
{:ok, activity} =
CommonAPI.post_chat_message(recipient, user, "gkgkgk https://example.com/ogp",

@@ -0,0 +1,71 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2024 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.CardTest do
use Pleroma.DataCase, async: true
alias Pleroma.UnstubbedConfigMock, as: ConfigMock
alias Pleroma.Web.CommonAPI
alias Pleroma.Web.RichMedia.Card
import Mox
import Pleroma.Factory
import Tesla.Mock
setup do
mock_global(fn env -> apply(HttpRequestMock, :request, [env]) end)
ConfigMock
|> stub_with(Pleroma.Test.StaticConfig)
:ok
end
setup do: clear_config([:rich_media, :enabled], true)
test "crawls URL in activity" do
user = insert(:user)
url = "https://example.com/ogp"
url_hash = Card.url_to_hash(url)
{:ok, activity} =
CommonAPI.post(user, %{
status: "[test](#{url})",
content_type: "text/markdown"
})
assert %Card{url_hash: ^url_hash, fields: _} = Card.get_by_activity(activity)
end
test "recrawls URLs on updates" do
original_url = "https://google.com/"
original_url_hash = Card.url_to_hash(original_url)
updated_url = "https://yahoo.com/"
updated_url_hash = Card.url_to_hash(updated_url)
user = insert(:user)
{:ok, activity} = CommonAPI.post(user, %{status: "I like this site #{original_url}"})
# Force a backfill
Card.get_by_activity(activity)
assert match?(
%Card{url_hash: ^original_url_hash, fields: _},
Card.get_by_activity(activity)
)
{:ok, _} = CommonAPI.update(user, activity, %{status: "I like this site #{updated_url}"})
activity = Pleroma.Activity.get_by_id(activity.id)
# Force a backfill
Card.get_by_activity(activity)
assert match?(
%Card{url_hash: ^updated_url_hash, fields: _},
Card.get_by_activity(activity)
)
end
end

@@ -1,137 +0,0 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2022 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.HelpersTest do
use Pleroma.DataCase, async: false
alias Pleroma.StaticStubbedConfigMock, as: ConfigMock
alias Pleroma.Web.CommonAPI
alias Pleroma.Web.RichMedia.Helpers
import Mox
import Pleroma.Factory
import Tesla.Mock
setup do
mock_global(fn env -> apply(HttpRequestMock, :request, [env]) end)
ConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> false
path -> Pleroma.Test.StaticConfig.get(path)
end)
|> stub(:get, fn
path, default -> Pleroma.Test.StaticConfig.get(path, default)
end)
:ok
end
test "refuses to crawl incomplete URLs" do
user = insert(:user)
{:ok, activity} =
CommonAPI.post(user, %{
status: "[test](example.com/ogp)",
content_type: "text/markdown"
})
ConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
assert %{} == Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
end
test "refuses to crawl malformed URLs" do
user = insert(:user)
{:ok, activity} =
CommonAPI.post(user, %{
status: "[test](example.com[]/ogp)",
content_type: "text/markdown"
})
ConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
assert %{} == Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
end
test "crawls valid, complete URLs" do
user = insert(:user)
{:ok, activity} =
CommonAPI.post(user, %{
status: "[test](https://example.com/ogp)",
content_type: "text/markdown"
})
ConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
assert %{page_url: "https://example.com/ogp", rich_media: _} =
Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
end
test "recrawls URLs on updates" do
original_url = "https://google.com/"
updated_url = "https://yahoo.com/"
Pleroma.StaticStubbedConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
user = insert(:user)
{:ok, activity} = CommonAPI.post(user, %{status: "I like this site #{original_url}"})
assert match?(
%{page_url: ^original_url, rich_media: _},
Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
)
{:ok, _} = CommonAPI.update(user, activity, %{status: "I like this site #{updated_url}"})
activity = Pleroma.Activity.get_by_id(activity.id)
assert match?(
%{page_url: ^updated_url, rich_media: _},
Pleroma.Web.RichMedia.Helpers.fetch_data_for_activity(activity)
)
end
test "refuses to crawl URLs of private network from posts" do
user = insert(:user)
{:ok, activity} =
CommonAPI.post(user, %{status: "http://127.0.0.1:4000/notice/9kCP7VNyPJXFOXDrgO"})
{:ok, activity2} = CommonAPI.post(user, %{status: "https://10.111.10.1/notice/9kCP7V"})
{:ok, activity3} = CommonAPI.post(user, %{status: "https://172.16.32.40/notice/9kCP7V"})
{:ok, activity4} = CommonAPI.post(user, %{status: "https://192.168.10.40/notice/9kCP7V"})
{:ok, activity5} = CommonAPI.post(user, %{status: "https://pleroma.local/notice/9kCP7V"})
ConfigMock
|> stub(:get, fn
[:rich_media, :enabled] -> true
path -> Pleroma.Test.StaticConfig.get(path)
end)
assert %{} == Helpers.fetch_data_for_activity(activity)
assert %{} == Helpers.fetch_data_for_activity(activity2)
assert %{} == Helpers.fetch_data_for_activity(activity3)
assert %{} == Helpers.fetch_data_for_activity(activity4)
assert %{} == Helpers.fetch_data_for_activity(activity5)
end
end

@@ -3,8 +3,22 @@
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.Parser.TTL.AwsSignedUrlTest do
# Relies on Cachex, needs to be synchronous
use Pleroma.DataCase
use Pleroma.DataCase, async: false
use Oban.Testing, repo: Pleroma.Repo
import Mox
alias Pleroma.UnstubbedConfigMock, as: ConfigMock
alias Pleroma.Web.RichMedia.Card
setup do
ConfigMock
|> stub_with(Pleroma.Test.StaticConfig)
clear_config([:rich_media, :enabled], true)
:ok
end
test "s3 signed url is parsed correct for expiration time" do
url = "https://pleroma.social/amz"
@@ -43,26 +57,29 @@ test "s3 signed url is parsed and correct ttl is set for rich media" do
<meta name="twitter:site" content="Pleroma" />
<meta name="twitter:title" content="Pleroma" />
<meta name="twitter:description" content="Pleroma" />
<meta name="twitter:image" content="#{Map.get(metadata, :image)}" />
<meta name="twitter:image" content="#{Map.get(metadata, "image")}" />
"""
Tesla.Mock.mock(fn
%{
method: :get,
url: "https://pleroma.social/amz"
url: ^url
} ->
%Tesla.Env{status: 200, body: body}
%{method: :head} ->
%Tesla.Env{status: 200}
end)
Cachex.put(:rich_media_cache, url, metadata)
Card.get_or_backfill_by_url(url)
Pleroma.Web.RichMedia.Parser.set_ttl_based_on_image(metadata, url)
assert_enqueued(worker: Pleroma.Workers.RichMediaExpirationWorker, args: %{"url" => url})
{:ok, cache_ttl} = Cachex.ttl(:rich_media_cache, url)
[%Oban.Job{scheduled_at: scheduled_at}] = all_enqueued()
# as there is delay in setting and pulling the data from cache we ignore 1 second
# make it 2 seconds for flakyness
assert_in_delta(valid_till * 1000, cache_ttl, 2000)
timestamp_dt = Timex.parse!(timestamp, "{ISO:Basic:Z}")
assert DateTime.diff(scheduled_at, timestamp_dt) == valid_till
end
defp construct_s3_url(timestamp, valid_till) do
@@ -71,11 +88,11 @@ defp construct_s3_url(timestamp, valid_till) do
defp construct_metadata(timestamp, valid_till, url) do
%{
image: construct_s3_url(timestamp, valid_till),
site: "Pleroma",
title: "Pleroma",
description: "Pleroma",
url: url
"image" => construct_s3_url(timestamp, valid_till),
"site" => "Pleroma",
"title" => "Pleroma",
"description" => "Pleroma",
"url" => url
}
end
end

@@ -0,0 +1,41 @@
# Pleroma: A lightweight social networking server
# Copyright © 2017-2024 Pleroma Authors <https://pleroma.social/>
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.Parser.TTL.OpengraphTest do
use Pleroma.DataCase
use Oban.Testing, repo: Pleroma.Repo
import Mox
alias Pleroma.UnstubbedConfigMock, as: ConfigMock
alias Pleroma.Web.RichMedia.Card
setup do
ConfigMock
|> stub_with(Pleroma.Test.StaticConfig)
clear_config([:rich_media, :enabled], true)
:ok
end
test "OpenGraph TTL value is honored" do
url = "https://reddit.com/r/somepost"
Tesla.Mock.mock(fn
%{
method: :get,
url: ^url
} ->
%Tesla.Env{status: 200, body: File.read!("test/fixtures/rich_media/reddit.html")}
%{method: :head} ->
%Tesla.Env{status: 200}
end)
Card.get_or_backfill_by_url(url)
assert_enqueued(worker: Pleroma.Workers.RichMediaExpirationWorker, args: %{"url" => url})
end
end

@@ -3,7 +3,7 @@
# SPDX-License-Identifier: AGPL-3.0-only
defmodule Pleroma.Web.RichMedia.ParserTest do
use Pleroma.DataCase, async: false
use Pleroma.DataCase
alias Pleroma.Web.RichMedia.Parser
@@ -104,4 +104,27 @@ test "does a HEAD request to check if the body is too large" do
test "does a HEAD request to check if the body is html" do
assert {:error, {:content_type, _}} = Parser.parse("https://example.com/pdf-file")
end
test "refuses to crawl incomplete URLs" do
url = "example.com/ogp"
assert :error == Parser.parse(url)
end
test "refuses to crawl malformed URLs" do
url = "example.com[]/ogp"
assert :error == Parser.parse(url)
end
test "refuses to crawl URLs of private network from posts" do
[
"http://127.0.0.1:4000/notice/9kCP7VNyPJXFOXDrgO",
"https://10.111.10.1/notice/9kCP7V",
"https://172.16.32.40/notice/9kCP7V",
"https://192.168.10.40/notice/9kCP7V",
"https://pleroma.local/notice/9kCP7V"
]
|> Enum.each(fn url ->
assert :error == Parser.parse(url)
end)
end
end

@@ -4,6 +4,8 @@
Code.put_compiler_option(:warnings_as_errors, true)
ExUnit.configure(max_cases: System.schedulers_online())
ExUnit.start(exclude: [:federated, :erratic])
if match?({:unix, :darwin}, :os.type()) do